Can Machine Learning Give Investigative Journalism the Scoop?

While attending the recent NICAR 2013 conference in Louisville, Kentucky, Andrew Trench, Media24 Investigations editor and technology blogger, reported on a fascinating demonstration of machine learning for finding news stories and insights that humans would typically overlook.

Machine Learning at a Glance
Courtesy GrubStreet.co.za & Andrew Trench

Trench then shares his vision for how machine learning will affect news as we know it, both in how stories are gathered and shaped and in the news business itself.

ProPublica’s Jeff Larson was the presenter at NICAR. Through the Message Machine project, his non-profit investigative reporting group used machine learning to uncover a number of major stories.

The project clustered documents and applied decision trees to comb through vast volumes of crowd-sourced emails from their readers on a given topic.

In this case, the topic was how US political parties raised money by tailoring their pitches to suit the demographics of the email recipients.

Under the hood, algorithms convert every word in every email to a number. Documents then have mathematical properties that allow them to be clustered as similar or different.
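To make the idea concrete, here is a minimal sketch of turning emails into numbers and grouping similar ones. It is not ProPublica's actual code: the word-count vectors, the cosine similarity measure, the greedy grouping rule, the 0.5 threshold, and the four sample emails are all assumptions chosen for illustration.

```python
import math
import re
from collections import Counter


def vectorize(text):
    """Turn a text into a bag-of-words vector mapping each word to its count."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


def cosine_similarity(a, b):
    """Return a score from 0.0 (no shared words) to 1.0 (same word mix)."""
    dot = sum(a[word] * b[word] for word in a if word in b)
    norm_a = math.sqrt(sum(count * count for count in a.values()))
    norm_b = math.sqrt(sum(count * count for count in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def cluster(texts, threshold=0.5):
    """Greedy one-pass grouping (an illustrative assumption, not a standard
    library routine): each text joins the first cluster whose first member
    it resembles closely enough, otherwise it starts a new cluster."""
    vectors = [vectorize(text) for text in texts]
    clusters = []  # each cluster is a list of indices into `texts`
    for i, vec in enumerate(vectors):
        for members in clusters:
            if cosine_similarity(vec, vectors[members[0]]) >= threshold:
                members.append(i)
                break
        else:
            clusters.append([i])
    return clusters


# Hypothetical fundraising emails, invented for this example.
EMAILS = [
    "Donate today to protect student loans for young voters",
    "Donate today to protect retirement benefits for senior voters",
    "Join our volunteer phone bank this weekend in Ohio",
    "Join our volunteer phone bank this weekend in Florida",
]

if __name__ == "__main__":
    print(cluster(EMAILS))  # prints [[0, 1], [2, 3]]
```

The two donation pitches land in one group and the two volunteer appeals in another, even though each pair is tailored to a different audience. Spotting that kind of shared template with swapped-in details is the sort of pattern a reporter could then investigate by hand.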

Apart from the tedium such clustering and conversion tasks would impose on the human mind, scouring the sheer volume of content collected would be too time-consuming and expensive. All the more so given the ever shorter news cycles we’ve come to accept nowadays.

Trench envisions using machine learning to more accurately predict which stories will yield the most likes, click-throughs, and comments.

He then ruminates about what it would be like for editors concerned about daily sales to have hard data to go on instead of gut instinct.

Upon reading this, I recalled Scientific Advertising by Claude Hopkins, the timeless bible for direct-response marketers.

Writing in the 1920s, Hopkins takes advertisers to task for going with their guts when they could craft ads and calls to action that would give them the data they need to continually improve their response from readers.

In effect, Google AdWords is Scientific Advertising on steroids because it pushes businesses to be better direct marketers in real time.

Meanwhile, chances are, machine learning is already quietly building out Trench’s vision of newspapers organized using prediction engines.

After all, if sites and apps like Flipboard allow readers to pull in their own personalized magazines, I suppose the big challenge for traditional media online is to push out an engaging product that differentiates itself by expanding the reader’s horizons in ways they would not on their own.

Which brings us back to Larson and machine learning as a way to make investigative reporting economically viable again.

Starting in the 1980s, the media business converted news from a loss leader to a profit center amid a flurry of mergers and acquisitions. Along the way, investigative reporting gave way to infotainment because it was seen as anathema to making profits.

Today, many complain that the mainstream media in the US offers too much commentary and too little “hard news.” In turn, news networks from overseas are gaining American viewers by filling this void.

And so, perhaps, we’ve come full circle. With the help of machine learning, traditional media can strike the right balance between catering to their audience’s known preferences and wowing them with authentic, hard-hitting stories as a counterweight to ubiquitous fluff.

How do you see machine learning transforming the way you communicate and publish?