The problem of big data hubris.
Is Google's Flu Trends (GFT) tracking system a failure of big data?
A new article published this afternoon in Science magazine suggests that Google's much-covered algorithmic Flu Trends model, which monitors search queries to track the spread of the flu, has routinely failed to accurately predict flu prevalence since its inception in 2008. The report is another setback for Google (a 2013 article in the science journal Nature reached a similar conclusion about GFT's accuracy). More broadly, GFT's failures represent a bigger struggle for data-driven research and threaten to cast a shadow over the much-hyped concept of big data.
In late 2008, Google announced Google Flu Trends to a series of cautiously optimistic early reviews. The New York Times described it as “what appears to be a fruitful marriage of mob behavior and medicine” and many held out hope that Google's algorithm could outperform CDC data models, which have long been held as the standard for flu detection and prediction. Here's a section from Google's blog post announcing Flu Trends in 2008:
It turns out that traditional flu surveillance systems take 1-2 weeks to collect and release surveillance data, but Google search queries can be automatically counted very quickly. By making our flu estimates available each day, Google Flu Trends may provide an early-warning system for outbreaks of influenza.
In reality, Google has consistently failed to beat the CDC's model, even though the CDC's data arrive with a two-week lag. According to the report, “GFT also missed by a very large margin in the 2011–2012 flu season and has missed high for 100 out of 108 weeks starting with August 2011.” Northeastern University's David Lazer, one of the report's authors, told BuzzFeed that Google has “been missing high the large majority of the time. That's just a fact. When [GFT] was introduced, it had some really bold claims about how close it could get, but we can see there's a big difference in missing CDC predictions by three percentage points and 150 percentage points like it did last year.”
Lazer and his team's findings. “GFT overestimated the prevalence of flu in the 2012–2013 season and overshot the actual level in 2011–2012 by more than 50%.”
Science Magazine / Via David Lazer, Ryan Kennedy, Gary King, Alessandro Vespignani / sciencemag.org
For Lazer and his research team, GFT's problems are symptomatic of a bigger problem they call “big data hubris,” in which readers and researchers alike treat big data as a cure-all and a direct line to universal truth. Indeed, the most insidious part of the big data movement is the potential for people to believe anything as long as a Google algorithm or a Twitter sample is attached. “Consider something like Twitter. How much of what's on Twitter is really humans versus bots? When we think of bots, how active are they?” Lazer asks.
“It's easy to fall into a trap with this stuff,” Lazer said. “I think of the classic cartoon — nobody on the internet knows you're a dog. If you're trying to measure human opinion and behavior, that's a problem you run into.” He notes that, in GFT's case, the data lacks the context to be universally helpful.
“We don't necessarily know what's being reflected when people search for things. We're still making guesses that these could be people or they could be bots,” he said. “Even when you search for 'the flu,' you could be searching for that because you have a scholarly interest and you're writing a paper or you could be searching to fix your chimney's flue and you're spelling it wrong. There's no concrete way to differentiate.”
The Science report cites multiple reasons for Google's disappointing results, including changes to Google's search engine, which has evolved to suggest terms beyond what the user actually searched for. In February 2012, for example, the search engine began returning potential diagnoses for searches that included symptoms such as “cough,” “fever,” and “runny nose.”
Lazer and his team believe these suggested diagnoses may have contributed to an uptick in “flu” searches. Similarly, the report cites the possibility of outside data manipulation by researchers and argues that, “ironically, the more successful we become at monitoring the behavior of people using these open sources of information, the more tempting it will be to manipulate those signals.”