Research and Practice Methods
What Can Digital Disease Detection Learn from (an External Revision to) Google Flu Trends?

https://doi.org/10.1016/j.amepre.2014.05.020

Background

Google Flu Trends (GFT) claimed to generate real-time, valid predictions of population influenza-like illness (ILI) using search queries, heralding acclaim and replication across public health. However, recent studies have questioned the validity of GFT.

Purpose

To propose an alternative methodology that better realizes the potential of GFT, with collateral value for digital disease detection broadly.

Methods

Our alternative method, developed in 2013, automatically selects specific queries to monitor and autonomously updates the model each week as new CDC-reported ILI data become available. Root mean squared errors (RMSEs) and Pearson correlations comparing predicted ILI (the proportion of patient visits indicative of ILI) with subsequently observed ILI were used to judge model performance.
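As an illustration of the two evaluation metrics, a minimal sketch in Python (the weekly values below are hypothetical, not the study's data):

```python
import math

def rmse(predicted, observed):
    """Root mean squared error between weekly predicted and observed ILI proportions."""
    n = len(predicted)
    return math.sqrt(sum((p - o) ** 2 for p, o in zip(predicted, observed)) / n)

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Hypothetical weekly ILI proportions: model predictions vs. CDC-observed values
pred = [0.021, 0.034, 0.052, 0.061, 0.048]
obs = [0.019, 0.031, 0.055, 0.064, 0.045]

print(rmse(pred, obs))       # average weekly prediction error, in ILI units
print(pearson_r(pred, obs))  # strength of linear association
```

A lower RMSE and a correlation closer to 1 indicate a better-performing model; the paper reports both because a model can track the shape of the epidemic curve well (high r) while still being biased in magnitude (high RMSE).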

Results

During the height of the H1N1 pandemic (August 2 to December 22, 2009) and the 2012–2013 season (September 30, 2012, to April 12, 2013), GFT’s predictions had RMSEs of 0.023 and 0.022 (i.e., hypothetically, if GFT predicted 0.061 ILI one week, it would be expected to err by 0.023) and correlations of r=0.916 and 0.927. Our alternative method had RMSEs of 0.006 and 0.009, and correlations of r=0.961 and 0.919 for the same periods. Critically, during these important periods, the alternative method yielded more accurate ILI predictions every week, and it was typically more accurate during other influenza seasons as well.

Conclusions

GFT may be inaccurate, but improved methodologic underpinnings can yield accurate predictions. Applying similar methods elsewhere can improve digital disease detection, with broader transparency, improved accuracy, and real-world public health impacts.

Introduction

The rapid escalation of digital methods is changing public health surveillance.1, 2, 3 By harvesting web data, investigators claim to validly estimate cholera,4 dengue,5, 6 influenza,7, 8 kidney stones,9, 10 listeriosis,11 methicillin-resistant Staphylococcus aureus,12 mental health,13 and tobacco control14 trends, but are they actually valid?

The novelty of digital data has generally remained the central focus in these studies, whereas the methods and disinterested interpretations have been overlooked. Therefore, studies demonstrating modest associations with ground truth outcomes (e.g., R2=0.15,14 R2=0.25,4 or R2=0.6211) have been presented as accurate, without further model validation. Most notable is Google Flu Trends (GFT),8 not because it is potentially the most flawed but because it is oft-cited and many subsequent studies modeled their approach after GFT or even used weaker methods.6, 12, 15, 16

Concerns about GFT’s accuracy came to light via media reports in 2009, when it misrepresented the epidemic curve and required updating that autumn.17 Again during 2012–2013, media reports questioned the revised GFT,18 followed by separate peer-reviewed analyses suggesting GFT was typically inferior to traditional sentinels owing to inaccuracies.19, 20 Most recently, Google again updated its model to improve GFT’s operation but did not disclose the revisions or describe the updated model’s performance.21 Many, unfortunately, are unaware of these problems.

The head of the CDC Influenza Surveillance and Outbreak Response Team told Nature News that she monitors GFT (and other digital disease detection sentinels) “all the time,” likely in the sense that some data are better than no data.18 Moreover, some investigators are beginning to use GFT as ground truth for epidemiologic studies.22 However, if GFT, and by extension similar systems for other outcomes, is invalid, should public health officials be paying any attention?

We remain optimistic about the future of GFT and digital disease detection broadly23, 24, 25, 26 because a methodologic problem has a methodologic solution. Herein, a transparent, external evaluation of GFT, as a case study for the scientific status of digital disease detection, is presented. An alternative methodology capable of outperforming GFT is subsequently proposed, with potential application across digital disease detection.

Section snippets

Methods

The methodology behind the original GFT and the 2009 revision (published in 2011) consisted of building a regression for CDC-reported influenza-like illnesses (ILI) with a single explanatory variable. Originally, the single variable was the mean trend for the 45 search terms with the strongest correlation with ILI for September 28, 2003, through March 11, 2007.8 The revised GFT single variable was the mean trend for the most correlated search terms (approximately 160, the exact number unknown)
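A minimal sketch of this style of model and of the weekly updating the alternative method employs (the query and ILI values are hypothetical, and real implementations involve query selection, transformations, and far more data):

```python
def ols(x, y):
    """Ordinary least squares fit for y = a + b*x; returns (a, b)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / \
        sum((xi - mx) ** 2 for xi in x)
    return my - b * mx, b

# Single explanatory variable: mean fraction of searches for the selected
# query set, one value per week (hypothetical data)
query_trend = [0.10, 0.15, 0.22, 0.30, 0.41]

# CDC-reported ILI arrives with a lag, so the newest week is unobserved
cdc_ili = [0.012, 0.018, 0.027, 0.036]

# Each week, refit on all weeks with observed ILI, then predict the
# newest week from its query trend alone
a, b = ols(query_trend[:len(cdc_ili)], cdc_ili)
predicted_next = a + b * query_trend[-1]
```

Refitting every week as new CDC data arrive is what distinguishes a dynamically updated model from the static GFT approach, which fixed its coefficients over long training windows.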

Results

Figure 1 presents GFT’s and our alternative model’s predictions alongside the subsequently observed ILI trends, where it is readily apparent that the alternative produced more accurate predictions.

During Wave 1 (March 29 through August 2, 2009) and Wave 2 (August 3 through December 27, 2009) of the H1N1 outbreak, particularly important periods of ILI surveillance, the RMSEs were 0.008 and 0.023 (i.e., if GFT predicted 0.061 ILI, it would have a typical weekly error of 0.008 or 0.023) with

Discussion

Our alternative methodology is capable of producing more accurate predictions of influenza activity than GFT, and does so autonomously with dynamic updating of the model each week. With 3–5 million infected and 250,000–500,000 killed by influenza worldwide each year,33 influenza surveillance is of tremendous importance, providing necessary intelligence for hospitals facing staffing decisions, physicians facing active and accurate diagnoses, employers with workers at risk for infection, and

Acknowledgments

This work was improved by comments from presentations at Harvard Medical School, the New York City Department of Health and Mental Hygiene, and Stanford Medical School, with special appreciation for John S. Brownstein, Mark Dredze, John Ioannidis, Donald Olson, Keith Schnakenber, Diana Z. Li, Zhenbu Zhang, and the eight anonymous American Journal of Preventive Medicine reviewers. The authors agree to make their data and code available to other investigators wishing to replicate this study.


References (46)

  • J.W. Ayers et al. Could behavioral medicine lead the web data revolution? JAMA (2014)
  • R. Chunara et al. Social and news media enable estimation of epidemiological patterns early in the 2010 Haitian cholera outbreak. Am J Trop Med Hyg (2012)
  • B.M. Althouse et al. Prediction of dengue incidence using search query surveillance. PLoS Negl Trop Dis (2011)
  • E.H. Chan et al. Using web search query data to monitor dengue epidemics: a new model for neglected tropical disease surveillance. PLoS Negl Trop Dis (2011)
  • G. Eysenbach. Infodemiology: tracking flu-related searches on the web for syndromic surveillance. AMIA Annu Symp Proc (2006)
  • J. Ginsberg et al. Detecting influenza epidemics using search engine query data. Nature (2009)
  • S.D. Willard et al. Internet search trends analysis tools can provide real-time data on kidney stone disease in the U.S. Urology (2011)
  • K. Wilson et al. Early detection of disease outbreaks using the Internet. CMAJ (2009)
  • V.M. Dukic et al. Internet queries and methicillin-resistant Staphylococcus aureus surveillance. Emerg Infect Dis (2011)
  • P.A. Cavazos-Rehg et al. Monitoring of non-cigarette tobacco use using Google Trends. Tob Control (2014)
  • Q. Yuan et al. Monitoring influenza epidemics in China with search query from Baidu. PLoS One (2013)
  • A.J. Ocampo et al. Using search queries for malaria surveillance, Thailand. Malar J (2013)
  • S. Cook et al. Assessing Google flu trends performance in the U.S. during the 2009 influenza virus A (H1N1) pandemic. PLoS One (2011)