Screening for Pancreatic Adenocarcinoma using Signals from Web Search Logs: Feasibility Study and Results
- John Paparrizos
- Ryen W. White
- Eric Horvitz
Journal of Oncology Practice, pp. 737-744
Introduction: People’s online activities can yield clues about their emerging health conditions. We perform an intensive study to explore the feasibility of using anonymized web query logs to screen for the emergence of pancreatic adenocarcinoma. The methods apply statistical analyses to large-scale anonymized search logs covering symptom queries from millions of people, with potential application in alerting individual searchers to the value of seeking attention from healthcare professionals.
Methods: We identify searchers in logs of online search activity who issue special queries that are suggestive of a recent diagnosis of pancreatic adenocarcinoma. We then look back many months before these landmark queries to examine patterns of symptomatology, expressed as searches about concerning symptoms. We build statistical classifiers that predict the future appearance of the landmark queries based on patterns of signals seen in search logs.
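The look-back step described above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the phrase lists, the function names `first_landmark_date` and `symptom_counts_before`, the one-year look-back window, and the 30-day gap before the landmark are hypothetical and are not taken from the study.

```python
from datetime import date, timedelta

# Hypothetical phrase lists, for illustration only; not the study's actual lists.
LANDMARK_PHRASES = ("just diagnosed with pancreatic cancer",)
SYMPTOM_TERMS = ("back pain", "itchy skin", "dark urine", "loss of appetite")


def first_landmark_date(history):
    """Return the date of the earliest landmark query, or None if there is none.

    `history` is a list of (date, query_text) pairs for one anonymized searcher.
    """
    dates = [
        day
        for day, query in history
        if any(phrase in query.lower() for phrase in LANDMARK_PHRASES)
    ]
    return min(dates) if dates else None


def symptom_counts_before(history, landmark, lookback_days=365, gap_days=30):
    """Count symptom mentions in the window [landmark - lookback, landmark - gap).

    The gap (an assumed value) excludes queries issued shortly before the
    landmark query, so that only earlier signals are counted.
    """
    start = landmark - timedelta(days=lookback_days)
    end = landmark - timedelta(days=gap_days)
    counts = {term: 0 for term in SYMPTOM_TERMS}
    for day, query in history:
        if start <= day < end:
            text = query.lower()
            for term in SYMPTOM_TERMS:
                if term in text:
                    counts[term] += 1
    return counts
```

Per-searcher counts of this kind are one plausible form of input features for a classifier; the features actually used in the study may differ.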
Results: We find that signals about patterns of queries in search logs can predict the future appearance of queries that are highly suggestive of a diagnosis of pancreatic adenocarcinoma. We show specifically that we can identify 5–15% of cases while preserving extremely low false positive rates (0.00001–0.0001).
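To make the trade-off in these figures concrete, the sketch below picks a score threshold that keeps the false positive rate at or below a target and reports the fraction of true cases still identified. It is an illustrative calculation under assumed inputs (a list of classifier scores and 0/1 labels); the function name `tpr_at_fpr` is hypothetical and this is not the study's evaluation code.

```python
import math


def tpr_at_fpr(scores, labels, max_fpr):
    """Return (threshold, tpr, fpr) when flagging every score above the threshold.

    The threshold is chosen so that the fraction of negatives flagged does not
    exceed `max_fpr`. Labels are 1 for cases and 0 for non-cases.
    """
    negatives = sorted((s for s, y in zip(scores, labels) if y == 0), reverse=True)
    positives = [s for s, y in zip(scores, labels) if y == 1]
    if not negatives or not positives:
        raise ValueError("need at least one positive and one negative example")

    # Number of negatives that may be flagged without exceeding the target rate.
    # The small tolerance guards against floating-point rounding in the product.
    allowed = math.floor(max_fpr * len(negatives) + 1e-9)

    # Flag only scores strictly above the (allowed + 1)-th highest negative score,
    # so at most `allowed` negatives are flagged even when scores are tied.
    threshold = negatives[allowed] if allowed < len(negatives) else float("-inf")

    flagged_neg = sum(1 for s in negatives if s > threshold)
    flagged_pos = sum(1 for s in positives if s > threshold)
    return threshold, flagged_pos / len(positives), flagged_neg / len(negatives)
```

As a matter of arithmetic, a false positive rate of 0.0001 corresponds to about 100 false alerts per 1,000,000 unaffected searchers, and 0.00001 to about 10.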
Conclusion: Signals in search logs show that it may be possible to predict a forthcoming diagnosis of pancreatic adenocarcinoma from combinations of subtle temporal signals revealed in the queries of searchers over time.