Detecting Devastating Diseases in Search Logs

MSR-TR-2016-5

Web search queries can offer a unique population-scale window onto streams of evidence that are useful for detecting the emergence of health conditions. We explore the promise of harnessing behavioral signals in search logs to provide insights and methodologies that could lead to advance warning about the presence of devastating diseases such as pancreatic adenocarcinoma. Pancreatic adenocarcinoma is often diagnosed too late to be treated effectively as the cancer has usually metastasized by the time of diagnosis. There are few symptoms in the early stages of the illness; specific constellations of symptoms that raise concerns about pancreatic cancer typically appear only after the disease is already at an advanced stage. We identify experiential searchers who issue credible, first-person diagnostic queries for pancreatic cancer and we learn models from prior search histories that predict the later appearance of experiential queries. We show that we can infer the likelihood of seeing the rise of experiential queries months before they appear and characterize the tradeoff between positive predictivity and false positive rate. Our findings have implications for the early detection of pancreatic cancer and more generally for harnessing search systems to reduce health risks for individuals.
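
To make the tradeoff against false positive rate concrete, the following is a minimal sketch of one common way to examine it: fix a ceiling on the false positive rate and ask what fraction of true cases a scoring model still flags. It is an illustration only, not the report's method. The function name `tpr_at_fpr`, the choice of true positive rate as the measure, and the toy scores and labels are all assumptions made for this example; none of them come from the study.

```python
def tpr_at_fpr(scores, labels, max_fpr):
    """Highest true positive rate reachable while keeping FPR <= max_fpr.

    scores: model outputs, higher means more likely to be a positive case.
    labels: 1 for a positive case (e.g. a later experiential searcher), 0 otherwise.
    """
    positives = sum(labels)
    negatives = len(labels) - positives
    best_tpr = 0.0
    # Try each distinct score as a threshold; flag a searcher when score >= threshold.
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
        if fp / negatives <= max_fpr:
            best_tpr = max(best_tpr, tp / positives)
    return best_tpr


# Invented toy data for illustration only (not from the study).
toy_scores = [0.95, 0.90, 0.80, 0.70, 0.60, 0.40, 0.30, 0.20, 0.10, 0.05]
toy_labels = [1, 1, 0, 1, 0, 0, 1, 0, 0, 0]

for ceiling in (0.0, 0.2, 0.5):
    print(ceiling, tpr_at_fpr(toy_scores, toy_labels, ceiling))
```

On this toy data, tightening the false positive ceiling lowers the share of positives that can be flagged, which is the kind of tradeoff the abstract describes. In a population-scale setting the ceiling would presumably need to be very low, since even a small false positive rate applies to a very large number of searchers.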