Understanding Intrinsic Diversity in Web Search: Improving Whole-Session Relevance
- Karthik Raman ,
- Paul N. Bennett ,
- Kevyn Collins-Thompson
ACM Transactions on Information Systems (TOIS) | , Vol 32
Current research on Web search has focused on optimizing and evaluating single queries. However, a significant fraction of user queries are part of more complex tasks [Jones and Klinkner 2008] which span multiple queries across one or more search sessions [Liu and Belkin 2010; Kotov et al. 2011]. An ideal search engine would not only retrieve relevant results for a user’s particular query but also be able to identify when the user is engaged in a more complex task and aid the user in completing that task [Morris et al. 2008; Agichtein et al. 2012]. Toward optimizing whole-session or task relevance, we characterize and address the problem of intrinsic diversity (ID) in retrieval [Radlinski et al. 2009], a type of complex task that requires multiple interactions with current search engines. Unlike existing work on extrinsic diversity [Carbonell and Goldstein 1998; Zhai et al. 2003; Chen and Karger 2006] that deals with ambiguity in intent across multiple users, ID queries often have little ambiguity in intent but seek content covering a variety of aspects on a shared theme. In such scenarios, the underlying needs are typically exploratory, comparative, or breadth-oriented in nature. We identify and address three key problems for ID retrieval: identifying authentic examples of ID tasks from post-hoc analysis of behavioral signals in search logs; learning to identify initiator queries that mark the start of an ID search task; and given an initiator query, predicting which content to prefetch and rank.