Information Retrieval, Algorithms, and Data Mining

By Evgeniy Gabrilovich, Ravi Kumar and Belle Tseng


Search Technologies is an international team of experts in search, algorithms, data processing, data mining, information retrieval, and natural language processing. Together, we build systems and algorithms that analyze user needs, then synthesize and deliver the right responses from data sources around the globe.

Challenges

Modeling

How can we understand and model user information needs, such as long-running preferences, longitudinal tasks, and session-level goals? How can we learn what a user is actually searching for from the query and click logs? How do we learn the components of a mixture of intents, given aggregated information about many users and their patterns of interaction with search results? How can we abstract user activity on a search results page and use this information to better synthesize the page itself? How can we provide a personalized search experience by leveraging each individual's search history, while also taking advantage of many users' past patterns of interaction?
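As one illustration of the last question, the sketch below blends a single user's click counts for a query with the click counts of all users, so that sparse personal history falls back on the population. This is a minimal sketch using Dirichlet-style smoothing, not a method described by the team; the function name personalized_click_probs and the weight mu are hypothetical.

```python
from collections import Counter


def personalized_click_probs(user_clicks, global_clicks, mu=10.0):
    """Blend one user's click counts with population-wide click counts.

    user_clicks, global_clicks: mappings from result URL to click count
    for a single query. mu is a hypothetical smoothing weight (assumed,
    not from the source): larger values lean on the population, smaller
    values lean on the individual user.
    """
    user_clicks = Counter(user_clicks)
    global_clicks = Counter(global_clicks)
    global_total = sum(global_clicks.values())
    user_total = sum(user_clicks.values())
    if global_total == 0:
        raise ValueError("global_clicks must contain at least one click")

    probs = {}
    for url in set(global_clicks) | set(user_clicks):
        # Population click share acts as the prior for this URL.
        prior = global_clicks[url] / global_total
        # The user's own clicks pull the estimate away from the prior.
        probs[url] = (user_clicks[url] + mu * prior) / (user_total + mu)
    return probs


# Made-up counts for one query: most users click "a", this user prefers "b".
example = personalized_click_probs({"b": 5}, {"a": 80, "b": 20}, mu=10.0)
```

With no personal history the estimate equals the population click share; as the user's own clicks accumulate, they dominate the estimate.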

Metrics

How can we gauge intent satisfaction and design a general framework for measuring whole-page relevance? As the presentation of search results grows more intricate, how do we treat it as a global optimization problem? What role does diversity play? Answering these questions requires a more holistic notion of user satisfaction. How can we measure user satisfaction with non-Web search results (quicklinks, results from verticals, and other shortcuts)?
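To make the role of diversity concrete, the sketch below scores a ranked list by the probability that a user with a randomly drawn intent finds at least one relevant result in the top k. It is a simple intent-coverage measure chosen for illustration, not the team's metric; the function name, the intent probabilities, and the relevance judgments are all hypothetical.

```python
def intent_coverage_at_k(ranking, intent_probs, relevant, k=10):
    """Probability mass of the intents served by the top-k results.

    ranking: list of result ids, best first.
    intent_probs: mapping from intent to its probability for the query.
    relevant: mapping from intent to the set of result ids relevant to it.
    All three inputs are assumed to be given; none comes from the source.
    """
    top_k = ranking[:k]
    covered = 0.0
    for intent, prob in intent_probs.items():
        relevant_ids = relevant.get(intent, set())
        # An intent counts as served if any top-k result is relevant to it.
        if any(doc in relevant_ids for doc in top_k):
            covered += prob
    return covered


# Made-up ambiguous query with two intents.
intent_probs = {"car": 0.6, "animal": 0.4}
relevant = {"car": {"c1", "c2", "c3"}, "animal": {"a1"}}

narrow = intent_coverage_at_k(["c1", "c2", "c3"], intent_probs, relevant, k=3)
diverse = intent_coverage_at_k(["c1", "a1", "c2"], intent_probs, relevant, k=3)
```

Here the diverse ranking serves both intents and scores higher than the narrow one, even though every result in the narrow ranking is relevant to the majority intent.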

Mining

The focus is Web information mining, including the analysis of click, query, and toolbar logs and of Web page content. Large-scale distributed computing infrastructure lets us analyze data at scales unimaginable a few years ago. Mining this vast amount of data is important for identifying latent patterns, tracking trend changes, and analyzing data seamlessly at various scales. How can this be done efficiently, and what algorithmic tools are needed? How can we automatically extract structured information from Web documents? What kinds of personal information can be stored and aggregated without violating user privacy?
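As a small illustration of tracking trend changes in a query log, the sketch below compares each query's smoothed relative frequency in a recent window against an earlier window and reports the queries whose share has grown the most. It is a single-machine toy under stated assumptions, not the distributed tooling the text refers to; the function name, the thresholds, and the sample queries are hypothetical.

```python
from collections import Counter


def rising_queries(earlier_queries, recent_queries, min_ratio=2.0, smoothing=1.0):
    """Return (query, ratio) pairs whose share of traffic rose by min_ratio.

    earlier_queries, recent_queries: iterables of query strings from two
    time windows. Additive smoothing (an assumed choice) keeps queries
    that are absent from one window from causing a division by zero.
    """
    earlier = Counter(earlier_queries)
    recent = Counter(recent_queries)
    vocabulary = set(earlier) | set(recent)
    earlier_denom = sum(earlier.values()) + smoothing * len(vocabulary)
    recent_denom = sum(recent.values()) + smoothing * len(vocabulary)

    rising = []
    for query in vocabulary:
        earlier_rate = (earlier[query] + smoothing) / earlier_denom
        recent_rate = (recent[query] + smoothing) / recent_denom
        ratio = recent_rate / earlier_rate
        if ratio >= min_ratio:
            rising.append((query, ratio))
    # Largest relative increase first.
    return sorted(rising, key=lambda pair: pair[1], reverse=True)


# Made-up log windows.
earlier_window = ["weather"] * 5 + ["news"] * 5
recent_window = ["weather"] * 5 + ["news"] * 1 + ["eclipse"] * 4
trending = rising_queries(earlier_window, recent_window)
```

In this toy log only "eclipse" clears the default threshold, since it is new in the recent window while "weather" is flat and "news" declines.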

Multilingual IR

With the number of non-English Web users increasing, information retrieval in languages other than English is becoming increasingly important. This involves adapting techniques developed for English, such as statistical machine translation, to other languages both effectively and efficiently. How should this adaptation be done?

Using world knowledge

Can we use common-sense and background knowledge to go beyond information retrieval at the level of mere words? Imagine using all of the knowledge available in Freebase to sift through information more effectively.