Search Sciences: Scientific Fields
Yahoo! Labs Search Sciences brings together a number of scientific disciplines to tackle the challenges of running a web-scale search engine. The dominant ones are listed below, but search has become a multidisciplinary endeavor, and Search Sciences also draws on additional areas of science, including cognitive science, computational linguistics, user experience design and others.
Information Retrieval
For Yahoo! Labs Search Sciences, the field of information retrieval (IR) is of paramount importance in virtually all of their projects.
Which web sites to crawl, how to implement efficient indexing, what properties of documents and sites are important for relevance ranking, what are safe and effective ways to reformulate queries, how to select sentences for a query-biased summary - these are all problems under continuous attack at Yahoo! Labs, and information retrieval is a valuable tool in addressing them. Search Sciences helps re-define the IR field by constantly pushing the boundaries on performance, precision and scalability.
Machine Learning
Yahoo! Labs Search Sciences believes very strongly in the power of machine learning and uses the science and methodology of this field in almost every aspect of the search engine. Search Sciences uses machine learning to train their ranking functions and to learn effective models for query transformation. Statistical models are also used for Federation, Crawling, Content Annotation and Performance Evaluation. In fact, machine learning is not only central but actually essential to the operation of a modern search engine.
Data Mining
To maintain optimal performance from Yahoo!'s search engine, the Search Sciences team constantly uses Data Mining. They use it to learn from Yahoo!'s users via click-modeling, to learn from the crawled content via graph analysis, and to understand the relationships between pieces of information and people via collaborative filtering or analysis of social networks. Operating a search engine creates tremendous opportunities for rapid experimentation, and never before in the history of computing could so many experiments be conducted with so many data points over such a short period of time.
Text Mining
To complement their Data Mining activities, Yahoo! Labs Search Sciences does a great deal of scientific work in text mining. The content system of a search engine, with its tens of billions of documents, is a treasure trove of text that can be analyzed to extract tremendous amounts of information. Search Sciences uses text mining to acquire lexical knowledge, to understand the properties of entities being referenced on the web, and to draw conclusions about the relationships between references - for example, to people, places and products. Search Sciences also uses text mining techniques to learn models of query intent and to derive aggregate properties of documents. Text mining is an essential component of running a scalable search engine operation.
Natural Language Processing
Some would classify text mining as a subset of Natural Language Processing (NLP), but the Yahoo! Labs Search Sciences team finds it useful to draw a distinction between the two fields. Techniques from NLP are very effective for many of the challenges Search Sciences faces - parsing queries, extracting information from web sites, and understanding documents and the terms that carry their meaning. NLP is also used in summarization, translation models, and clustering tasks.
