Learning Entity Types from Query Logs Via Graph-Based Modeling

Publication
Oct 19, 2015
Abstract

Entities (e.g., people, movies, or places) play an important role in real-world applications, so learning entity types has attracted much attention in recent years. Most conventional automatic techniques learn entity types from large text corpora such as news articles. However, such corpora describe general knowledge about entities in an objective way, which makes it difficult to satisfy users with specific, personalized needs for an entity. Recent years have witnessed an explosive expansion in the mining of search query logs, which contain billions of entities. The word patterns and clickthroughs in search logs are not found in text corpora, so they provide a complementary source for discovering entity types based on user behavior. In this paper, we study the problem of learning entity types from search query logs. The task is non-trivial because: (1) queries are short texts, so information related to entities is usually very sparse; and (2) search logs contain large amounts of irrelevant information, which introduces noise when detecting entity types. To address these issues, we first model query logs as a bipartite graph that connects entities with their auxiliary information, such as contextual words and clicked URLs. We then propose a graph-based framework, ELP, that simultaneously learns the types of both entities and auxiliary signals. Within ELP, two separate strategies, LPA and LPD, are designed to address the sparsity and noise problems in query logs. We conduct extensive empirical studies on real Yahoo! search logs to evaluate the effectiveness of the proposed ELP framework.
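The abstract does not give ELP's update rules, so the following is only a minimal sketch of the general idea it describes: link entities to contextual words and clicked URLs in a bipartite graph, then let a few seed type labels spread across that graph so that both entities and auxiliary signals receive types. It uses plain two-step label propagation, a standard technique swapped in for illustration; it is not the paper's ELP, LPA, or LPD. The toy log, the `word:`/`url:` node prefixes, single-token entity matching, and the hypothetical functions `build_bipartite_graph`, `normalize`, and `propagate` (with parameter `alpha`) are all assumptions.

```python
from collections import defaultdict

def build_bipartite_graph(log, entities):
    """Count entity-signal co-occurrences in (query, clicked_url) records.

    Hypothetical rule: an entity appearing as a token in a query is linked to
    each other token (a contextual word) and to the clicked URL.
    """
    edges = defaultdict(float)
    for query, url in log:
        tokens = query.split()
        for ent in entities:
            if ent not in tokens:
                continue
            for tok in tokens:
                if tok != ent:
                    edges[(ent, "word:" + tok)] += 1.0
            if url:
                edges[(ent, "url:" + url)] += 1.0
    return dict(edges)

def normalize(scores):
    """Rescale type scores to sum to 1; all-zero scores are returned unchanged."""
    total = sum(scores.values())
    return {t: v / total for t, v in scores.items()} if total > 0 else scores

def propagate(edges, seeds, types, iterations=10, alpha=0.8):
    """Generic label propagation on a bipartite graph (not the paper's ELP)."""
    ent_nbrs, sig_nbrs = defaultdict(dict), defaultdict(dict)
    for (ent, sig), w in edges.items():
        ent_nbrs[ent][sig] = w
        sig_nbrs[sig][ent] = w

    # Seed entities start from their labels; all others start with zero scores.
    ent_scores = {e: {t: seeds.get(e, {}).get(t, 0.0) for t in types} for e in ent_nbrs}
    sig_scores = {}
    for _ in range(iterations):
        # Step 1: each signal gets the normalized weighted sum of its entities' scores.
        sig_scores = {}
        for sig, nbrs in sig_nbrs.items():
            acc = {t: sum(w * ent_scores[e][t] for e, w in nbrs.items()) for t in types}
            sig_scores[sig] = normalize(acc)
        # Step 2: each entity gets the normalized weighted sum of its signals' scores;
        # seed entities are pulled back toward their labels with weight alpha.
        new_scores = {}
        for ent, nbrs in ent_nbrs.items():
            acc = normalize({t: sum(w * sig_scores[s][t] for s, w in nbrs.items()) for t in types})
            if ent in seeds:
                acc = {t: alpha * seeds[ent].get(t, 0.0) + (1 - alpha) * acc[t] for t in types}
            new_scores[ent] = acc
        ent_scores = new_scores
    return ent_scores, sig_scores

# Toy usage with a made-up log (not Yahoo! data).
log = [
    ("titanic movie tickets", "imdb.com"),
    ("avatar movie showtimes", "imdb.com"),
    ("paris hotel deals", "booking.com"),
    ("rome hotel deals", "booking.com"),
]
types = ["movie", "place"]
seeds = {"titanic": {"movie": 1.0}, "paris": {"place": 1.0}}
edges = build_bipartite_graph(log, ["titanic", "avatar", "paris", "rome"])
entity_scores, signal_scores = propagate(edges, seeds, types)
for ent, scores in sorted(entity_scores.items()):
    print(ent, max(scores, key=scores.get))
```

In this sketch, unlabeled entities such as `avatar` and `rome` inherit types from the words and URLs they share with the seeds, while signal nodes such as `url:imdb.com` receive type scores as a by-product; `alpha` keeps the seed entities anchored to their given labels.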

  • The 24th ACM International Conference on Information and Knowledge Management (CIKM 2015)
  • Conference/Workshop Paper
