Project: Content Optimization



Today, the Web is a major channel for distributing information from various sources. Sites like Yahoo!, Google News, MSN, YouTube, and Digg, along with news feeds from the Associated Press and the Washington Post, give users a wide range of choices for keeping up with diverse content in a timely fashion. However, while this availability has given users more choices, there is little quality control on what gets delivered, making it difficult for users to select the right content in a given context. In fact, one of the major challenges for content publishers and aggregators is to select the best and most relevant content to attract and retain users.

Yahoo! Labs’ Audience Sciences team is working with Yahoo! Research and multiple engineering and editorial teams to build a large-scale content optimization system to serve the right content to the right user at the right time. The Yahoo! Labs team consists of Deepak Agarwal, Bee-Chung Chen, Wei Chu, Pradheep Elango, Seung-Taek Park, Raghu Ramakrishnan, and Liang Zhang.

Displaying appropriate content to users who visit a high-traffic site like Yahoo! is a challenging task. Generally, human editors program content in a way that ensures high quality and gives the site a unique editorial "voice." But editorial programming is expensive to scale as the number of articles and pages grows, while an algorithmic approach allows Yahoo! to aggressively optimize for quantifiable and immediately measurable metrics and to scale inexpensively.

Complete automation might, however, lead to poor user experience. The ideal solution is one that blends the strengths of the editorial and algorithmic approaches, and optimizes content programming within high-level constraints set by editors to address holistic issues such as voice. The Audience Sciences team in Yahoo! Labs has taken a first step towards such constrained optimization of content, with novel engineering and scientific solutions that are at the core of the content optimization system. The system has already proven successful in optimizing content for the “Today” module on the Yahoo! Front Page.

“Content optimization has had a huge impact on Yahoo! Front Page,” notes Pradheep Elango, technical lead for the content optimization project. “We have seen consistent gains in click-through rates, and the quality of the content has improved due to editors leveraging real-time feedback while programming.”

The system works by collecting and processing large volumes of data efficiently and frequently, updating accurate statistical models, and serving content under strict latency requirements. It also includes a dashboard that serves as a flexible tool for monitoring performance and gives editors valuable feedback that leads to better content programming.

Working with Yahoo! Research, the Audience Sciences team delivered some key scientific innovations in the content optimization system. The team developed precise statistical methods that can accurately predict user reaction to a displayed article.

The team also developed novel sequential methods to track article quality over time, both in terms of overall popularity and affinity to individual users. In fact, the highly dynamic setting of this problem makes it more challenging than a traditional recommender system problem: articles have short lifetimes (6 to 24 hours), the pool of available articles is constantly changing, the user population is dynamic, and each article has different click-through rates (CTRs) at different times of day or when shown in different slots in the module.
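To make the idea of tracking a changing CTR concrete, here is a minimal sketch in Python. It is not the team's actual model, which this article does not describe in detail. It assumes a simple stand-in technique: exponentially discounted click and view counts, smoothed with a small prior. The class name `DiscountedCtrTracker` and every parameter value are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class DiscountedCtrTracker:
    """Tracks one article's CTR with exponentially discounted counts.

    Illustrative only: the decay factor and prior values are assumptions,
    not settings taken from the system described in the article.
    """

    discount: float = 0.95     # hypothetical per-interval decay factor
    prior_clicks: float = 1.0  # hypothetical prior pseudo-clicks
    prior_views: float = 20.0  # hypothetical prior pseudo-views
    clicks: float = 0.0        # discounted click count so far
    views: float = 0.0         # discounted view count so far

    def update(self, new_clicks: int, new_views: int) -> None:
        """Decay the old counts, then add the latest interval's counts."""
        self.clicks = self.discount * self.clicks + new_clicks
        self.views = self.discount * self.views + new_views

    def estimate(self) -> float:
        """Return the smoothed CTR estimate (prior plus discounted data)."""
        return (self.prior_clicks + self.clicks) / (self.prior_views + self.views)


if __name__ == "__main__":
    tracker = DiscountedCtrTracker(discount=0.5)
    tracker.update(new_clicks=10, new_views=100)  # article is popular at first
    print(round(tracker.estimate(), 4))
    tracker.update(new_clicks=0, new_views=100)   # interest drops off
    print(round(tracker.estimate(), 4))
```

Because older observations are down-weighted at every update, the estimate follows an article whose appeal fades within hours, and the prior keeps the estimate stable for a new article with very few views.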

The team learned very early that methods based solely on article features (e.g., title, keywords, category tags), though predictive, are far inferior to methods that learn individual article-specific models. They therefore concentrated on methods that focus on granular item-level behavior and developed a series of novel dynamic models suitable for optimization of Web content.

Personalization was another important feature. The team developed new state-of-the-art methods to perform personalization: serving content to users based on known information (e.g., from registration data) and behavioral traits inferred through their past activity (e.g., sports fan). Some of this research has been published at WWW 09, and more will be published at KDD 09.
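As a rough illustration of personalization on top of per-article statistics, the sketch below keeps a CTR for each (user segment, article) pair and shrinks it toward the article's overall CTR when the segment has little data. This is an assumed, simplified technique, not the published method. The segment labels, the `SegmentCtrModel` class, and the `prior_strength` value are all hypothetical.

```python
from collections import defaultdict


class SegmentCtrModel:
    """Per-segment, per-article CTR estimates shrunk toward the global CTR.

    Illustrative only: segments stand in for traits such as "sports fan",
    and prior_strength is an assumed smoothing weight.
    """

    def __init__(self, prior_strength: float = 50.0) -> None:
        self.prior_strength = prior_strength
        self.global_counts = defaultdict(lambda: [0, 0])   # article -> [clicks, views]
        self.segment_counts = defaultdict(lambda: [0, 0])  # (segment, article) -> [clicks, views]

    def record(self, segment: str, article: str, clicked: bool) -> None:
        """Record one impression and whether it was clicked."""
        for counts in (self.global_counts[article],
                       self.segment_counts[(segment, article)]):
            counts[0] += int(clicked)
            counts[1] += 1

    def global_ctr(self, article: str) -> float:
        """Overall CTR of an article across all segments (0.0 if unseen)."""
        clicks, views = self.global_counts.get(article, (0, 0))
        return clicks / views if views else 0.0

    def segment_ctr(self, segment: str, article: str) -> float:
        """Segment CTR, pulled toward the global CTR when data is sparse."""
        clicks, views = self.segment_counts.get((segment, article), (0, 0))
        prior = self.global_ctr(article)
        return (clicks + self.prior_strength * prior) / (views + self.prior_strength)

    def best_article(self, segment: str, candidates: list) -> str:
        """Pick the candidate with the highest estimated CTR for the segment."""
        return max(candidates, key=lambda a: self.segment_ctr(segment, a))
```

With this design, a segment with many observations is served according to its own behavior, while a sparsely observed segment falls back to what is popular overall.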

Content optimization is also continuously improving, according to Yahoo! Research scientist Bee-Chung Chen. “One ongoing research direction is making online experimentation more efficient and scalable,” said Chen. “Another area is better personalization. We want to serve each user with articles that the user would like most.”