The Yahoo Flickr Creative Commons 100 Million (YFCC100M) dataset and the Multimedia Commons initiative are featured in the February edition of the Communications of the ACM magazine. The dataset and initiative aim to open and share community-contributed dataset features and ground truth annotations.
We are very proud to announce the public release of the largest-ever machine learning dataset to the research community. The Yahoo News Feed dataset contains about 110B events (13.5TB uncompressed) of anonymized user-news item interaction data, recorded from about 20M users between February 2015 and May 2015.
Just in time for the holidays, we've researched the best way to improve the search ranking and relevance of temporal photos on the photo-sharing site Flickr.
We recently released Anthelion publicly on GitHub. Anthelion is a focused crawler for semantic annotations in Web pages: it steers toward HTML pages that are annotated with markup languages like RDFa, Microformats, and Microdata.
As part of our continued effort to support cutting-edge scientific research in academia, Yahoo recently gifted 400 servers to three leading university computer science and engineering departments.
Our Answering Complex Queries group in Haifa developed an automatic quality scoring system for the Yahoo Answers CQA site, where answers are now ranked by relevance. The group explains how their science powers the product.
The Yahoo Labs Personalization Science team has developed a new recommendation system that allows researchers to model complex interaction features, including side information on users, items, and context, without sacrificing scalability.
Our Search Systems team in Haifa, along with Yahoo Search in Sunnyvale, brings you the second installment of their blog series on Omid, an open source transaction processing system for Apache HBase.
I am spending this year on sabbatical at Yahoo Labs after 27 years at Stony Brook. It is interesting to experience life at a major technology company, and I am sure that much of what I learn here will inform my teaching and research when I return to Stony Brook in Fall 2016.
Visiting Senior Principal Research Scientist Dragomir Radev recently coached the U.S. linguistics team to multiple wins at the International Linguistics Olympiad in Blagoevgrad, Bulgaria.