Web Information Management - Information extraction

By Malcolm Slaney


Users of the web are drawn to multimedia content like never before. The content is exciting, compelling, and interesting. We have quite likely already passed the point where the average Internet user consumes more multimedia data than text. Flickr alone has more than 2 billion pictures, a sizable fraction of all the images on the web. There are untold hours of video waiting to be consumed, and more than 2 million songs. We're drowning in multimedia data. Hurray!

How do people find the content they want to see and hear? Multimedia data is more interesting than text, both because of the type of information available and because of how it is consumed. This real-world data is noisy. The picture you imagine for the tag "Christmas" is quite likely very different from the picture imagined by somebody living on the other side of the world. Both images are correctly tagged Christmas; neither is wrong. There is only a small amount of text to describe an object that might last anywhere from seconds to hours and might contain hundreds of different scenes. Perhaps more importantly, people search for and consume multimedia data for different reasons than they do text. The common measures of precision and recall no longer accurately reflect success. People consume multimedia for information, but more often for entertainment.

We want to understand how to encourage people to contribute, and help people find the Internet multimedia that makes them smile. Whether it is to answer an informational query, or to entertain them, Yahoo wants to provide the answer. Here are some key challenges in the new age of Internet multimedia.

Challenges

  • How do we make use of the noisy descriptions that accompany multimedia data? On a global Internet, there are many different ways to describe the same content. How do we leverage the noisy data we have to do something useful?

  • How do we delight, entertain and inform users? How do we help a user who starts looking for information but then is diverted to consuming entertainment? How can we tell whether a user is looking for entertainment or information?

  • How do we combine content-based retrieval (a not very successful approach to search from the 20th century) with tags (a more successful approach) to do something better for multimedia retrieval than either approach alone?

  • How do we determine the most relevant images to return? Multimedia doesn't come with in-links. Flickr alone has an enormous number of images of the Golden Gate Bridge. Which 10 should be shown to the user on the first page?

  • How can we use the wisdom of crowds to manage multimedia content? What new services are possible? How can we use social filtering to serve appropriate multimedia content to different audiences (for example, keeping adult content away from people who don't want to see it)? How do we help people consume the long tail? How do we show people the content that nobody else has seen?

  • How do we understand long videos and do something reasonable with them for the user? As our attention spans get shorter, how do we find the piece of content that users really want? Is it different for information-seeking and entertainment-seeking users?

  • How do we understand a web-scale corpus of media? How do we find similar or related content across music, image, and video databases when we have billions of objects? What kind of algorithms scale to the web? What does web-scale intelligence look like?
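
To make the challenge of combining content-based retrieval with tags a little more concrete, here is a minimal sketch of one possible approach, a weighted "late fusion" of two scores. It is an illustration under stated assumptions, not a description of any system used at Yahoo or Flickr: the function names, the item layout (`id`, `tags`, `features`), the choice of Jaccard overlap and cosine similarity, and the weight `alpha` are all hypothetical.

```python
from math import sqrt


def tag_score(query_tags, item_tags):
    """Jaccard overlap between the query terms and an item's tags."""
    q, t = set(query_tags), set(item_tags)
    if not q or not t:
        return 0.0
    return len(q & t) / len(q | t)


def content_score(query_vec, item_vec):
    """Cosine similarity between two content feature vectors."""
    dot = sum(a * b for a, b in zip(query_vec, item_vec))
    norm = sqrt(sum(a * a for a in query_vec)) * sqrt(sum(b * b for b in item_vec))
    return dot / norm if norm else 0.0


def rank(query_tags, query_vec, items, alpha=0.5):
    """Rank item ids by a weighted sum of tag and content scores.

    alpha = 1.0 uses tags only; alpha = 0.0 uses content features only.
    The weight is a hypothetical tuning knob, not a recommended value.
    """
    scored = []
    for item in items:
        score = (alpha * tag_score(query_tags, item["tags"])
                 + (1 - alpha) * content_score(query_vec, item["features"]))
        scored.append((score, item["id"]))
    # Highest combined score first.
    return [item_id for _, item_id in sorted(scored, reverse=True)]


# Hypothetical toy collection: tags are noisy text, features stand in
# for whatever a content-analysis step might extract from the media.
items = [
    {"id": "a", "tags": ["bridge", "sunset"], "features": [1.0, 0.0]},
    {"id": "b", "tags": ["bridge"], "features": [0.0, 1.0]},
    {"id": "c", "tags": ["cat"], "features": [0.6, 0.8]},
]

combined = rank(["bridge"], [1.0, 0.0], items, alpha=0.5)
tags_only = rank(["bridge"], [1.0, 0.0], items, alpha=1.0)
content_only = rank(["bridge"], [1.0, 0.0], items, alpha=0.0)
```

In this toy collection the three settings produce three different orderings, which is the point of the challenge: neither signal alone settles the ranking, and how to weight them at web scale remains an open question.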