Publication

Collaborative Filtering and the Missing at Random Assumption

Source:

IJCAI (2007)

URL:

http://cobweb.ecn.purdue.edu/~malcolm/yahoo/Marlin2007(UserBiasUncertainty).pdf

Abstract:

Rating prediction is an important application and a popular research topic in collaborative filtering. However, both the validity of learning algorithms and the validity of standard testing procedures rest on the assumption that missing rating data is missing at random (MAR), an assumption that is often violated in real data. In this paper we present the results of a user survey and study in which we collect a random sample of ratings from current users of an online radio service. In the survey, a large number of users report that they believe their opinion of a song does affect whether they choose to rate that song, a violation of the MAR condition. We collected a true random sample of more than 300,000 song ratings from more than 30,000 users. An analysis of this data shows that the sample of random ratings has markedly different properties than ratings of user-selected songs. Finally, we present experimental results which show that learning an explicit model of the missing data mechanism with an informative prior can lead to a large improvement in prediction performance on the random sample of ratings.