Privacy
By Ashwin Machanavajjhala, Sarah Herr
Yahoo! is a popular destination on the Web, and millions of users interact with its web services every day. These interactions contain valuable cues that can be used to personalize the user’s web experience, which makes the user activity logs Yahoo!’s prime asset. At the same time, care must be taken that a user’s privacy is not breached by sharing sensitive user information, whether consciously or inadvertently, with other users or third parties without the user’s consent. We at Yahoo! strive to build novel algorithms that create personalized user experiences while guaranteeing the privacy of users.
Challenges
Privacy Disclosures in Ad Targeting and Social Recommendations
Many companies create elaborate behavioral profiles of users and use these to recommend ads, products and friends (on a social network). Some of these recommendations could disclose sensitive information. For instance, the fact that an ad is shown to a user discloses private information about that user to the advertiser: recent work has shown that one can create micro-targeted ads to infer sensitive attributes of users [Korolova, "Privacy Violations Using Microtargeted Ads: A Case Study"]. What makes this a credible risk is that any user could create micro-targeted ads. In a social network setting, a recommendation made to a user based on the profiles of her friends could also cause information disclosure [Machanavajjhala et al, "Personalized Social Recommendations - Accurate or Private?"]. The increasing push to recommend friends, products, jobs, etc. based on your friends' activities makes this a credible risk as well. The challenge here is to come up with solutions that allow targeting and recommendations without disclosing sensitive information.
Publishing snapshots of user activity logs
Academic research on challenging problems related to search, advertising and modeling user behavior on the web has been handicapped by the lack of access to real logs of user activity. Naïve attempts at publishing, say, search logs have led to disastrous consequences (as in the case of AOL in 2006). This is because removing or obfuscating user identifiers alone does not guarantee privacy; one can infer the identity of individuals from their search queries, clicks, reviews, ratings, etc. While recent work (Jones et al "I know what you did last summer: query logs and user privacy", Narayanan et al "How to Break Anonymity of the Netflix Prize Dataset", Korolova et al "Releasing Search Queries and Clicks Privately", Goetz et al "Privacy in Search Logs") has initiated research on this problem, there is still much to be done, both in terms of the quality of the data and the types of data that can be privately published.
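To make the difficulty concrete, here is a minimal Python sketch of one naive safeguard: release only those queries that were issued by at least k distinct users. The function name `frequent_queries`, the `(user_id, query)` log layout and the threshold are assumptions made for illustration; this is not the method of any of the papers cited above, and a threshold like this does not by itself give a formal privacy guarantee.

```python
from collections import defaultdict


def frequent_queries(log, k):
    """Return {query: distinct-user count} for queries issued by at least k distinct users.

    `log` is an iterable of (user_id, query) pairs (hypothetical layout).
    Queries are normalized by trimming whitespace and lower-casing.
    """
    users_per_query = defaultdict(set)
    for user_id, query in log:
        users_per_query[query.strip().lower()].add(user_id)
    return {q: len(users) for q, users in users_per_query.items() if len(users) >= k}


# Tiny made-up log: one common query and one query that is unique to a single user.
sample_log = [
    ("u1", "weather"),
    ("u2", "Weather"),
    ("u3", "weather "),
    ("u1", "jane doe 12 elm street"),
    ("u1", "weather"),
]

released = frequent_queries(sample_log, k=3)
```

With this sample, only the common query survives the threshold, while the rare and potentially identifying one is suppressed. The cost is visible as well: the long tail of queries, which is often what researchers care about, disappears from the published data.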
Tracking and mining user activity logs privately
Web service companies have the capability to track and store user histories at a very detailed level and for long periods of time. With the advent of smart phones, we are now able to track users' activity and locations at much finer granularity than before. With the widespread use of machine learning techniques, many user-facing decisions are based on these logs. However, there are three key challenges.
- Data retention laws require companies to limit the size of the histories maintained per user (Yahoo! is allowed to retain un-anonymized logs of user searches for a period of only 18 months, and IP addresses are deleted within 6 months). Can we build machine learning models that can completely obviate the need to store un-anonymized user histories (solving the data retention problem)?
- Many decisions derived from user logs could leak personal information, especially when used in conjunction with users' social interactions. For instance, an application that recommends applications/products to a user based on social connections could disclose sensitive information about those connections. The challenge here is to characterize when privacy can be breached and how to design privacy-preserving applications.
- There are many scenarios in which user activity is shared either with other users (a user's location or status message is shared with his/her friends) or with third parties (user profiles are shared for advertising and with third party apps). We have seen many privacy concerns raised when such information is aggressively shared to enable social and advertising applications. The challenges here are to develop a sound and intuitive language for users to specify access control restrictions, and to minimize the amount of data that must be shared to enable social and advertising applications.
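The retention limits mentioned in the first challenge above can be sketched as a simple per-entry policy. This is only an illustration under stated assumptions: the entry layout (`timestamp`, `user_id`, `ip`, `query`), the function name `apply_retention` and the 30-day approximation of a month are all hypothetical, and the sketch drops expired search entries outright, whereas a real system might anonymize them instead.

```python
from datetime import datetime, timedelta

# Retention windows from the text; a month is approximated as 30 days (assumption).
SEARCH_RETENTION = timedelta(days=18 * 30)
IP_RETENTION = timedelta(days=6 * 30)


def apply_retention(entry, now):
    """Return a copy of `entry` with expired fields removed, or None if it must go.

    `entry` is a dict with hypothetical keys 'timestamp', 'user_id', 'ip', 'query'.
    """
    age = now - entry["timestamp"]
    if age > SEARCH_RETENTION:
        return None  # too old to keep in un-anonymized form
    kept = dict(entry)
    if age > IP_RETENTION:
        kept["ip"] = None  # IP address has passed its shorter retention window
    return kept


now = datetime(2012, 1, 1)
fresh = {"timestamp": now - timedelta(days=10), "user_id": "u1", "ip": "192.0.2.1", "query": "news"}
aging = {"timestamp": now - timedelta(days=200), "user_id": "u1", "ip": "192.0.2.1", "query": "news"}
stale = {"timestamp": now - timedelta(days=600), "user_id": "u1", "ip": "192.0.2.1", "query": "news"}
```

A policy like this only enforces the limits. It does not answer the research question posed above, namely whether models can be built that never need the un-anonymized history in the first place.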
"Fenced" Access Control and Privacy Disclosures in a Linked World
It is becoming increasingly hard for users to keep track of the information that they share with other users. Hence, creating usable tools for access control and providing useful privacy disclosures is essential for customer satisfaction.
- Many users don't realize that photos come with geo-tags, and that cameras (especially those on phones) attach a geo-tag to each photo by default. Hence, sharing a photo essentially discloses where you were at a specific time. This is not advisable, especially when the location is your home or your child's school. However, you might want to disclose the location when the photo was taken while traveling.
Flickr's Geofences has been very well received as a tool for users to establish their own comfort zone with respect to privacy. Geofences gives us a solution to several privacy issues. It gives users a chance to become familiar with their own privacy settings and to understand how their own cameras can transfer location data. Geofences also gives users the opportunity to fine-tune their own privacy restrictions on both a proactive and a retroactive basis. What are other possible applications of privacy 'fences' in areas of social sharing, such as sharing user information from profiles, activity, locations, and relationships, while simultaneously educating users about other aspects of their privacy and user data? What other applications of Geofences or other privacy fences come to mind?
- Today's users manage their online presence in a multitude of ways. For a company like Yahoo!, with products and services across multiple platforms, it can be very challenging to maintain user identity across websites and across devices while providing privacy settings appropriate for each situation. Would it be conceivable to use a 'Geofence' type solution to help users navigate across different media without impinging on their ease of use?
- The online universe continues to expand exponentially. Accordingly, companies are developing partnerships and relationships in order to combine resources and maximize customer value. Managing user privacy while sharing resources is of critical importance and requires a delicate balance. Privacy practices are generally dictated by a company privacy policy, which provides general guidelines. Outside of those general guidelines, what solutions come to mind when developing systems and protections that balance privacy permissions with keeping user information viable between companies and partnerships? For instance, could one detect whether the privacy policies of one company may override the privacy protection guaranteed by another in such a partnership? If yes, how can this be effectively communicated to the user?
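The geofence idea described above can be sketched in a few lines of Python. This is a rough illustration, not Flickr's implementation: the circular fence shape, the field names, the audience levels and the rule that the most restrictive matching fence wins are all assumptions. Distances use the standard haversine formula.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in metres

# Audience levels ordered from most to least restrictive (hypothetical).
AUDIENCES = ["only_me", "friends", "public"]


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two (lat, lon) points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def location_audience(photo_lat, photo_lon, fences, default="public"):
    """Return who may see a photo's location.

    If the photo falls inside one or more fences, the most restrictive
    audience among them applies; otherwise `default` is returned.
    """
    matches = [
        f["audience"]
        for f in fences
        if haversine_m(photo_lat, photo_lon, f["lat"], f["lon"]) <= f["radius_m"]
    ]
    if not matches:
        return default
    return min(matches, key=AUDIENCES.index)


# Made-up fences: a tight private zone at home inside a wider friends-only zone.
fences = [
    {"name": "home", "lat": 37.4000, "lon": -122.1000, "radius_m": 500, "audience": "only_me"},
    {"name": "neighborhood", "lat": 37.4000, "lon": -122.1000, "radius_m": 1000, "audience": "friends"},
]
```

Because the check is a pure function of the photo's coordinates and the current fences, the same function could be run over already uploaded photos, which is one way to read the retroactive use mentioned above.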
Disclosure Screens in a Shrinking Environment
One of our growing challenges is how best to provide relevant and complete privacy disclosures on the mobile screen (where both screen space and the user's patience for reading text are limited). For instance, if Facebook were to change the way it uses location information, how could it notify users who only log in to Facebook via the iPhone? Facebook cannot customize the iOS permission screen to say 'New' or 'Updated'. Facebook can leverage the iOS-provided disclosure screen, but it would only ask "Do you allow Facebook to use your location?" without communicating that Facebook is using the location information in a new way. The user would have no reason to question the request, and Facebook is limited in its means to communicate the change.
