Threading Machine Generated Email

Publication
Feb 4, 2013
Abstract

Abstract: Viewing email messages as parts of a sequence or a thread is a convenient way to quickly understand their context. Current threading techniques rely on purely syntactic methods, matching sender information, subject line, and reply/forward prefixes. As such, they are mostly limited to personal conversations. In contrast, machine-generated email, which amount, as per our experiments, to more than 60% of the overall email traffic, requires a different kind of threading that should reflect how a sequence of emails is caused by a few related user actions. For example, purchasing goods from an online store will result in a receipt or a confirmation message, which may be followed, possibly after a few days, by a shipment notification message from an express shipping service. In today's mail systems, they will not be a part of the same thread, while we believe they should. In this paper, we focus on this type of threading that we coin ``causal threading". We demonstrate that, by analyzing recurring patterns over hundreds of millions of mail users, we can infer a causality relation between these two individual messages. In addition, by observing multiple causal relations over common messages, we can generate "causal threads" over a sequence of messages. The four key stages of our approach consist of: (1)  identifying  messages that are instances of the same email type or "template" (generated by the same machine process on the sender side)   (2) building a causal graph, in which nodes correspond to email templates and edges indicate potential causal relations (3) learning a causal relation prediction function, and (4) automatically "threading" the incoming email stream. We present detailed experimental results obtained by analyzing the inboxes of 12.5 million Yahoo Mail users, who voluntarily opted-in for such research. Supervised editorial judgments show that we can identify more than 70%  (recall rate) of all "causal threads" at a precision level of 90%. In addition, for a search scenario we show that we achieve a precision close to 80% at 90% recall. We believe that supporting causal threads in email clients opens new grounds for improving both email search and browsing experiences.

  • WSDM 2013
  • Conference/Workshop Paper

BibTeX