IR, LG, SI | Jan 24, 2020

The Enron Corpus: Where the Email Bodies are Buried?

arXiv:2001.10374v1 | 5 citations

Originality: Synthesis-oriented

AI Analysis

This work addresses fraud detection and legal compliance in corporate email data, but it is incremental as it applies existing methods to a specific dataset.

The researchers tackled the problem of analyzing the Enron email corpus for fraud indicators by applying machine learning to four investigative tasks, achieving peak accuracies of 95.7% in identifying persons of interest and 99% in flagging legally responsive emails, and discovering 50,000 previously unreported instances of personally identifiable information.

To probe the largest public-domain email database for indicators of fraud, we apply machine learning and accomplish four investigative tasks. First, we identify persons of interest (POI), using financial records and email, and report a peak accuracy of 95.7%. Secondly, we find any publicly exposed personally identifiable information (PII) and discover 50,000 previously unreported instances. Thirdly, we automatically flag legally responsive emails as scored by human experts in the California electricity blackout lawsuit, and find a peak 99% accuracy. Finally, we track three years of primary topics and sentiment across over 10,000 unique people before, during and after the onset of the corporate crisis. Where possible, we compare accuracy against execution times for 51 algorithms and report human-interpretable business rules that can scale to vast datasets.
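The PII task in the abstract can be illustrated with a minimal pattern-matching sketch. The abstract does not say which PII types or detection rules the authors used, so the `PII_PATTERNS` table and the `find_pii` helper below are hypothetical names and assumptions made for illustration only, not the paper's method.

```python
import re

# Hypothetical patterns chosen for illustration; the abstract does not say
# which PII types or detection rules the authors actually used.
PII_PATTERNS = {
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "phone": re.compile(r"\b\d{3}[-.]\d{3}[-.]\d{4}\b"),
    "credit_card": re.compile(r"\b(?:\d{4}[- ]){3}\d{4}\b"),
}


def find_pii(text):
    """Return a sorted list of (kind, matched_string) pairs found in text."""
    hits = []
    for kind, pattern in PII_PATTERNS.items():
        for match in pattern.finditer(text):
            hits.append((kind, match.group()))
    return sorted(hits)


# Made-up sample text, not taken from the Enron corpus.
sample = "Call me at 713-555-0142. My SSN is 123-45-6789."
print(find_pii(sample))
```

Simple patterns like these tend to over-match (for example, on order numbers that resemble phone numbers), so a real scan over a corpus of this size would likely need validation steps beyond what is shown here.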
