LG · Dec 28, 2020

Detecting Anomalous Invoice Line Items in the Legal Case Lifecycle

arXiv:2012.14511v3 · 3 citations
AI Analysis

This work aims to improve the efficiency and accuracy of legal invoice review for corporate legal departments, which currently struggle with undetected discrepancies despite existing safeguards. This is an incremental improvement to an existing process.

This paper addresses the problem of detecting anomalous line items in legal invoices, a significant pain point for corporate legal departments. The authors apply several machine learning model architectures to flag line items that do not fit the stage of the legal case's lifecycle in which they appear. Because labeled data is lacking, they generate a synthetic dataset to train and evaluate the models.

The United States is the largest distributor of legal services in the world, representing a $437 billion market. Of this, corporate legal departments pay law firms $80 billion for their services. Every month, legal departments receive and process invoices from these law firms and legal service providers. Legal invoice review has long been a pain point for corporate legal department leaders. Complex and intricate, legal invoices often contain several hundred line items that account for anything from tasks such as hands-on legal work to expenses such as copying, meals, and travel. The man-hours and scrutiny involved in the invoice review process can be overwhelming. Even with common safeguards in place, such as established billing guidelines, experienced invoice reviewers (typically highly paid in-house attorneys), and rule-based electronic billing tools ("e-billing"), many discrepancies go undetected. Using machine learning, our goal is to demonstrate the current flaws of, and to explore improvements to, the legal invoice review process for invoices submitted by law firms to their corporate clients. In this work, we detail our approach, applying several machine learning model architectures, for detecting anomalous invoice line items based on their suitability in the legal case's lifecycle (modeled using a set of case-level and invoice line-item-level features). To overcome the challenge of unlabeled data, we generate a synthetic dataset which utilizes subject matter expertise ("SME") to manipulate existing records' attributes to reflect an anomalous state in the product lifecycle, and characterize our method's performance using a set of model architectures. We demonstrate how this process advances solving anomaly detection problems, specifically when the characteristics of the anomalies are well known, and offer lessons learned from applying our approach to real-world data.
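The synthetic-data step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the phase names, task names, the `LineItem` record, and the `PHASE_TASKS` mapping are all hypothetical stand-ins for SME knowledge, and the sketch assumes the original records can be treated as normal. It copies existing records, moves a chosen fraction into an anomalous state by assigning a task that belongs to a different case phase, and labels the result so that a supervised model could be trained on it.

```python
import random
from dataclasses import dataclass, replace

# Hypothetical mapping from case phase to plausible tasks.
# In practice this knowledge would come from subject matter experts.
PHASE_TASKS = {
    "intake": ["case_assessment", "fact_investigation"],
    "discovery": ["document_production", "depositions"],
    "trial": ["trial_preparation", "court_attendance"],
}

@dataclass(frozen=True)
class LineItem:
    case_phase: str
    task: str
    hours: float
    label: int = 0  # 0 = original record, 1 = synthetically anomalous

def make_anomalous(item, rng):
    """Return a copy whose task belongs to a different case phase."""
    other_phases = [p for p in PHASE_TASKS if p != item.case_phase]
    wrong_phase = rng.choice(other_phases)
    return replace(item, task=rng.choice(PHASE_TASKS[wrong_phase]), label=1)

def build_synthetic_dataset(items, anomaly_rate, seed=0):
    """Corrupt a fixed fraction of records and label them as anomalies."""
    rng = random.Random(seed)
    n_anomalies = round(len(items) * anomaly_rate)
    chosen = set(rng.sample(range(len(items)), n_anomalies))
    return [
        make_anomalous(item, rng) if i in chosen else item
        for i, item in enumerate(items)
    ]

# Six clean records: every task matches its case phase.
clean = [
    LineItem(phase, task, 1.5)
    for phase, tasks in PHASE_TASKS.items()
    for task in tasks
]
dataset = build_synthetic_dataset(clean, anomaly_rate=0.5)
```

Each record in `dataset` carries a `label` field, so the list can be fed to any classifier after feature encoding. Real line items would need richer corruptions (amounts, timekeeper roles, dates), which this sketch leaves out.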
