CL | Aug 24, 2017

CloudScan - A configuration-free invoice analysis system using recurrent neural networks

arXiv:1708.07403v1 | 90 citations
Originality: Incremental advance
AI Analysis

This paper addresses the need for automated invoice processing in businesses with a configuration-free system that generalizes to new layouts. It is an incremental advance because it builds on existing neural network methods.

The paper tackles the problem of invoice analysis without configuration or annotation by learning a single global model using recurrent neural networks, achieving an average F1 score of 0.840 on unseen invoice layouts compared to a baseline of 0.788.

We present CloudScan, an invoice analysis system that requires zero configuration or upfront annotation. In contrast to previous work, CloudScan does not rely on templates of invoice layout; instead, it learns a single global model of invoices that naturally generalizes to unseen invoice layouts. The model is trained using data automatically extracted from end-user provided feedback. This automatic training data extraction removes the requirement for users to annotate the data precisely. We describe a recurrent neural network model that can capture long-range context and compare it to a baseline logistic regression model corresponding to the current CloudScan production system. We train and evaluate the system on 8 important fields using a dataset of 326,471 invoices. The recurrent neural network and baseline model achieve 0.891 and 0.887 average F1 scores, respectively, on seen invoice layouts. For the harder task of unseen invoice layouts, the recurrent neural network model outperforms the baseline with 0.840 average F1 compared to 0.788.
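
The reported scores are easier to compare with a little arithmetic. The following is a minimal sketch that uses only the four average F1 values quoted in the abstract; the dictionary layout and the function names `gap` and `generalization_drop` are illustrative assumptions, not anything defined by the paper.

```python
# Average F1 scores exactly as quoted in the abstract.
# The structure and names below are illustrative, not from the paper.
reported_f1 = {
    "rnn": {"seen": 0.891, "unseen": 0.840},
    "baseline": {"seen": 0.887, "unseen": 0.788},
}


def gap(split):
    """Absolute F1 advantage of the RNN over the baseline on one split."""
    return round(reported_f1["rnn"][split] - reported_f1["baseline"][split], 3)


def generalization_drop(model):
    """How much average F1 a model loses going from seen to unseen layouts."""
    return round(reported_f1[model]["seen"] - reported_f1[model]["unseen"], 3)


if __name__ == "__main__":
    for split in ("seen", "unseen"):
        print(f"RNN advantage on {split} layouts: {gap(split)}")
    for model in ("rnn", "baseline"):
        print(f"{model} drop from seen to unseen: {generalization_drop(model)}")
```

Under these numbers the two models are nearly tied on seen layouts (a gap of 0.004), while on unseen layouts the gap grows to 0.052. The baseline loses roughly twice as much F1 as the recurrent model when moving to unseen layouts (0.099 versus 0.051).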

Code Implementations: 1 repo