Arctic-TILT. Business Document Understanding at Sub-Billion Scale
This work addresses the need for efficient, cost-effective document understanding in enterprise environments, though its contribution is incremental, consisting mainly of scaling down existing methods.
The paper tackles the problem of answering questions from PDF or scanned content using large language models. It achieves accuracy comparable to models 1000 times larger, while remaining fine-tunable and deployable on a single 24GB GPU and processing documents of up to 400k tokens.
A vast portion of workloads employing LLMs involves answering questions grounded in PDF or scanned content. We introduce Arctic-TILT, which achieves accuracy on par with models 1000$\times$ its size on these use cases. It can be fine-tuned and deployed on a single 24GB GPU, lowering operational costs while processing Visually Rich Documents with up to 400k tokens. The model establishes state-of-the-art results on seven diverse Document Understanding benchmarks and provides reliable confidence scores and quick inference, which are essential for processing files in large-scale or time-sensitive enterprise environments.