CV · Dec 18, 2024

Zero-Shot Prompting and Few-Shot Fine-Tuning: Revisiting Document Image Classification Using Large Language Models

Anna Scius-Bertrand, Michael Jungo, Lars Vögtlin, Jean-Marc Spat, Andreas Fischer

arXiv:2412.13859v18.710 citations · h-index: 6 · ICPR

Originality: Incremental advance

AI Analysis

The paper addresses the problem of reducing annotation costs for document classification, but the contribution is incremental because it builds on existing LLM capabilities.

The paper tackles document image classification by exploring zero-shot prompting and few-shot fine-tuning with large language models, with the goal of reducing reliance on human-annotated training samples. It reports competitive performance on benchmark datasets such as RVL-CDIP with minimal training data.

Classifying scanned documents is a challenging problem that involves image, layout, and text analysis for document understanding. Nevertheless, for certain benchmark datasets, notably RVL-CDIP, the state of the art is closing in on near-perfect performance when considering hundreds of thousands of training samples. With the advent of large language models (LLMs), which are excellent few-shot learners, the question arises to what extent the document classification problem can be addressed with only a few training samples, or even none at all. In this paper, we investigate this question in the context of zero-shot prompting and few-shot model fine-tuning, with the aim of reducing the need for human-annotated training samples as much as possible.
