CV · Dec 18, 2024

Zero-Shot Prompting and Few-Shot Fine-Tuning: Revisiting Document Image Classification Using Large Language Models

arXiv:2412.13859v1 · 10 citations · h-index: 6 · ICPR
Originality: Incremental advance
AI Analysis

The paper addresses the cost of annotating training data for document classification, but the advance is incremental because it builds on existing LLM capabilities.

The paper tackles document image classification by exploring zero-shot prompting and few-shot fine-tuning with large language models, with the goal of reducing reliance on human-annotated training samples. It reports competitive performance on benchmark datasets such as RVL-CDIP with minimal data.

Classifying scanned documents is a challenging problem that involves image, layout, and text analysis for document understanding. Nevertheless, for certain benchmark datasets, notably RVL-CDIP, the state of the art is closing in on near-perfect performance when hundreds of thousands of training samples are available. With the advent of large language models (LLMs), which are excellent few-shot learners, the question arises to what extent the document classification problem can be addressed with only a few training samples, or even none at all. In this paper, we investigate this question in the context of zero-shot prompting and few-shot model fine-tuning, with the aim of reducing the need for human-annotated training samples as much as possible.
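To make the zero-shot setting concrete, the following is a minimal sketch of how a classification prompt without any labeled examples might be assembled and how a free-form model reply might be mapped back to a class. It is not the paper's method: the prompt wording, the use of OCR text as input, the `max_chars` truncation limit, and the function names `build_zero_shot_prompt` and `parse_label` are all assumptions made for illustration. Only the 16 RVL-CDIP category names come from the dataset itself. The call to an actual LLM is deliberately left out.

```python
import re

# The 16 document categories of the RVL-CDIP benchmark.
RVL_CDIP_LABELS = [
    "letter", "form", "email", "handwritten", "advertisement",
    "scientific report", "scientific publication", "specification",
    "file folder", "news article", "budget", "invoice",
    "presentation", "questionnaire", "resume", "memo",
]


def build_zero_shot_prompt(ocr_text, labels=RVL_CDIP_LABELS, max_chars=2000):
    """Assemble a prompt with no labeled examples (hypothetical wording)."""
    label_list = "\n".join(f"- {label}" for label in labels)
    return (
        "Classify the scanned document below into exactly one category.\n"
        f"Categories:\n{label_list}\n\n"
        # Truncate long documents; the limit is an illustrative assumption.
        f"Document text (OCR):\n{ocr_text[:max_chars]}\n\n"
        "Answer with the category name only."
    )


def parse_label(response, labels=RVL_CDIP_LABELS):
    """Map a free-form model reply to a known label, or None if none matches."""
    normalized = re.sub(r"\s+", " ", response.strip().lower())
    # Try longer labels first so a longer label wins over any shorter one it contains.
    for label in sorted(labels, key=len, reverse=True):
        if re.search(rf"\b{re.escape(label)}\b", normalized):
            return label
    return None
```

The prompt string would be sent to whichever LLM is being evaluated, and `parse_label` applied to its reply. Returning `None` for unparseable replies keeps such cases visible as errors instead of silently assigning them to a class.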

