CV · Sep 26, 2025

Rule-Based Reinforcement Learning for Document Image Classification with Vision Language Models

arXiv:2509.22283v1 · h-index: 4 · Has Code · ICDAR
Originality: Synthesis-oriented
AI Analysis

This work addresses generalization challenges in document analysis for researchers and practitioners, though it appears incremental as it adapts an existing method to a new domain.

The paper tackles document image classification by applying rule-based reinforcement learning, finding that it improves generalization to out-of-distribution data across images, classes, and modalities.

Rule-based reinforcement learning has been gaining popularity ever since DeepSeek-R1 demonstrated its success through simple verifiable rewards. In the domain of document analysis, reinforcement learning is not as prevalent, even though many downstream tasks may benefit from the emerging properties of reinforcement learning, particularly the enhanced reasoning capabilities. We study the effects of rule-based reinforcement learning on the task of Document Image Classification, which is one of the most commonly studied downstream tasks in document analysis. We find that reinforcement learning tends to have better generalisation capabilities to out-of-distribution data, which we examine in three different scenarios, namely out-of-distribution images, unseen classes and different modalities. Our code is available at https://github.com/jungomi/vision-finetune.
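To make the idea of a "simple verifiable reward" concrete, the following is a minimal sketch of what a rule-based reward for document classification could look like. It is not taken from the paper or its repository: the `<think>`/`<answer>` response template, the function names, and the weights are all assumptions chosen for illustration, loosely following the format-plus-accuracy pattern popularised by DeepSeek-R1.

```python
import re

# Hypothetical response template (an assumption, not the paper's format):
# reasoning inside <think>...</think>, final class inside <answer>...</answer>.
RESPONSE_PATTERN = re.compile(
    r"\s*<think>(?P<think>.*?)</think>\s*<answer>(?P<answer>.*?)</answer>\s*",
    re.DOTALL,
)


def normalise(label: str) -> str:
    """Lowercase a class label and collapse runs of whitespace."""
    return " ".join(label.lower().split())


def rule_based_reward(
    response: str,
    target: str,
    format_weight: float = 0.5,    # illustrative value, not from the paper
    accuracy_weight: float = 1.0,  # illustrative value, not from the paper
) -> float:
    """Score a model response with two checkable rules.

    Rule 1 (format): the whole response must follow the template.
    Rule 2 (accuracy): the answer must equal the ground-truth class.
    """
    match = RESPONSE_PATTERN.fullmatch(response)
    if match is None:
        # Without the expected structure no answer can be extracted.
        return 0.0
    reward = format_weight
    if normalise(match.group("answer")) == normalise(target):
        reward += accuracy_weight
    return reward


if __name__ == "__main__":
    good = "<think>Line items and a total are visible.</think><answer>Invoice</answer>"
    print(rule_based_reward(good, "invoice"))
```

Because both rules are deterministic string checks, the reward needs no learned reward model, which is what makes this style of training attractive for tasks with a known class label.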
