CV, MM · Apr 14, 2025

XY-Cut++: Advanced Layout Ordering via Hierarchical Mask Mechanism on a Novel Benchmark

arXiv:2504.10258v2 · 2 citations · h-index: 2
Originality: Incremental advance
AI Analysis

This work addresses layout ordering for document image understanding. The contribution is incremental, but it matters for enhancing Retrieval-Augmented Generation (RAG) and LLM preprocessing.

The paper tackles document reading order recovery for complex layouts like multi-column newspapers by introducing XY-Cut++, which integrates pre-mask processing, multi-granularity segmentation, and cross-modal matching. It achieves state-of-the-art performance with 98.8 BLEU overall, outperforming baselines by up to 24% on the new DocBench-100 dataset.

Document Reading Order Recovery is a fundamental task in document image understanding, playing a pivotal role in enhancing Retrieval-Augmented Generation (RAG) and serving as a critical preprocessing step for large language models (LLMs). Existing methods often struggle with complex layouts (e.g., multi-column newspapers) and with high-overhead interactions between cross-modal elements (visual regions and textual semantics), and they are hampered by a lack of robust evaluation benchmarks. We introduce XY-Cut++, an advanced layout ordering method that integrates pre-mask processing, multi-granularity segmentation, and cross-modal matching to address these challenges. Our method significantly improves layout ordering accuracy compared to traditional XY-Cut techniques. Specifically, XY-Cut++ achieves state-of-the-art performance (98.8 BLEU overall) while maintaining simplicity and efficiency. It outperforms existing baselines by up to 24% and demonstrates consistent accuracy across simple and complex layouts on the newly introduced DocBench-100 dataset. This advancement establishes a reliable foundation for document structure recovery, setting a new standard for layout ordering tasks and facilitating more effective RAG and LLM preprocessing.
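For context, the traditional XY-Cut that the abstract uses as its point of comparison can be sketched as a recursive projection split over layout boxes. This is a simplified baseline written under our own assumptions, not XY-Cut++: boxes are axis-aligned `(x0, y0, x1, y1)` tuples with y increasing downward, horizontal cuts are tried before vertical ones, the widest gap of at least `min_gap` is chosen, and leaves fall back to a top-to-bottom, left-to-right sort. It omits the pre-mask processing, multi-granularity segmentation, and cross-modal matching described above. The names `reading_order`, `_xy_cut`, and `_widest_gap` are hypothetical.

```python
from typing import List, Optional, Sequence, Tuple

# Assumed box format: (x0, y0, x1, y1), with y increasing downward.
Box = Tuple[float, float, float, float]


def _widest_gap(intervals: List[Tuple[float, float]], min_gap: float) -> Optional[float]:
    """Return the midpoint of the widest empty gap (>= min_gap) between 1-D intervals, or None."""
    intervals = sorted(intervals)
    best_cut, best_width = None, 0.0
    cur_end = intervals[0][1]
    for start, end in intervals[1:]:
        gap = start - cur_end  # positive means no interval covers [cur_end, start]
        if gap >= min_gap and gap > best_width:
            best_cut, best_width = (cur_end + start) / 2, gap
        cur_end = max(cur_end, end)
    return best_cut


def _xy_cut(indices: List[int], boxes: Sequence[Box], min_gap: float) -> List[int]:
    if len(indices) <= 1:
        return list(indices)
    # Horizontal cut first: look for an empty band in the projection onto the y-axis.
    cut = _widest_gap([(boxes[i][1], boxes[i][3]) for i in indices], min_gap)
    if cut is not None:
        top = [i for i in indices if boxes[i][3] <= cut]
        bottom = [i for i in indices if boxes[i][3] > cut]
        return _xy_cut(top, boxes, min_gap) + _xy_cut(bottom, boxes, min_gap)
    # Otherwise try a vertical cut: an empty band in the projection onto the x-axis.
    cut = _widest_gap([(boxes[i][0], boxes[i][2]) for i in indices], min_gap)
    if cut is not None:
        left = [i for i in indices if boxes[i][2] <= cut]
        right = [i for i in indices if boxes[i][2] > cut]
        return _xy_cut(left, boxes, min_gap) + _xy_cut(right, boxes, min_gap)
    # No clean cut: fall back to top-to-bottom, then left-to-right.
    return sorted(indices, key=lambda i: (boxes[i][1], boxes[i][0]))


def reading_order(boxes: Sequence[Box], min_gap: float = 1.0) -> List[int]:
    """Return box indices in the reading order produced by this simplified XY-Cut."""
    return _xy_cut(list(range(len(boxes))), boxes, min_gap)
```

On a two-column page with a full-width title, e.g. `[(0, 0, 200, 20), (110, 30, 200, 100), (0, 30, 90, 60), (0, 70, 90, 100)]`, the sketch returns `[0, 2, 3, 1]`: title, then the left column top to bottom, then the right column. If both columns happened to have gaps at the same height, the horizontal-first rule would cut across them and interleave the columns, which illustrates one plausible way plain XY-Cut can fail on the complex layouts the abstract targets.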

Code Implementations: 1 repo