CL · May 11

Multi-domain Multi-modal Document Classification Benchmark with a Multi-level Taxonomy

arXiv:2605.10550 · 91.8 · Has Code
Predicted impact: top 25% in CL · last 90 days · Originality: Synthesis-oriented
AI Analysis

For researchers and practitioners in document intelligence, this benchmark provides a more realistic evaluation setting. The contribution is incremental, however: it is a new benchmark rather than a novel method or paradigm.

Existing document classification benchmarks are oversimplified, lacking hierarchical, multi-modal, and cross-domain complexity. The authors introduce MMM-Bench, a benchmark with a five-level taxonomy and 5,990 real-world multi-modal documents from 12 domains, and establish baselines identifying four fundamental challenges.

Document classification forms the backbone of modern enterprise content management, yet existing benchmarks remain trapped in oversimplified paradigms (single-domain settings with flat label structures) that bear little resemblance to the hierarchical, multi-modal, and cross-domain nature of real-world business documents. This gap not only misrepresents practical complexity but also stifles progress toward industrially viable document intelligence. To bridge this gap, we construct the first Multi-level, Multi-domain, Multi-modal document classification Benchmark (MMM-Bench). MMM-Bench includes (1) a deeply hierarchical five-level taxonomy that captures the authentic organizational logic of business documentation; and (2) 5,990 real-world multi-modal documents meticulously curated from 12 commercial domains in Alibaba. Each document is manually annotated with a complete hierarchical path by domain experts. We establish comprehensive baselines on MMM-Bench, covering both open-weight and API-based models. Through systematic experiments, we identify four fundamental challenges within MMM-Bench and propose corresponding insights. To provide a solid foundation for advancing research in multi-level, multi-domain document classification, we release all of the data and the evaluation toolkit of MMM-Bench at https://github.com/MMMDC-Bench/MMMDC-Bench.
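To make the idea of a "complete hierarchical path" concrete, here is a minimal sketch of how predictions against a five-level taxonomy might be scored. It is an illustration under assumptions, not the MMM-Bench evaluation toolkit: the function names `prefix_accuracy` and `full_path_accuracy`, the tuple-of-strings path representation, and the placeholder labels are all hypothetical, and the abstract does not say which metrics the benchmark actually reports.

```python
from typing import List, Sequence

LabelPath = Sequence[str]  # hypothetical representation: one label per taxonomy level


def prefix_accuracy(
    gold: Sequence[LabelPath], pred: Sequence[LabelPath], depth: int = 5
) -> List[float]:
    """For each level k (1..depth), the fraction of documents whose
    first k predicted labels all match the gold path."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same number of documents")
    if not gold:
        raise ValueError("at least one document is required")
    scores = []
    for k in range(1, depth + 1):
        # A path shorter than k levels can never count as correct at level k.
        hits = sum(
            1
            for g, p in zip(gold, pred)
            if len(g) >= k and len(p) >= k and tuple(g[:k]) == tuple(p[:k])
        )
        scores.append(hits / len(gold))
    return scores


def full_path_accuracy(gold: Sequence[LabelPath], pred: Sequence[LabelPath]) -> float:
    """Fraction of documents whose entire predicted path equals the gold path."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same number of documents")
    if not gold:
        raise ValueError("at least one document is required")
    hits = sum(1 for g, p in zip(gold, pred) if tuple(g) == tuple(p))
    return hits / len(gold)


# Placeholder labels only; these are not taken from the MMM-Bench taxonomy.
gold_paths = [("a", "b", "c", "d", "e"), ("a", "b", "x", "y", "z")]
pred_paths = [("a", "b", "c", "d", "q"), ("a", "k", "x", "y", "z")]

level_scores = prefix_accuracy(gold_paths, pred_paths)
exact_score = full_path_accuracy(gold_paths, pred_paths)
```

Scoring by prefix means an error at a shallow level propagates to every deeper level, which is one way a hierarchical setting differs from flat-label classification.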

