CL · AI · CV · May 20

Manga109-v2026: Revisiting Manga109 Annotations for Modern Manga Understanding

arXiv:2605.21182 · 89.9
Predicted impact: top 33% in CL · last 90 days
Originality: Synthesis-oriented
AI Analysis

For researchers working on manga understanding, this provides a cleaner, more accurate dataset, but the contribution is incremental as it primarily fixes existing annotations.

The authors identified and corrected approximately 29,000 annotation errors in the Manga109 dataset, producing Manga109-v2026, a revision that better aligns with modern OCR and multimodal manga understanding tasks.

Manga is a culturally distinctive multimodal medium and one of the most influential forms of Japanese popular culture. As AI systems increasingly target manga understanding, OCR, and translation, Manga109 has become a foundational dataset for manga-related AI research. However, the current Manga109 dataset contains transcription errors and coarse annotations, which do not align well with modern OCR and multimodal manga understanding tasks. In this work, we revisit the dialogue text annotations of Manga109 and identify five categories of annotation issues, including transcription errors, missing text regions, overlapping dialogue and onomatopoeia, and under-segmented speech balloons. To address these issues, we combine OCR-based issue detection and manual revision to construct Manga109-v2026, revising approximately 29,000 dialogue annotations. Our revisions better align Manga109 with modern OCR and multimodal manga understanding systems while preserving expressive structures characteristic of manga.
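The abstract does not say how the OCR-based issue detection works. As a rough illustration only, one plausible approach is to compare each annotated transcription against an OCR reading of the same region and queue low-similarity pairs for manual revision. The function name, the dictionary inputs, and the 0.8 threshold below are all hypothetical and are not taken from the paper.

```python
from difflib import SequenceMatcher


def flag_suspect_annotations(annotations, ocr_texts, threshold=0.8):
    """Return (annotation_id, similarity) pairs that disagree with OCR.

    annotations: dict mapping annotation id -> annotated transcription
    ocr_texts:   dict mapping annotation id -> OCR output for the same region
    threshold:   hypothetical cut-off; pairs scoring below it are flagged
    """
    suspects = []
    for ann_id, annotated in annotations.items():
        recognized = ocr_texts.get(ann_id)
        if recognized is None:
            continue  # no OCR result for this region, so nothing to compare
        # Character-level similarity in [0, 1]; 1.0 means identical strings.
        similarity = SequenceMatcher(None, annotated, recognized).ratio()
        if similarity < threshold:
            suspects.append((ann_id, similarity))
    # Least similar first, so reviewers see the likeliest errors earliest.
    return sorted(suspects, key=lambda item: item[1])
```

A disagreement only marks a candidate: OCR can be wrong too, which is presumably why the authors pair detection with manual revision.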

