CV | Aug 1, 2024

Tails Tell Tales: Chapter-Wide Manga Transcriptions with Character Names

Oxford
arXiv:2408.00298v1 | 12 citations | h-index: 11 | Has Code
Originality: Incremental advance
AI Analysis

This work addresses accessibility for visually impaired individuals by enabling engagement with manga, which is inherently visual, through automated transcription.

The paper tackles the automatic generation of accessible dialogue transcripts for complete manga chapters, with a focus on narrative consistency. The approach detects text, classifies it as essential or non-essential, and attributes each dialogue to its speaker under a consistent character name. The authors report significantly higher precision in speaker diarisation than prior works.

Enabling engagement of manga by visually impaired individuals presents a significant challenge due to its inherently visual nature. With the goal of fostering accessibility, this paper aims to generate a dialogue transcript of a complete manga chapter, entirely automatically, with a particular emphasis on ensuring narrative consistency. This entails identifying (i) what is being said, i.e., detecting the texts on each page and classifying them into essential vs non-essential, and (ii) who is saying it, i.e., attributing each dialogue to its speaker, while ensuring the same characters are named consistently throughout the chapter. To this end, we introduce: (i) Magiv2, a model that is capable of generating high-quality chapter-wide manga transcripts with named characters and significantly higher precision in speaker diarisation over prior works; (ii) an extension of the PopManga evaluation dataset, which now includes annotations for speech-bubble tail boxes, associations of text to corresponding tails, classifications of text as essential or non-essential, and the identity for each character box; and (iii) a new character bank dataset, which comprises over 11K characters from 76 manga series, featuring 11.5K exemplar character images in total, as well as a list of chapters in which they appear. The code, trained model, and both datasets can be found at: https://github.com/ragavsachdeva/magi
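As a rough illustration of the two sub-tasks named in the abstract (what is being said, and who is saying it), the sketch below assembles a transcript from per-text predictions. It is not the Magiv2 implementation or its output format: the `TextBox` structure, the `build_transcript` helper, the placeholder labels, and the sample data are all hypothetical, and the boxes are assumed to arrive already sorted in reading order.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TextBox:
    """One detected text region (hypothetical structure, not the Magiv2 output format)."""
    page: int
    text: str
    is_essential: bool          # dialogue vs. non-essential text such as sound effects
    speaker_id: Optional[int]   # chapter-wide character id, None if unattributed


def build_transcript(boxes, names):
    """Render essential texts as 'Name: line', reusing one name per character id."""
    lines = []
    for box in boxes:
        if not box.is_essential:
            continue  # non-essential text is left out of the transcript
        if box.speaker_id is None:
            speaker = "<unsure>"  # assumed placeholder for unattributed dialogue
        else:
            # Fall back to a stable placeholder when the character has no known name,
            # so the same character is still labelled consistently across pages.
            speaker = names.get(box.speaker_id, f"Other_{box.speaker_id}")
        lines.append(f"{speaker}: {box.text}")
    return lines


# Made-up example data, assumed to be in reading order.
boxes = [
    TextBox(1, "Where are we going?", True, 0),
    TextBox(1, "BOOM", False, None),
    TextBox(2, "To the harbour.", True, 1),
    TextBox(2, "Hurry up!", True, 0),
]
names = {0: "Aiko"}  # character id -> name, e.g. matched against a character bank
transcript = build_transcript(boxes, names)
print("\n".join(transcript))
```

Keying names on a chapter-wide character id, rather than on per-page detections, is what keeps a speaker's label identical on every page in this sketch.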

Code Implementations: 1 repo