IRAI · Oct 22, 2023

MaRU: A Manga Retrieval and Understanding System Connecting Vision and Language

arXiv:2311.02083v1 · 2 citations · h-index: 4
Originality: Incremental advance
AI Analysis

This work addresses retrieval problems faced by Manga readers and researchers, but it is an incremental advance because it builds on existing vision-language methods.

The paper tackles content retrieval in Manga, which is difficult because of the medium's visual complexity. It presents MaRU, a multi-stage system that connects vision and language to search both dialogues and scenes, and reports promising evaluation results.

Manga, a widely celebrated Japanese comic art form, is renowned for its diverse narratives and distinct artistic styles. However, the inherently visual and intricate structure of Manga, which comprises images housing multiple panels, poses significant challenges for content retrieval. To address this, we present MaRU (Manga Retrieval and Understanding), a multi-staged system that connects vision and language to facilitate efficient search of both dialogues and scenes within Manga frames. The architecture of MaRU integrates an object detection model for identifying text and frame bounding boxes, a Vision Encoder-Decoder model for text recognition, a text encoder for embedding text, and a vision-text encoder that merges textual and visual information into a unified embedding space for scene retrieval. Rigorous evaluations reveal that MaRU excels in end-to-end dialogue retrieval and exhibits promising results for scene retrieval.
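The abstract describes dialogue retrieval as a pipeline whose final step embeds recognized text and searches over those embeddings. The Python sketch below illustrates only that final step, nearest-neighbour ranking by cosine similarity, under stated assumptions: the detection and OCR stages are taken as already done (their output is given as plain strings); `DialogueEntry`, `embed`, `build_index`, and `search` are hypothetical names; and the bag-of-words `embed` is a stand-in for MaRU's learned text encoder, which this page does not name.

```python
import math
import re
from collections import Counter
from dataclasses import dataclass


@dataclass
class DialogueEntry:
    # Hypothetical record assumed to come out of the detection + OCR stages:
    # where a recognized text box was found, and the recognized text.
    page: int
    frame: int
    text: str


def embed(text: str) -> Counter:
    # Stand-in for the learned text encoder (assumption): a sparse
    # bag-of-words vector over lowercase word tokens.
    return Counter(re.findall(r"\w+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two sparse vectors; 0.0 if either is empty.
    dot = sum(count * b[token] for token, count in a.items() if token in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def build_index(entries):
    # Embed every recognized dialogue once, ahead of query time.
    return [(entry, embed(entry.text)) for entry in entries]


def search(index, query: str, k: int = 3):
    # Rank indexed dialogues by similarity to the query embedding and
    # return the top-k non-zero matches as (rounded score, entry) pairs.
    query_vec = embed(query)
    scored = [(cosine(query_vec, vec), entry) for entry, vec in index]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [(round(score, 3), entry) for score, entry in scored[:k] if score > 0]


if __name__ == "__main__":
    entries = [
        DialogueEntry(page=1, frame=2, text="The ship leaves the harbor at dawn."),
        DialogueEntry(page=3, frame=1, text="Where is the hidden treasure?"),
        DialogueEntry(page=5, frame=4, text="Meet me at the harbor tonight."),
    ]
    index = build_index(entries)
    for score, hit in search(index, "harbor at night", k=2):
        print(score, hit.page, hit.frame, hit.text)
```

With this lexical stand-in, "night" does not match "tonight", so ranking relies only on the shared words "harbor" and "at". A learned sentence encoder is intended to close that gap by placing semantically similar dialogue close together. Scene retrieval could reuse the same ranking step, with the vectors produced by the vision-text encoder in its shared embedding space instead.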

