Emotion-Aware Speech Generation with Character-Specific Voices for Comics
This work addresses the need for interactive and immersive comic reading experiences by automating voiceover generation, although the contribution appears incremental because it mainly combines existing modules (image processing, LLMs, and TTS).
The paper tackles the problem of generating character-specific, emotion-aware speech from comics. It develops an end-to-end pipeline that processes full comic volumes and produces speech aligned with each character's dialogue and emotional state.
This paper presents an end-to-end pipeline for generating character-specific, emotion-aware speech from comics. The proposed system takes full comic volumes as input and produces speech aligned with each character's dialogue and emotional state. An image processing module performs character detection, text recognition, and emotion intensity recognition. A large language model performs dialogue attribution and emotion analysis by integrating visual information with the evolving plot context. Speech is synthesized through a text-to-speech model with distinct voice profiles tailored to each character and emotion. This work enables automated voiceover generation for comics, offering a step toward more interactive and immersive comic reading experiences.
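
The three stages described above (image processing, LLM-based attribution and emotion analysis, and per-character TTS) can be sketched as a simple data flow. This is a minimal illustration under stated assumptions, not the authors' implementation: the names `Panel`, `Line`, `attribute_and_analyze`, `synthesize`, and `run_pipeline` are hypothetical, the image processing output is assumed to be already available as `Panel` records, and the LLM and TTS stages are replaced by trivial placeholder logic so the example runs without any model.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Panel:
    """Assumed output of the image processing stage for one speech bubble."""
    text: str                 # recognized dialogue text
    candidates: List[str]     # characters detected near the bubble
    emotion_intensity: float  # visual emotion intensity, assumed in [0, 1]


@dataclass
class Line:
    """One attributed line of dialogue, ready for synthesis."""
    speaker: str
    text: str
    emotion: str
    intensity: float


def attribute_and_analyze(panel: Panel, context: List[Line]) -> Line:
    """Placeholder for the LLM stage (hypothetical heuristic, not the paper's).

    A real system would prompt a language model with the panel's visual
    information plus `context` (the plot so far); `context` is unused here.
    """
    speaker = panel.candidates[0] if panel.candidates else "narrator"
    emotion = "excited" if panel.text.endswith("!") else "neutral"
    return Line(speaker, panel.text, emotion, panel.emotion_intensity)


def synthesize(line: Line, voices: Dict[str, str]) -> str:
    """Placeholder for the TTS stage: returns a description, not audio."""
    voice = voices.get(line.speaker, "default")
    return f"[{voice}|{line.emotion}|{line.intensity:.1f}] {line.text}"


def run_pipeline(panels: List[Panel], voices: Dict[str, str]) -> List[str]:
    """Process panels in reading order, carrying the evolving plot context."""
    context: List[Line] = []
    outputs: List[str] = []
    for panel in panels:
        line = attribute_and_analyze(panel, context)
        context.append(line)  # later panels can see earlier dialogue
        outputs.append(synthesize(line, voices))
    return outputs


if __name__ == "__main__":
    demo = [Panel("Watch out!", ["Aya"], 0.9), Panel("It is quiet.", [], 0.1)]
    for rendered in run_pipeline(demo, {"Aya": "voice_a"}):
        print(rendered)
```

Keeping each stage behind its own function means the placeholder logic could be swapped for a real detector, language model, or TTS engine without changing the loop that threads plot context through the volume.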