Leslie Wöhler

h-index: 11

2 Papers

CV · Jul 26, 2024
MangaUB: A Manga Understanding Benchmark for Large Multimodal Models

Hikaru Ikuta, Leslie Wöhler, Kiyoharu Aizawa

Manga is a popular medium that combines stylized drawings and text to convey stories. Because manga panels differ from natural images, computational systems have traditionally had to be designed specifically for manga. Recently, the adaptability of modern large multimodal models (LMMs) has suggested the possibility of more general approaches. To analyze the current capabilities of LMMs on manga understanding tasks and to identify areas for improvement, we design and evaluate MangaUB, a novel manga understanding benchmark for LMMs. MangaUB assesses the recognition and understanding of content shown in a single panel as well as content conveyed across multiple panels, allowing a fine-grained analysis of the various capabilities a model needs for manga understanding. Our results show strong performance on recognizing image content, while understanding the emotion and information conveyed across multiple panels remains challenging, highlighting directions for future work on LMMs for manga understanding.

CV · Sep 24, 2025
PerFace: Metric Learning in Perceptual Facial Similarity for Enhanced Face Anonymization

Haruka Kumagai, Leslie Wöhler, Satoshi Ikehata et al.

In response to rising societal awareness of privacy concerns, face anonymization techniques have advanced, including the emergence of face-swapping methods that replace one identity with another. Balancing anonymity and naturalness in face swapping requires careful selection of identities: overly similar faces compromise anonymity, while dissimilar ones reduce naturalness. Existing models, however, focus on binary identity classification ("the same person or not"), which makes it difficult to measure nuanced similarities such as "completely different" versus "highly similar but different." This paper proposes a face similarity metric based on human perception: we create a dataset of 6,400 triplet annotations and use metric learning to predict similarity. Experimental results demonstrate significant improvements over existing methods in both face similarity prediction and attribute-based face classification.
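As a rough illustration of how triplet annotations can drive metric learning, the sketch below computes a standard triplet margin loss over face embeddings, plus the fraction of annotated triplets a given embedding agrees with. It assumes each annotation names an anchor face, the face humans judged more similar to it (positive), and the less similar face (negative). The function names, the Euclidean distance, and the margin of 0.2 are hypothetical choices for illustration; this is not the PerFace training procedure.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]
# Hypothetical annotation format: (anchor, positive, negative) embeddings,
# where humans judged `positive` more similar to `anchor` than `negative`.
Triplet = Tuple[Vector, Vector, Vector]


def euclidean(a: Vector, b: Vector) -> float:
    """Euclidean distance between two embedding vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def triplet_margin_loss(anchor: Vector, positive: Vector, negative: Vector,
                        margin: float = 0.2) -> float:
    """Hinge loss that is zero once the perceptually closer face (positive)
    is nearer to the anchor than the other face (negative) by at least `margin`."""
    return max(0.0, euclidean(anchor, positive) - euclidean(anchor, negative) + margin)


def triplet_agreement(triplets: List[Triplet]) -> float:
    """Fraction of annotated triplets whose human ordering the embedding reproduces."""
    if not triplets:
        raise ValueError("need at least one triplet")
    agree = sum(1 for a, p, n in triplets if euclidean(a, p) < euclidean(a, n))
    return agree / len(triplets)


if __name__ == "__main__":
    # Toy 2-D embeddings: the first triplet is already ordered correctly,
    # the second is inverted and therefore incurs a positive loss.
    ok = ([0.0, 0.0], [0.1, 0.0], [1.0, 0.0])
    bad = ([0.0, 0.0], [1.0, 0.0], [0.5, 0.0])
    print(triplet_margin_loss(*ok), triplet_margin_loss(*bad))
    print(triplet_agreement([ok, bad]))
```

A margin-based hinge is only one common way to learn from relative judgments; minimizing it over many annotated triplets pushes an embedding toward reproducing human similarity orderings rather than a binary same/different decision.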