CV · Aug 6, 2025

Face-voice Association in Multilingual Environments (FAME) 2026 Challenge Evaluation Plan

arXiv:2508.04592v2 · 5 citations · h-index: 30
Originality: Synthesis-oriented
AI Analysis

This work addresses the challenge of building multimodal systems for bilingual populations, but it is incremental: it focuses on a specific multilingual condition within an existing research area.

The paper tackles face-voice association in multilingual environments by introducing the FAME 2026 Challenge, which uses the MAV-Celeb dataset to explore this scenario. It provides baseline models and task details for evaluation.

Advances in technology have led to the use of multimodal systems in various real-world applications, and audio-visual systems are among the most widely used of them. In recent years, associating the face and voice of a person has gained attention because of the unique correlation between the two. The Face-voice Association in Multilingual Environments (FAME) 2026 Challenge focuses on exploring face-voice association under the particular condition of a multilingual scenario. This condition is inspired by the fact that half of the world's population is bilingual and that people most often communicate in multilingual settings. The challenge uses the Multilingual Audio-Visual (MAV-Celeb) dataset to explore face-voice association in multilingual environments. This report describes the challenge, the dataset, the baseline models, and the tasks of the FAME Challenge.

