RFOP: Rethinking Fusion and Orthogonal Projection for Face-Voice Association
This work addresses face-voice association for multilingual applications, but it is incremental as it builds on existing fusion and projection techniques.
The paper tackles the face-voice association task in a multilingual environment by revisiting fusion and orthogonal projection to focus on relevant semantic information. It achieves an EER of 33.1 and ranked 3rd in the FAME 2026 challenge.
The Face-Voice Association in Multilingual Environments (FAME) 2026 challenge investigates the face-voice association task in a multilingual scenario. The challenge introduces English-German face-voice pairs for use in the evaluation phase. To this end, we revisit fusion and orthogonal projection for face-voice association, focusing on the relevant semantic information within the two modalities. Our method performs favorably on the English-German data split and ranked 3rd in the FAME 2026 challenge, achieving an EER of 33.1.
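For readers unfamiliar with the reported metric, the equal error rate (EER) is the operating point at which the false acceptance rate equals the false rejection rate. The following is a minimal sketch of how an EER could be computed from verification scores. It is not the challenge's official scoring script: the function name `equal_error_rate`, the threshold sweep over observed scores, and the toy inputs are all illustrative assumptions. The function returns a fraction in [0, 1]; the reported 33.1 is presumably that value expressed as a percentage.

```python
import numpy as np


def equal_error_rate(scores, labels):
    """Approximate the EER by sweeping thresholds over the observed scores.

    scores: similarity score per face-voice pair (higher = more likely a match).
    labels: 1/True for matching pairs, 0/False for non-matching pairs.
    Hypothetical helper for illustration; not the official FAME scorer.
    """
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=bool)

    best_gap, eer = np.inf, 1.0
    for t in np.unique(scores):  # np.unique returns sorted thresholds
        accept = scores >= t
        far = np.mean(accept[~labels])   # non-matching pairs wrongly accepted
        frr = np.mean(~accept[labels])   # matching pairs wrongly rejected
        gap = abs(far - frr)
        if gap < best_gap:
            # Keep the threshold where the two error rates are closest.
            best_gap, eer = gap, (far + frr) / 2.0
    return eer


# Toy example: two matching pairs, two non-matching pairs.
toy_scores = [0.9, 0.3, 0.7, 0.1]
toy_labels = [1, 1, 0, 0]
print(equal_error_rate(toy_scores, toy_labels))
```

With a finite set of scores the two error rates rarely cross exactly, so this sketch averages them at the closest threshold; production scorers typically interpolate along the ROC curve instead.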