SD · May 18

EnvTriCascade: An Environment-Aware Tri-Stage Cascaded Framework for ESDD2 2026 Challenge

arXiv:2605.18409 · 59.4
AI Analysis

For researchers in audio spoofing detection, this work addresses the emerging challenge of component-level manipulation in real-world scenarios, though it is an incremental improvement over existing methods.

The paper tackles component-level audio spoofing detection where speech and environmental sounds can be independently manipulated. The proposed EnvTriCascade framework achieves a Macro-F1 score of 0.8266 on the test set, significantly outperforming the official baseline and ranking second in the ESDD2 Challenge.
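For readers unfamiliar with the reported metric, the following is a minimal sketch of how a Macro-F1 score is computed in general: the unweighted mean of per-class F1 scores, so rare classes count as much as frequent ones. The function name `macro_f1` and the toy labels are illustrative assumptions, and this is not the official challenge scoring script.

```python
from collections import Counter


def macro_f1(y_true, y_pred, labels=None):
    """Return the unweighted mean of per-class F1 scores."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if labels is None:
        labels = sorted(set(y_true) | set(y_pred))
    if not labels:
        return 0.0

    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted class p, but it was wrong
            fn[t] += 1  # true class t was missed

    scores = []
    for c in labels:
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never appears
        denom = 2 * tp[c] + fp[c] + fn[c]
        scores.append(2 * tp[c] / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Toy example with three classes (hypothetical labels, not challenge data)
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 0, 1, 2, 2, 2]
score = macro_f1(y_true, y_pred)
```

Because every class contributes equally, a system that does well on the majority class but fails on a rare manipulation type is penalized more heavily than it would be under plain accuracy.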

Audio deepfake detection (ADD) in real-world scenarios has evolved from speech-only spoofing to more challenging component-level settings, where speech and environmental sounds may be independently manipulated. To tackle this, we propose EnvTriCascade, an Environment-Aware Tri-Stage Cascaded framework for the ESDD2 Challenge. First, a mix-consistency detector provides a binary prior that distinguishes original recordings from manipulated mixtures and calibrates the final decisions. Next, two complementary five-class detectors, built on SSLAM+XLS-R and EAT-large+XLS-R representations, extract robust multi-branch features that are integrated via a cross-branch attention-gated classifier. To enhance robustness against diverse mixing conditions, we incorporate RawBoost augmentation. Trained exclusively on the official CompSpoofV2 dataset, our system achieves a Macro-F1 score of 0.8266 on the test set, significantly outperforming the official baseline and ranking second in the challenge.
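The abstract does not specify how the binary prior calibrates the five-class decisions, so the following is only a hedged sketch of one plausible scheme. It assumes class index 0 means "original recording", replaces the learned cross-branch attention gate with a fixed scalar `gate`, and reweights the fused posteriors by the stage-one probability. The names `fuse_branches` and `apply_binary_prior`, the class ordering, and the logits are all hypothetical.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_branches(probs_a, probs_b, gate=0.5):
    """Convex combination of two branch posteriors.

    Stand-in for the learned attention gate: `gate` is a fixed scalar here.
    """
    return [gate * a + (1.0 - gate) * b for a, b in zip(probs_a, probs_b)]


def apply_binary_prior(probs, p_original, original_idx=0):
    """Reweight class posteriors with a binary 'original vs. mixture' prior.

    Assumption: the 'original' class is scaled by p_original and every
    other class by (1 - p_original), followed by renormalization.
    """
    scaled = [
        (p_original if i == original_idx else 1.0 - p_original) * p
        for i, p in enumerate(probs)
    ]
    total = sum(scaled)
    if total == 0.0:
        return list(probs)  # degenerate prior: fall back to the input
    return [s / total for s in scaled]


# Hypothetical logits from the two five-class detectors
logits_a = [2.0, 1.5, 0.1, 0.0, -1.0]
logits_b = [1.0, 1.8, 0.2, 0.0, -0.5]

fused = fuse_branches(softmax(logits_a), softmax(logits_b))
calibrated = apply_binary_prior(fused, p_original=0.1)
predicted = max(range(len(calibrated)), key=calibrated.__getitem__)
```

With a low `p_original`, probability mass shifts away from the assumed "original" class toward the four mixture classes; a high value does the opposite. The actual system may combine the stages differently.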

