SD · May 18

EnvTriCascade: An Environment-Aware Tri-Stage Cascaded Framework for ESDD2 2026 Challenge

Hengyan Huang, Xiaoxuan Guo, Jiayi Zhou, Yuankun Xie, Jian Liu, Haonan Cheng, Long Ye, Qin Zhang

arXiv:2605.18409

AI Analysis

For researchers in audio spoofing detection, this work addresses the emerging challenge of component-level manipulation in real-world scenarios, though it is an incremental improvement over existing methods.

The paper tackles component-level audio spoofing detection where speech and environmental sounds can be independently manipulated. The proposed EnvTriCascade framework achieves a Macro-F1 score of 0.8266 on the test set, significantly outperforming the official baseline and ranking second in the ESDD2 Challenge.

Audio deepfake detection (ADD) in real-world scenarios has evolved from speech-only spoofing to more challenging component-level settings, where speech and environmental sounds may be independently manipulated. To tackle this, we propose EnvTriCascade, an Environment-Aware Tri-Stage Cascaded framework for the ESDD2 Challenge. First, a mix-consistency detector provides a binary prior to distinguish original recordings from manipulated mixtures, which calibrates the final decisions. Next, two complementary five-class detectors, leveraging SSLAM+XLS-R and EAT-large+XLS-R representations, extract robust multi-branch features integrated via a cross-branch attention-gated classifier. To enhance robustness against diverse mixing conditions, we incorporate RawBoost augmentation. Trained exclusively on the official CompSpoofV2 dataset, our system achieves a Macro-F1 score of 0.8266 on the test set, significantly outperforming the official baseline and ranking second in the challenge.
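The reported metric, Macro-F1, is the unweighted mean of per-class F1 scores, so each of the five classes counts equally regardless of how many clips it contains. The sketch below shows the standard definition only; the integer class labels 0 to 4 and the toy predictions are hypothetical, and the challenge's official scoring script is not reproduced here.

```python
from collections import Counter


def macro_f1(y_true, y_pred, labels=None):
    """Unweighted mean of per-class F1 scores (standard Macro-F1)."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if labels is None:
        labels = sorted(set(y_true) | set(y_pred))
    if not labels:
        raise ValueError("at least one class label is required")

    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted class p, but it was wrong
            fn[t] += 1  # true class t was missed

    scores = []
    for c in labels:
        denom = 2 * tp[c] + fp[c] + fn[c]
        # A class with no true or predicted samples contributes F1 = 0.
        scores.append(2 * tp[c] / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical five-class example: classes 3 and 4 are confused once each.
y_true = [0, 1, 2, 3, 4, 0, 1, 2, 3, 4]
y_pred = [0, 1, 2, 3, 4, 0, 1, 2, 4, 3]
score = macro_f1(y_true, y_pred, labels=[0, 1, 2, 3, 4])
```

In this toy case three classes are predicted perfectly (F1 = 1) and two have F1 = 0.5, so the Macro-F1 is 0.8. Because the average is unweighted, a weak minority class lowers the score as much as a weak majority class would.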
