SD, LG, AS
Mar 10, 2023

Improving Weakly Supervised Sound Event Detection with Causal Intervention

Peking U
arXiv:2303.05678v1
10 citations
h-index: 26
Originality: Incremental advance
AI Analysis

This work addresses misclassification and localization issues in sound event detection for audio processing applications, representing an incremental improvement by applying causal intervention to an existing bottleneck.

The paper tackles the problem of co-occurrence confounders in weakly supervised sound event detection, which cause misclassification and biased localization, by proposing a causal intervention method that removes these confounders to improve event boundary clarity. Experiments show the method effectively improves performance on multiple datasets and generalizes to various baseline models.

Existing weakly supervised sound event detection (WSSED) work has not explored both types of co-occurrences simultaneously, i.e., some sound events often co-occur, and their occurrences are usually accompanied by specific background sounds, so they would be inevitably entangled, causing misclassification and biased localization results with only clip-level supervision. To tackle this issue, we first establish a structural causal model (SCM) to reveal that the context is the main cause of co-occurrence confounders that mislead the model to learn spurious correlations between frames and clip-level labels. Based on the causal analysis, we propose a causal intervention (CI) method for WSSED to remove the negative impact of co-occurrence confounders by iteratively accumulating every possible context of each class and then re-projecting the contexts to the frame-level features for making the event boundary clearer. Experiments show that our method effectively improves the performance on multiple datasets and can generalize to various baseline models.
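The abstract describes the intervention only at a high level: accumulate a context for each class, then re-project those contexts onto the frame-level features. As a rough illustration of that idea, and not the authors' implementation, the sketch below assumes that a class context is a running mean of probability-weighted frame features and that re-projection is a simple blend controlled by a hypothetical `alpha` parameter. The function names, the data layout, and both of those choices are assumptions made for this example; plain Python lists are used to keep it dependency-free.

```python
def accumulate_contexts(contexts, counts, frame_feats, frame_probs, clip_labels):
    """Update per-class running-mean context vectors from one clip (in place).

    contexts:    K x D list, one context vector per class
    counts:      K-length list, number of clips seen per class
    frame_feats: T x D list of frame-level features
    frame_probs: T x K list of frame-level class probabilities
    clip_labels: K-length list of 0/1 clip-level (weak) labels
    """
    dim = len(frame_feats[0])
    for k, present in enumerate(clip_labels):
        if not present:
            continue  # only classes tagged at clip level contribute
        weight = sum(p[k] for p in frame_probs)
        if weight == 0:
            continue  # no frame supports this class; nothing to accumulate
        # Probability-weighted mean of the frame features for class k.
        clip_ctx = [
            sum(p[k] * f[d] for p, f in zip(frame_probs, frame_feats)) / weight
            for d in range(dim)
        ]
        counts[k] += 1
        # Incremental running mean over all clips seen so far for class k.
        contexts[k] = [c + (x - c) / counts[k] for c, x in zip(contexts[k], clip_ctx)]
    return contexts, counts


def reproject(frame_feats, frame_probs, contexts, alpha=0.5):
    """Blend each frame feature with a probability-weighted mix of class contexts.

    alpha is a hypothetical blending weight, not a parameter from the paper.
    """
    num_classes = len(contexts)
    out = []
    for f, p in zip(frame_feats, frame_probs):
        total = sum(p)
        mix = [
            sum(p[k] * contexts[k][d] for k in range(num_classes)) / total if total else 0.0
            for d in range(len(f))
        ]
        out.append([(1 - alpha) * x + alpha * m for x, m in zip(f, mix)])
    return out
```

In a training loop, `accumulate_contexts` would be called once per clip in each iteration, and `reproject` would then be applied to the frame features before the frame-level classifier. How the paper actually weights, stores, and injects the contexts is not stated in the abstract.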

