Feature Selection via Graph Topology Inference for Soundscape Emotion Recognition
This work addresses feature selection for soundscape emotion recognition. The contribution is incremental, as it applies graph learning techniques to an existing dataset.
The authors tackled feature selection for soundscape emotion recognition by developing a graph learning framework with a novel information criterion, resulting in a sparse graph representation that revealed a strong connection between arousal and valence, challenging common assumptions in the field.
Research on soundscapes has shifted the focus of environmental acoustics from noise levels to the perception of sounds, incorporating contextual factors. Soundscape emotion recognition (SER) models perception using a set of features, with arousal and valence commonly regarded as sufficient descriptors of affect. In this work, we blend \emph{graph learning} techniques with a novel \emph{information criterion} to develop a feature selection framework for SER. Specifically, we estimate a sparse graph representation of feature relations using linear structural equation models (SEMs) tailored to the widely used Emo-Soundscapes dataset. The resulting graph captures the relations among the input features and between them and the two emotional outputs. To determine the appropriate level of sparsity, we propose a novel \emph{generalized elbow detector}, which provides both a point estimate and an uncertainty interval. We conduct an extensive evaluation of our methods, including visualizations of the inferred relations. While several of our findings align with previous studies, the graph representation also reveals a strong connection between arousal and valence, challenging common SER assumptions.
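The abstract does not spell out the estimator or the detector, so the following is only a minimal sketch under stated assumptions, not the authors' method. It approximates a sparse linear SEM by nodewise lasso regressions: each standardized variable is regressed on all others, which gives a weighted adjacency matrix `B` with zero diagonal. It then sweeps the regularization weight and picks a sparsity level with a standard max-distance-to-chord elbow rule on the fit-versus-edge-count curve, using a bootstrap percentile interval as a stand-in for the uncertainty interval. The paper's information criterion and generalized elbow detector are not reproduced. All function names (`fit_sparse_sem`, `elbow_index`, `select_sparsity`, and so on), the lambda grid, and the synthetic data are hypothetical.

```python
import numpy as np

# Hypothetical sketch: nodewise lasso stands in for the paper's SEM estimator,
# and a plain elbow rule plus bootstrap stands in for its generalized elbow detector.


def standardize(X):
    """Zero-mean, unit-variance columns so one lambda applies to every variable."""
    return (X - X.mean(axis=0)) / X.std(axis=0)


def soft_threshold(a, lam):
    """Proximal operator of lam*|w|: shrink a toward zero by lam."""
    return np.sign(a) * max(abs(a) - lam, 0.0)


def lasso_cd(X, y, lam, n_iter=200, tol=1e-6):
    """Coordinate-descent lasso: minimize (1/2n)||y - Xw||^2 + lam*||w||_1."""
    n, p = X.shape
    w = np.zeros(p)
    z = (X ** 2).sum(axis=0) / n        # per-coordinate curvature
    r = y.copy()                         # residual y - Xw, with w = 0
    for _ in range(n_iter):
        max_step = 0.0
        for k in range(p):
            rho = X[:, k] @ r / n + z[k] * w[k]
            w_new = soft_threshold(rho, lam) / z[k]
            step = w_new - w[k]
            if step != 0.0:
                r -= X[:, k] * step      # keep the residual in sync with w
                w[k] = w_new
                max_step = max(max_step, abs(step))
        if max_step < tol:
            break
    return w


def fit_sparse_sem(X, lam):
    """Nodewise-lasso estimate of a linear SEM X = X B + E with zero diagonal.

    B[i, j] is the estimated effect of variable i on variable j.
    Returns B and the mean squared residual over all nodes.
    """
    n, p = X.shape
    B = np.zeros((p, p))
    sse = 0.0
    for j in range(p):
        others = [k for k in range(p) if k != j]
        w = lasso_cd(X[:, others], X[:, j], lam)
        B[others, j] = w
        resid = X[:, j] - X[:, others] @ w
        sse += resid @ resid
    return B, sse / (n * p)


def elbow_index(x, y):
    """Index of the point farthest from the chord joining the curve's endpoints."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    xn = (x - x.min()) / ((x.max() - x.min()) or 1.0)
    yn = (y - y.min()) / ((y.max() - y.min()) or 1.0)
    dx, dy = xn[-1] - xn[0], yn[-1] - yn[0]
    dist = np.abs(dy * (xn - xn[0]) - dx * (yn - yn[0]))  # unnormalized distance
    return int(np.argmax(dist))


def sparsity_curve(X, lams):
    """Edge count and fit (mean squared residual) for each lambda in the sweep."""
    Xs = standardize(X)
    edges, mses = [], []
    for lam in lams:
        B, mse = fit_sparse_sem(Xs, lam)
        edges.append(int(np.count_nonzero(B)))
        mses.append(mse)
    return edges, mses


def select_sparsity(X, lams, n_boot=10, seed=0):
    """Elbow lambda plus a bootstrap 90% percentile interval for it."""
    edges, mses = sparsity_curve(X, lams)
    point = lams[elbow_index(edges, mses)]
    rng = np.random.default_rng(seed)
    boot = []
    for _ in range(n_boot):
        rows = rng.integers(0, len(X), size=len(X))   # resample observations
        e, m = sparsity_curve(X[rows], lams)
        boot.append(lams[elbow_index(e, m)])
    lo, hi = np.percentile(boot, [5, 95])
    return point, (lo, hi), edges


# Synthetic stand-in for a feature matrix: a chain x0 -> x1 -> x2 plus noise x3.
rng = np.random.default_rng(1)
n = 200
x0 = rng.normal(size=n)
x1 = 0.8 * x0 + 0.5 * rng.normal(size=n)
x2 = 0.8 * x1 + 0.5 * rng.normal(size=n)
x3 = rng.normal(size=n)
X = np.column_stack([x0, x1, x2, x3])

lams = np.geomspace(1.0, 0.01, 12)
point, (lo, hi), edges = select_sparsity(X, lams)
B, _ = fit_sparse_sem(standardize(X), point)
print(f"lambda = {point:.3f}, 90% interval [{lo:.3f}, {hi:.3f}]")
print(np.round(B, 2))
```

Nodewise regression ignores the joint SEM likelihood and any acyclicity constraint, which keeps the sketch short but means `B` is only a proxy for a fitted SEM. For an SER setting, one would presumably place the acoustic features and the arousal and valence ratings side by side as columns, so that an edge between the two outputs appears directly in `B`.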