Segment Relevance Estimation for Audio Analysis and Weakly-Labelled Classification
This work addresses weakly-labelled audio classification for applications such as computer-assisted audio analysis, but its contribution is incremental because it builds on existing attention-based methods.
The paper tackles the problem of quantifying the importance of audio segments in weakly-labelled classification. It proposes a relevance measure that adapts to user-defined viewpoints without retraining, and introduces RELNET, a neural network that uses this measure to achieve competitive results on the DCASE2018 dataset.
We propose a method that quantifies the importance, which we call relevance, of audio segments for classification in weakly-labelled problems. It draws information from a set of class-wise one-vs-all classifiers. By selecting which classifiers are used in each specific classification problem, the relevance measure adapts to different user-defined viewpoints without requiring additional neural network training. This allows the relevance measure to be quickly adapted so that it highlights the audio segments matching user-defined criteria, a functionality that can be used for computer-assisted audio analysis. We also propose a neural network architecture, RELNET, that leverages the relevance measure for weakly-labelled audio classification. RELNET was evaluated on the DCASE2018 dataset and achieved competitive classification results when compared to previous attention-based proposals.
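The idea of switching viewpoints by selecting classifiers can be illustrated with a minimal sketch. The abstract does not state the actual relevance formula, so everything below is an assumption made for illustration: the function name `segment_relevance`, the use of the maximum selected one-vs-all score as the raw relevance, and the normalisation to unit sum are hypothetical choices, not the method used by RELNET.

```python
import numpy as np


def segment_relevance(scores, selected):
    """Hypothetical relevance measure (an assumption, not the paper's formula).

    scores:   (n_segments, n_classes) one-vs-all classifier outputs in [0, 1].
    selected: indices of the classifiers that define the user's viewpoint.
    Returns a (n_segments,) array of non-negative weights that sum to 1.
    """
    scores = np.asarray(scores, dtype=float)
    # Keep only the classifiers chosen for this viewpoint; no retraining needed.
    view = scores[:, selected]
    # Assumption: a segment matters if any selected classifier responds to it.
    raw = view.max(axis=1)
    total = raw.sum()
    if total == 0.0:
        # No classifier responds anywhere: fall back to uniform relevance.
        return np.full(scores.shape[0], 1.0 / scores.shape[0])
    return raw / total


# Toy scores for 3 segments and 3 classes (made-up numbers).
scores = np.array([
    [0.9, 0.1, 0.2],
    [0.1, 0.8, 0.1],
    [0.1, 0.1, 0.1],
])

rel_class0 = segment_relevance(scores, [0])  # viewpoint: class 0 only
rel_class1 = segment_relevance(scores, [1])  # viewpoint: class 1 only

# Relevance-weighted pooling of segment scores into clip-level scores.
clip_scores = rel_class0 @ scores
```

Changing `selected` changes which segments are highlighted while the per-segment classifier outputs stay fixed, which is the sense in which such a measure could adapt to a new viewpoint without additional training.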