Capsule Routing for Sound Event Detection
This paper addresses the problem of detecting environmental sound events for audio analysis applications, but the contribution is incremental: it applies an existing capsule routing method to a new domain.
The paper tackles sound event detection by using a capsule routing neural network to classify events and estimate their timing, achieving a state-of-the-art F-score of 58.6% on Task 4 of the DCASE 2017 challenge and reducing overfitting compared to other architectures.
The detection of acoustic scenes is a challenging problem in which environmental sound events must be detected from a given audio signal. This includes classifying the events as well as estimating their onset and offset times. We approach this problem with a neural network architecture that uses the recently proposed capsule routing mechanism. A capsule is a group of activation units representing a set of properties for an entity of interest, and the purpose of routing is to identify part-whole relationships between capsules. That is, a capsule in one layer is assumed to belong to a capsule in the layer above in terms of the entity being represented. Using capsule routing, we aim to train a network that learns global coherence implicitly, thereby improving generalization performance. Our proposed method is evaluated on Task 4 of the DCASE 2017 challenge. Results show that classification performance is state-of-the-art, achieving an F-score of 58.6%. In addition, overfitting is reduced considerably compared to other architectures.
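To make the part-whole idea concrete, the following is a minimal NumPy sketch of iterative routing between two capsule layers. It assumes the routing-by-agreement formulation (softmax coupling coefficients, a "squash" nonlinearity, agreement measured by dot product); the abstract does not specify the exact variant, so this should not be read as the authors' implementation. The names `squash`, `softmax`, `route`, and `u_hat`, the capsule counts, and the three iterations are all illustrative assumptions.

```python
import numpy as np

def squash(s, axis=-1, eps=1e-8):
    """Shrink each vector's length into [0, 1) while keeping its direction."""
    sq_norm = np.sum(s ** 2, axis=axis, keepdims=True)
    return (sq_norm / (1.0 + sq_norm)) * s / np.sqrt(sq_norm + eps)

def softmax(x, axis):
    """Numerically stable softmax along the given axis."""
    e = np.exp(x - np.max(x, axis=axis, keepdims=True))
    return e / np.sum(e, axis=axis, keepdims=True)

def route(u_hat, n_iters=3):
    """Route lower-layer capsules to upper-layer capsules (assumed variant).

    u_hat: array of shape (n_lower, n_upper, dim); u_hat[i, j] is lower
           capsule i's prediction for the output of upper capsule j.
    Returns (v, c): upper capsule outputs of shape (n_upper, dim) and the
    coupling coefficients of shape (n_lower, n_upper) used to compute them.
    """
    n_lower, n_upper, _ = u_hat.shape
    b = np.zeros((n_lower, n_upper))  # routing logits, start uniform
    for _ in range(n_iters):
        # Each lower capsule distributes its vote across the upper capsules.
        c = softmax(b, axis=1)
        # Weighted sum of predictions for each upper capsule.
        s = np.einsum('ij,ijd->jd', c, u_hat)
        v = squash(s)
        # Raise the logit where a prediction agrees with the upper output.
        b = b + np.einsum('ijd,jd->ij', u_hat, v)
    return v, c

# Hypothetical sizes: 6 lower capsules, 3 upper capsules, 4-dim outputs.
rng = np.random.default_rng(0)
u_hat = rng.normal(size=(6, 3, 4))
v, c = route(u_hat)
```

In this sketch, a row of `c` describes how strongly one lower-layer capsule is assigned to each upper-layer capsule, which is the sense in which a capsule "belongs to" a capsule in the layer above. The length of each row of `v` stays below 1 and can be read as a presence score for the entity that the upper capsule represents.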