SD LG Aug 20, 2015

Histogram of gradients of Time-Frequency Representations for Audio scene detection

arXiv:1508.04909v1 9.525 citations

Originality: Incremental advance

AI Analysis

This paper addresses audio scene detection and offers an incremental improvement relevant to applications such as environmental monitoring or smart devices.

The paper tackles audio scene classification by proposing a novel feature based on histograms of gradients (HOG) of time-frequency representations. The authors report that it outperforms state-of-the-art competitors on multiple datasets, including a new public dataset with 19 classes and about 900 minutes of audio.

This paper addresses the problem of audio scene classification and contributes to the state of the art by proposing a novel feature. We build this feature by considering histograms of gradients (HOG) of the time-frequency representation of an audio scene. In contrast to classical audio features such as MFCC, we hypothesize that histograms of gradients are able to encode relevant information in a time-frequency representation: namely, the local direction of variation (in time and frequency) of the signal's spectral power. In addition, to gain more invariance and robustness, the histograms of gradients are locally pooled. We have evaluated the relevance of the novel feature by comparing its performance with state-of-the-art competitors on several datasets, including a novel one that we provide as part of our contribution. This dataset, which we make publicly available, involves $19$ classes and contains about $900$ minutes of audio scene recordings. We thus believe that it may be the next standard dataset for evaluating audio scene classification algorithms. Our comparison results clearly show that our HOG-based features outperform their competitors.
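The idea described in the abstract (local gradient directions of a time-frequency representation, histogrammed and then pooled) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names `hog_features` and `pool_over_time`, the 8x8 cell size, the 8 orientation bins, and the choice of averaging over time as the pooling step are all assumptions made for illustration.

```python
import numpy as np


def hog_features(tfr, n_bins=8, cell=(8, 8)):
    """Magnitude-weighted orientation histograms on a grid of cells.

    tfr: 2-D array (frequency x time), e.g. a log-power spectrogram.
    Returns an array of shape (n_cells_freq, n_cells_time, n_bins).
    Cell size and bin count are illustrative assumptions.
    """
    # Local variation of spectral power along frequency (axis 0) and time (axis 1).
    d_freq, d_time = np.gradient(np.asarray(tfr, dtype=float))
    magnitude = np.hypot(d_freq, d_time)

    # Unsigned gradient orientation, folded into [0, pi).
    orientation = np.mod(np.arctan2(d_freq, d_time), np.pi)
    # Clamp guards against rounding that lands exactly on the upper edge.
    bin_idx = np.minimum((orientation / np.pi * n_bins).astype(int), n_bins - 1)

    ch, cw = cell
    n_f, n_t = tfr.shape[0] // ch, tfr.shape[1] // cw
    hist = np.zeros((n_f, n_t, n_bins))
    for i in range(n_f):
        for j in range(n_t):
            rows = slice(i * ch, (i + 1) * ch)
            cols = slice(j * cw, (j + 1) * cw)
            hist[i, j] = np.bincount(
                bin_idx[rows, cols].ravel(),
                weights=magnitude[rows, cols].ravel(),
                minlength=n_bins,
            )
    return hist


def pool_over_time(hist, eps=1e-12):
    """Average cell histograms along the time axis, then L2-normalize.

    Averaging over time is one possible pooling choice (an assumption here);
    it trades temporal localization for invariance to time shifts.
    """
    pooled = hist.mean(axis=1)  # shape: (n_cells_freq, n_bins)
    flat = pooled.ravel()
    return flat / (np.linalg.norm(flat) + eps)
```

For a spectrogram `S` of shape (frequency, time), `pool_over_time(hog_features(S))` yields a single fixed-length vector per recording that could then be passed to any standard classifier.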
