SPFeb 26
Scattering Transform for Auditory Attention DecodingRené Pallenberg, Fabrice Katzberg, Alfred Mertins et al.
The use of hearing aids will increase in the coming years due to demographic change. One open problem that remains to be solved by a new generation of hearing aids is the cocktail party problem. A possible solution is electroencephalography-based auditory attention decoding. This has been the subject of several studies in recent years, which have in common that they use the same preprocessing methods in most cases. In this work, in order to achieve an advantage, the use of a scattering transform is proposed as an alternative to these preprocessing methods. The two-layer scattering transform is compared with a regular filterbank, the synchrosqueezing short-time Fourier transform and the common preprocessing. To demonstrate the performance, the known and the proposed preprocessing methods are compared for different classification tasks on two widely used datasets, provided by the KU Leuven (KUL) and the Technical University of Denmark (DTU). Both established and new neural-network-based models, CNNs, LSTMs, and recent Transformer/graph-based models are used for classification. Various evaluation strategies were compared, with a focus on the task of classifying speakers who are unknown from the training. We show that the two-layer scattering transform can significantly improve the performance for subject-related conditions, especially on the KUL dataset. However, on the DTU dataset, this only applies to some of the models, or when larger amounts of training data are provided, as in 10-fold cross-validation. This suggests that the scattering transform is capable of extracting additional relevant information.
SDMar 14, 2017
Audio Scene Classification with Deep Recurrent Neural NetworksHuy Phan, Philipp Koch, Fabrice Katzberg et al.
We introduce in this work an efficient approach for audio scene classification using deep recurrent neural networks. An audio scene is firstly transformed into a sequence of high-level label tree embedding feature vectors. The vector sequence is then divided into multiple subsequences on which a deep GRU-based recurrent neural network is trained for sequence-to-label classification. The global predicted label for the entire sequence is finally obtained via aggregation of subsequence classification outputs. We will show that our approach obtains an F1-score of 97.7% on the LITIS Rouen dataset, which is the largest dataset publicly available for the task. Compared to the best previously reported result on the dataset, our approach is able to reduce the relative classification error by 35.3%.
SDDec 29, 2016
What Makes Audio Event Detection Harder than Classification?Huy Phan, Philipp Koch, Fabrice Katzberg et al.
There is a common observation that audio event classification is easier to deal with than detection. So far, this observation has been accepted as a fact and we lack of a careful analysis. In this paper, we reason the rationale behind this fact and, more importantly, leverage them to benefit the audio event detection task. We present an improved detection pipeline in which a verification step is appended to augment a detection system. This step employs a high-quality event classifier to postprocess the benign event hypotheses outputted by the detection system and reject false alarms. To demonstrate the effectiveness of the proposed pipeline, we implement and pair up different event detectors based on the most common detection schemes and various event classifiers, ranging from the standard bag-of-words model to the state-of-the-art bank-of-regressors one. Experimental results on the ITC-Irst dataset show significant improvements to detection performance. More importantly, these improvements are consistent for all detector-classifier combinations.
SDSep 29, 2016
Measurement of Sound Fields Using Moving MicrophonesFabrice Katzberg, Radoslaw Mazur, Marco Maass et al.
The sampling of sound fields involves the measurement of spatially dependent room impulse responses, where the Nyquist-Shannon sampling theorem applies in both the temporal and spatial domain. Therefore, sampling inside a volume of interest requires a huge number of sampling points in space, which comes along with further difficulties such as exact microphone positioning and calibration of multiple microphones. In this paper, we present a method for measuring sound fields using moving microphones whose trajectories are known to the algorithm. At that, the number of microphones is customizable by trading measurement effort against sampling time. Through spatial interpolation of the dynamic measurements, a system of linear equations is set up which allows for the reconstruction of the entire sound field inside the volume of interest.