A Few-Shot Learning Approach for Sound Source Distance Estimation Using Relation Networks
This work addresses the challenge of calibrating microphone systems for sound source distance estimation in unknown environments and offers a practical solution for audio-based applications, although the contribution is incremental because it builds on existing few-shot learning techniques.
The paper tackles sound source distance estimation (SSDE) by applying few-shot learning with relation networks to overcome the mismatch between training and test environments. In comparative experiments, this approach outperforms conventional machine learning methods (XGBoost, SVM) and supervised deep learning models (CNN, MLP).
In this paper, we study the performance of few-shot learning, specifically meta-learning-based few-shot relation networks, compared with supervised deep learning and conventional machine learning approaches for Sound Source Distance Estimation (SSDE). In previous research on deep supervised SSDE, low accuracy has often resulted from the mismatch between the training data (from known environments) and the test data (from unknown environments). Through comparative experiments on a sufficient amount of data, we show that the few-shot relation network outperforms eXtreme Gradient Boosting (XGBoost), Support Vector Machine (SVM), Convolutional Neural Network (CNN), and Multilayer Perceptron (MLP) baselines. Hence, a microphone-equipped system can be calibrated with a few labeled audio samples recorded in a particular unknown environment, adapting our classifier to the input data it will encounter and achieving higher accuracy.
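The calibration idea can be illustrated with a minimal Python sketch of how a relation network scores one query clip against a few labeled support clips from a new environment. This is not the authors' implementation. The three-element feature vectors, the distance classes "near", "mid", and "far", the fixed weights in the hypothetical `embed`, and the distance-based hypothetical `relation_module` are all assumptions that stand in for the trained embedding network f and the learned relation module g. The support embeddings of each class are averaged here; the original relation-network formulation sums them for K-shot tasks.

```python
import math
from typing import Dict, List, Sequence

Vector = List[float]

# Hypothetical fixed weights standing in for a trained embedding network f.
EMBED_WEIGHTS: List[Vector] = [
    [1.0, 0.0, 0.5],
    [0.0, 1.0, -0.5],
]


def embed(x: Sequence[float]) -> Vector:
    """f(x): linear projection followed by ReLU (stand-in for a trained encoder)."""
    return [max(0.0, sum(w * xi for w, xi in zip(row, x))) for row in EMBED_WEIGHTS]


def relation_module(pair: Vector) -> float:
    """g([f(a); f(b)]): map a concatenated embedding pair to a score in (0, 1).

    Hand-set stand-in: in a real relation network, g is a small network
    trained end to end so that matching pairs score close to 1.
    """
    half = len(pair) // 2
    a, b = pair[:half], pair[half:]
    sq_dist = sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return 1.0 / (1.0 + math.exp(sq_dist - 1.0))  # sigmoid(1 - squared distance)


def classify(query: Sequence[float],
             support: Dict[str, List[Sequence[float]]]) -> Dict[str, float]:
    """Return a relation score per distance class for a single query sample."""
    q = embed(query)
    scores: Dict[str, float] = {}
    for label, samples in support.items():
        embs = [embed(s) for s in samples]
        # Average the K support embeddings of this class into one class vector.
        proto = [sum(col) / len(embs) for col in zip(*embs)]
        # Concatenate class vector and query embedding, then score the pair.
        scores[label] = relation_module(proto + q)
    return scores


# Hypothetical calibration (support) set: a few labeled clips recorded in the
# new environment, each reduced to three made-up acoustic features.
SUPPORT: Dict[str, List[Sequence[float]]] = {
    "near": [[0.9, 0.1, 0.8], [1.0, 0.2, 0.7]],
    "mid": [[0.5, 0.5, 0.5], [0.6, 0.5, 0.4]],
    "far": [[0.1, 0.9, 0.2], [0.2, 1.0, 0.1]],
}

if __name__ == "__main__":
    scores = classify([0.95, 0.15, 0.75], SUPPORT)
    print(max(scores, key=scores.get))  # prints "near"
```

In the usual relation-network setup, f and g are trained episodically on data from known environments, and only the support set changes at deployment. That is what allows a system to be calibrated to an unknown room from a few labeled clips without retraining the network.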