AS LG SD Jun 10, 2020

Listen to What You Want: Neural Network-based Universal Sound Selector

Tsubasa Ochiai, Marc Delcroix, Yuma Koizumi, Hiroaki Ito, Keisuke Kinoshita, Shoko Araki

arXiv:2006.05712v1 14.569 citations

Originality: Incremental advance

AI Analysis

This work addresses the need for more controllable hearable devices by enabling selective listening to acoustic events, though the advance is incremental because it builds on existing sound processing methods.

The paper tackles the problem of acoustic event sound selection or removal from mixtures by proposing a neural network that directly extracts or suppresses sounds from user-specified target classes, achieving promising performance and generalization to unseen numbers of sources.

Being able to control the acoustic events (AEs) to which we want to listen would allow the development of more controllable hearable devices. This paper addresses the AE sound selection (or removal) problems, which we define as the extraction (or suppression) of all the sounds that belong to one or multiple desired AE classes. Although these problems could be addressed by source separation followed by AE classification, this is a sub-optimal way of solving them. Moreover, source separation usually requires knowing the maximum number of sources, which may not be practical when dealing with AEs. In this paper, we instead propose a universal sound selection neural network that directly selects AE sounds from a mixture given user-specified target AE classes. The proposed framework can be explicitly optimized to simultaneously select sounds from multiple desired AE classes, independently of the number of sources in the mixture. We experimentally show that the proposed method achieves promising AE sound selection performance and can generalize to mixtures with a number of sources unseen during training.
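The core idea in the abstract, conditioning a single network on the user-specified target classes so that one output is produced regardless of how many sources are in the mixture, can be illustrated with a toy sketch. Everything below is an assumption made for illustration: the class list, the `ToySelector` name, the multi-hot encoding, the element-wise conditioning, and the random untrained weights are hypothetical and are not taken from the paper.

```python
import numpy as np

# Hypothetical AE class inventory (not from the paper).
AE_CLASSES = ["dog_bark", "siren", "speech", "door_knock"]


def n_hot(targets, classes=AE_CLASSES):
    """Encode the user-selected target classes as a multi-hot vector."""
    vec = np.zeros(len(classes), dtype=np.float32)
    for name in targets:
        vec[classes.index(name)] = 1.0
    return vec


class ToySelector:
    """Untrained toy model: a class embedding conditions a mask estimator."""

    def __init__(self, n_classes, n_features, seed=0):
        rng = np.random.default_rng(seed)
        # One embedding row per AE class; a multi-hot vector sums the chosen rows.
        self.embed = rng.standard_normal((n_classes, n_features)).astype(np.float32)
        # Single linear layer standing in for a real mask-estimation network.
        self.mask_w = (0.1 * rng.standard_normal((n_features, n_features))).astype(np.float32)

    def __call__(self, mixture_feats, target_vec):
        # mixture_feats: (frames, n_features); target_vec: (n_classes,)
        cond = target_vec @ self.embed           # (n_features,) conditioning vector
        hidden = mixture_feats * cond            # element-wise conditioning (assumed)
        mask = 1.0 / (1.0 + np.exp(-(hidden @ self.mask_w)))  # sigmoid, values in [0, 1]
        # A single masked output, however many sources or target classes there are.
        return mask * mixture_feats


selector = ToySelector(n_classes=len(AE_CLASSES), n_features=8)
mixture = np.random.default_rng(1).standard_normal((5, 8)).astype(np.float32)
selected = selector(mixture, n_hot(["siren", "speech"]))
print(selected.shape)
```

In this sketch, selecting several classes at once only changes the conditioning vector, so the model never needs to know the number of sources in the mixture. A trained system would learn the embedding and mask weights from mixtures paired with the sum of the target-class sounds.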
