SD · LG · AS · Dec 1, 2021

Environmental Sound Extraction Using Onomatopoeic Words

arXiv:2112.00209v4 · 11 citations
Originality: Incremental advance
AI Analysis

The paper addresses precise extraction of a target sound from a mixture, a task relevant to audio-processing applications. The advance appears incremental: it adapts existing U-Net mask-estimation methods to a new kind of conditioning input, the onomatopoeic word.

The paper tackles the problem of extracting specific environmental sounds from mixtures by using onomatopoeic words to specify targets, and it shows that the proposed method outperforms conventional sound-event class-based approaches.

An onomatopoeic word, which is a character sequence that phonetically imitates a sound, is effective in expressing characteristics of sound such as duration, pitch, and timbre. We propose an environmental-sound-extraction method using onomatopoeic words to specify the target sound to be extracted. By this method, we estimate a time-frequency mask from an input mixture spectrogram and an onomatopoeic word using a U-Net architecture, then extract the corresponding target sound by masking the spectrogram. Experimental results indicate that the proposed method can extract only the target sound corresponding to the onomatopoeic word and performs better than conventional methods that use sound-event classes to specify the target sound.
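The masking step described in the abstract can be illustrated with a minimal sketch. It assumes a magnitude spectrogram stored as a `(frequency, time)` NumPy array and a mask with values in [0, 1]. The function `toy_mask_estimator` and its per-frequency "word embedding" are hypothetical stand-ins introduced here for illustration only; they are not the paper's U-Net and do not reproduce how the paper encodes the onomatopoeic word.

```python
import numpy as np


def toy_mask_estimator(mixture_spec: np.ndarray, word_embedding: np.ndarray) -> np.ndarray:
    """Hypothetical stand-in for the mask-estimating network (NOT the paper's U-Net).

    Maps one score per frequency bin (the assumed "word embedding") to a
    mask in [0, 1] via a sigmoid, and repeats it over all time frames.
    """
    n_freq, n_time = mixture_spec.shape
    if word_embedding.shape != (n_freq,):
        raise ValueError("word_embedding must have one value per frequency bin")
    per_bin = 1.0 / (1.0 + np.exp(-word_embedding))  # sigmoid, shape (n_freq,)
    return np.repeat(per_bin[:, None], n_time, axis=1)  # shape (n_freq, n_time)


def apply_mask(mixture_spec: np.ndarray, mask: np.ndarray) -> np.ndarray:
    """Extract the target by element-wise masking of the mixture spectrogram."""
    if mixture_spec.shape != mask.shape:
        raise ValueError("mask and spectrogram must have the same shape")
    # Clip defensively so the mask can only attenuate, never amplify.
    return np.clip(mask, 0.0, 1.0) * mixture_spec


# Toy input: 4 frequency bins, 3 time frames, constant energy everywhere.
mixture = np.ones((4, 3))
# Assumed embedding: suppress bin 0, keep bin 2, halve bins 1 and 3.
embedding = np.array([-100.0, 0.0, 100.0, 0.0])

mask = toy_mask_estimator(mixture, embedding)
target = apply_mask(mixture, mask)
```

In a real system the mask would come from a trained network conditioned on both inputs, and the masked spectrogram would be converted back to a waveform, typically by combining it with the mixture phase and applying an inverse STFT; the abstract does not spell out that step.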

