AS · AI · SD · Aug 31, 2023

ReZero: Region-customizable Sound Extraction

arXiv:2308.16892v1 · 41 citations · h-index: 15
Originality: Incremental advance
AI Analysis

ReZero addresses the need for flexible, region-specific sound extraction in audio processing applications, though it appears incremental because it builds on existing methods such as BSRNN.

The paper tackles the problem of extracting target sounds from specific, user-defined spatial regions in multi-channel audio. It introduces the ReZero framework and reports that it is effective on both simulated and real-recorded data.

We introduce region-customizable sound extraction (ReZero), a general and flexible framework for the multi-channel region-wise sound extraction (R-SE) task. The R-SE task aims at extracting all active target sounds (e.g., human speech) within a specific, user-defined spatial region, which differs from conventional tasks where blind separation or a fixed, predefined spatial region is typically assumed. The spatial region can be defined as an angular window, a sphere, a cone, or other geometric patterns. As a solution to the R-SE task, the proposed ReZero framework includes (1) definitions of different types of spatial regions, (2) methods for region feature extraction and aggregation, and (3) a multi-channel extension of the band-split RNN (BSRNN) model tailored to the R-SE task. We design experiments covering different microphone array geometries and different types of spatial regions, along with comprehensive ablation studies on different system configurations. Experimental results on both simulated and real-recorded data demonstrate the effectiveness of ReZero. Demos are available at https://innerselfm.github.io/rezero/.
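
To make the region types named in the abstract more concrete, here is a minimal geometric sketch of what "inside an angular window, a sphere, or a cone" could mean for a source position. It is an illustration only, not the paper's definitions or its feature extraction: the function names, the parameterization (center and width in degrees, apex and axis for the cone), and the coordinate convention (microphone array at the origin, azimuth measured in the x-y plane) are all assumptions.

```python
import math

# Hypothetical helpers: none of these names come from the paper.
# Assumed convention: positions are (x, y, z) in metres, array at the origin.

def in_angular_window(point, center_deg, width_deg):
    """True if the point's azimuth lies within width_deg centred on center_deg."""
    azimuth = math.degrees(math.atan2(point[1], point[0]))
    # Wrap the angular difference into [-180, 180) so windows may straddle +/-180.
    diff = (azimuth - center_deg + 180.0) % 360.0 - 180.0
    return abs(diff) <= width_deg / 2.0

def in_sphere(point, center, radius):
    """True if the point is within radius of the sphere centre."""
    return math.dist(point, center) <= radius

def in_cone(point, apex, axis, half_angle_deg):
    """True if the direction apex->point is within half_angle_deg of the cone axis."""
    v = [p - a for p, a in zip(point, apex)]
    norm_v = math.hypot(*v)
    norm_axis = math.hypot(*axis)
    if norm_v == 0.0:
        return True  # the apex itself is treated as inside
    cos_angle = sum(vi * ai for vi, ai in zip(v, axis)) / (norm_v * norm_axis)
    cos_angle = max(-1.0, min(1.0, cos_angle))  # guard against rounding error
    return math.degrees(math.acos(cos_angle)) <= half_angle_deg

def sources_in_region(sources, region_test):
    """Keep the names of the sources whose position passes the region test."""
    return [name for name, pos in sources.items() if region_test(pos)]

# Example: two talkers, only one of whom sits inside a 60-degree window at 45 degrees.
talkers = {"A": (1.0, 1.0, 0.0), "B": (-1.0, 0.5, 0.0)}
selected = sources_in_region(talkers, lambda p: in_angular_window(p, 45.0, 60.0))
```

In an R-SE setting the source positions are of course unknown at inference time; a test like this would only describe which sources count as targets when building simulated training data or evaluating a system, under the assumptions stated above.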

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.
