Learning Disentangled Audio Representations through Controlled Synthesis
This paper addresses a problem relevant to researchers, namely how to evaluate disentanglement techniques in audio, but its contribution is incremental because it centers on dataset creation and benchmarking.
The paper tackles the lack of benchmarking data for disentangled auditory representation learning by introducing SynTone, a synthetic dataset with explicit ground truth factors, and uses it to evaluate state-of-the-art methods, revealing their strengths and limitations.
This paper tackles the scarcity of benchmarking data in disentangled auditory representation learning. We introduce SynTone, a synthetic dataset with explicit ground truth explanatory factors for evaluating disentanglement techniques. Benchmarking state-of-the-art methods on SynTone highlights its utility for method evaluation. Our results underscore strengths and limitations in audio disentanglement, motivating future research.
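To make the idea of controlled synthesis with explicit ground truth factors concrete, the following is a minimal sketch in Python. It is not SynTone's actual generation procedure, which the abstract does not describe. The choice of factors (frequency, amplitude, waveform), their ranges, the sample rate, and the names synth_tone and make_dataset are all assumptions made for illustration.

```python
import numpy as np

SAMPLE_RATE = 16_000  # Hz; assumed value, not taken from the paper
DURATION = 1.0        # seconds; assumed value

WAVEFORMS = ["sine", "square"]  # hypothetical categorical factor


def synth_tone(frequency, amplitude, waveform,
               sample_rate=SAMPLE_RATE, duration=DURATION):
    """Render one tone from explicitly chosen generative factors."""
    t = np.arange(int(sample_rate * duration)) / sample_rate
    phase = 2.0 * np.pi * frequency * t
    if waveform == "sine":
        signal = np.sin(phase)
    elif waveform == "square":
        signal = np.sign(np.sin(phase))
    else:
        raise ValueError(f"unknown waveform: {waveform}")
    return amplitude * signal


def make_dataset(n_samples, seed=0):
    """Sample factors independently and keep them as ground truth labels."""
    rng = np.random.default_rng(seed)
    audio, factors = [], []
    for _ in range(n_samples):
        frequency = rng.uniform(220.0, 880.0)   # Hz; assumed range
        amplitude = rng.uniform(0.1, 1.0)       # assumed range
        waveform_id = int(rng.integers(len(WAVEFORMS)))
        audio.append(synth_tone(frequency, amplitude, WAVEFORMS[waveform_id]))
        factors.append((frequency, amplitude, waveform_id))
    # audio: (n_samples, n_timesteps); factors: (n_samples, 3)
    return np.stack(audio), np.array(factors)


audio, factors = make_dataset(8)
```

Because every sample is rendered from known factor values, a learned representation can be scored against the factors array directly, which is the property that makes a synthetic benchmark of this kind usable for evaluating disentanglement.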