CL | SD | AS | May 10, 2025

TS-SUPERB: A Target Speech Processing Benchmark for Speech Self-Supervised Learning Models

arXiv:2505.06660v1 | 3 citations | h-index: 44 | Has Code | ICASSP
Originality: Incremental advance
AI Analysis

This work addresses a practical problem for speech processing researchers and developers: it provides a benchmark for evaluating models in more realistic, challenging multi-speaker environments. The advance is incremental, since it builds on existing SSL benchmarks.

The paper tackles the lack of benchmarks for self-supervised learning (SSL) models on target-speaker tasks under noisy, multi-talker conditions by introducing TS-SUPERB, a benchmark with four tasks. It finds that performance in these scenarios cannot be easily inferred from single-speaker tasks, and that joint optimization across tasks is effective.

Self-supervised learning (SSL) models have significantly advanced speech processing tasks, and several benchmarks have been proposed to validate their effectiveness. However, previous benchmarks have primarily focused on single-speaker scenarios, with less exploration of target-speaker tasks in noisy, multi-talker conditions -- a more challenging yet practical case. In this paper, we introduce the Target-Speaker Speech Processing Universal Performance Benchmark (TS-SUPERB), which includes four widely recognized target-speaker processing tasks that require identifying the target speaker and extracting information from the speech mixture. In our benchmark, the speaker embedding extracted from enrollment speech is used as a clue to condition downstream models. The benchmark result reveals the importance of evaluating SSL models in target speaker scenarios, demonstrating that performance cannot be easily inferred from related single-speaker tasks. Moreover, by using a unified SSL-based target speech encoder, consisting of a speaker encoder and an extractor module, we also investigate joint optimization across TS tasks to leverage mutual information and demonstrate its effectiveness.
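
The conditioning idea in the abstract (a speaker embedding from enrollment speech used as a clue for a downstream extractor) can be illustrated with a toy sketch. This is not the paper's implementation: the function names, the mean-pooling speaker encoder, and the element-wise multiplicative conditioning are all assumptions chosen for illustration, and plain Python lists stand in for SSL feature tensors.

```python
import random


def mean_pool(frames):
    """Average a list of frame vectors (T x D) into one D-dimensional vector."""
    dim = len(frames[0])
    return [sum(frame[d] for frame in frames) / len(frames) for d in range(dim)]


def speaker_encoder(enroll_feats):
    """Hypothetical speaker encoder: mean-pool the enrollment features over time."""
    return mean_pool(enroll_feats)


def extractor(mixture_feats, spk_emb):
    """Hypothetical extractor: scale each mixture frame element-wise by the
    speaker embedding (an assumed conditioning choice, not taken from the paper)."""
    return [[x * e for x, e in zip(frame, spk_emb)] for frame in mixture_feats]


def target_speech_encoder(mixture_feats, enroll_feats):
    """Speaker encoder followed by extractor, mirroring the two modules named in the abstract."""
    spk_emb = speaker_encoder(enroll_feats)
    return extractor(mixture_feats, spk_emb)


# Random stand-ins for SSL features: 6 mixture frames and 3 enrollment frames, 4 dims each.
random.seed(0)
D = 4
mixture = [[random.gauss(0, 1) for _ in range(D)] for _ in range(6)]
enrollment = [[random.gauss(0, 1) for _ in range(D)] for _ in range(3)]

conditioned = target_speech_encoder(mixture, enrollment)
print(len(conditioned), len(conditioned[0]))  # frame count and feature dimension
```

The conditioned features keep the mixture's frame count and dimension, so in this sketch any downstream task head that accepts the original features could consume them unchanged.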

Code Implementations: 1 repo
