AS | AI | LG | SD | May 20, 2025

SSPS: Self-Supervised Positive Sampling for Robust Self-Supervised Speaker Verification

arXiv:2505.14561v23.33 citations | h-index: 17 | Has Code | INTERSPEECH

Originality: Incremental advance

AI Analysis

This work addresses a bottleneck in self-supervised speaker verification, namely limited robustness to recording conditions, though it is an incremental method that builds on existing SSL frameworks.

The paper targets a limitation of same-utterance positive sampling in self-supervised speaker verification: because the anchor and positive share a recording, the model ends up encoding channel information. It proposes Self-Supervised Positive Sampling (SSPS), which finds positives from the same speaker but a different recording condition. With SSPS, SimCLR and DINO reach EERs of 2.57% and 2.53% on VoxCeleb1-O, and SimCLR's EER drops by 58%.

Self-Supervised Learning (SSL) has led to considerable progress in Speaker Verification (SV). The standard framework uses same-utterance positive sampling and data-augmentation to generate anchor-positive pairs of the same speaker. This is a major limitation, as this strategy primarily encodes channel information from the recording condition, shared by the anchor and positive. We propose a new positive sampling technique to address this bottleneck: Self-Supervised Positive Sampling (SSPS). For a given anchor, SSPS aims to find an appropriate positive, i.e., of the same speaker identity but a different recording condition, in the latent space using clustering assignments and a memory queue of positive embeddings. SSPS improves SV performance for both SimCLR and DINO, reaching 2.57% and 2.53% EER, outperforming SOTA SSL methods on VoxCeleb1-O. In particular, SimCLR-SSPS achieves a 58% EER reduction by lowering intra-speaker variance, providing comparable performance to DINO-SSPS.
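
The positive-sampling idea in the abstract can be illustrated with a minimal sketch. This is one possible reading, not the authors' implementation: the function name `sample_positive`, its arguments, and the nearest-centroid assignment are all hypothetical. It also assumes that "same cluster" stands in for "same speaker" and that "different utterance" stands in for "different recording condition"; the abstract does not specify either choice.

```python
import numpy as np


def _l2_normalize(x):
    """Scale vectors to unit length along the last axis."""
    return x / np.linalg.norm(x, axis=-1, keepdims=True)


def sample_positive(anchor_emb, anchor_utt_id, centroids,
                    queue_embs, queue_utt_ids, rng):
    """Return the index of a pseudo-positive in the memory queue, or None.

    Hypothetical sketch: a queue entry qualifies if it falls in the same
    cluster as the anchor (assumed proxy for speaker identity) and comes
    from a different utterance (assumed proxy for recording condition).
    """
    c = _l2_normalize(np.asarray(centroids, dtype=float))
    queue_utt_ids = np.asarray(queue_utt_ids)

    # Assign the anchor and every queue entry to the nearest centroid
    # by cosine similarity.
    anchor_cluster = np.argmax(_l2_normalize(np.asarray(anchor_emb, dtype=float)) @ c.T)
    queue_clusters = np.argmax(_l2_normalize(np.asarray(queue_embs, dtype=float)) @ c.T, axis=1)

    # Keep entries from the same cluster but a different utterance.
    mask = (queue_clusters == anchor_cluster) & (queue_utt_ids != anchor_utt_id)
    candidates = np.flatnonzero(mask)
    if candidates.size == 0:
        return None  # caller can fall back to same-utterance sampling
    return int(rng.choice(candidates))


# Toy example with two clusters and a three-entry queue.
rng = np.random.default_rng(0)
centroids = np.array([[1.0, 0.0], [0.0, 1.0]])
queue_embs = np.array([[0.9, 0.1], [0.1, 0.9], [0.8, 0.2]])
queue_utt_ids = np.array([0, 1, 2])

picked = sample_positive(np.array([1.0, 0.05]), 0, centroids,
                         queue_embs, queue_utt_ids, rng)
no_match = sample_positive(np.array([0.05, 1.0]), 1, centroids,
                           queue_embs, queue_utt_ids, rng)
```

In the toy data, the only queue entry that shares the anchor's cluster and comes from another utterance is index 2, so `picked` is 2. The second call finds no eligible entry and returns `None`, at which point a training loop would presumably revert to the standard same-utterance positive.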
