SD, CL, AS | Jul 12, 2025

Voice Conversion for Lombard Speaking Style with Implicit and Explicit Acoustic Feature Conditioning

arXiv:2507.09310v1 | 1 citation | h-index: 16
Originality: Incremental advance
AI Analysis

This work addresses the challenge of generating intelligible speech for listeners with hearing loss and for noisy conditions. It is an incremental contribution that builds on existing voice conversion techniques.

The paper tackles the problem of transferring the Lombard speaking style via voice conversion, with the goal of improving speech intelligibility without requiring extensive recorded data. Its implicit conditioning strategy achieves intelligibility gains comparable to explicit acoustic feature conditioning while preserving speaker similarity.

Text-to-Speech (TTS) systems in Lombard speaking style can improve the overall intelligibility of speech, which is useful for listeners with hearing loss and in noisy conditions. However, training those models requires a large amount of data, and the Lombard effect is challenging to record due to speaker and noise variability and tiring recording conditions. Voice conversion (VC) has been shown to be a useful augmentation technique for training TTS systems when no recorded data from the target speaker in the target speaking style is available. In this paper, we are concerned with Lombard speaking style transfer. Our goal is to convert speaker identity while preserving the acoustic attributes that define the Lombard speaking style. We compare voice conversion models with implicit and explicit acoustic feature conditioning. We observe that our proposed implicit conditioning strategy achieves an intelligibility gain comparable to that of the model conditioned on explicit acoustic features, while also preserving speaker similarity.
