CL | AI | Oct 15, 2025

StressTransfer: Stress-Aware Speech-to-Speech Translation with Emphasis Preservation

arXiv:2510.13194v1 | h-index: 1
Originality: Incremental advance
AI Analysis

This work addresses the challenge of preserving prosodic cues such as emphasis in speech-to-speech translation, which matters for applications that depend on accurate paralinguistic communication. The contribution appears incremental: it builds on existing methods, with a focus on data efficiency.

The paper addresses the preservation of word-level emphasis in speech-to-speech translation. It develops a stress-aware system that uses LLMs for cross-lingual emphasis conversion and a controllable TTS model. The authors report that the system substantially outperforms baselines in emphasis preservation while maintaining comparable translation quality, speaker intent, and naturalness.

We propose a stress-aware speech-to-speech translation (S2ST) system that preserves word-level emphasis by leveraging LLMs for cross-lingual emphasis conversion. Our method translates source-language stress into target-language tags that guide a controllable TTS model. To overcome data scarcity, we develop a pipeline that automatically generates aligned training data and introduce an "LLM-as-Judge" for evaluation. Experiments show that our approach substantially outperforms baselines in preserving emphasis while maintaining comparable translation quality, speaker intent, and naturalness. Our work highlights the importance of prosody in translation and provides an effective, data-efficient solution for preserving paralinguistic cues in S2ST.
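
To make the emphasis-conversion step concrete, here is a minimal sketch of what "translating source-language stress into target-language tags" has to produce. It is not the authors' method: the paper uses an LLM for this step, whereas the sketch swaps in a fixed word alignment supplied by hand. The `*word*` tag format, the function names, and the alignment dictionary are all assumptions made for illustration; the abstract does not specify any of them.

```python
import re

# Hypothetical tag format: emphasized words are wrapped in asterisks.
# The abstract does not say how emphasis is actually marked.
EMPH = re.compile(r"\*(\w+)\*")


def parse_emphasis(tagged: str) -> tuple[list[str], set[int]]:
    """Split a tagged sentence into plain words and the indices of emphasized words."""
    words, stressed = [], set()
    for i, tok in enumerate(tagged.split()):
        match = EMPH.fullmatch(tok)
        if match:
            stressed.add(i)
            tok = match.group(1)
        words.append(tok)
    return words, stressed


def project_emphasis(stressed: set[int],
                     alignment: dict[int, list[int]],
                     target_words: list[str]) -> str:
    """Carry emphasis to the target sentence through a source-to-target index alignment.

    Stand-in for the LLM-based conversion described in the abstract: the
    alignment here is given by hand, not predicted.
    """
    target_stressed = set()
    for src_idx in stressed:
        target_stressed.update(alignment.get(src_idx, []))
    return " ".join(
        f"*{word}*" if j in target_stressed else word
        for j, word in enumerate(target_words)
    )


# English source with emphasis on "never", and a German translation.
words, stressed = parse_emphasis("I *never* said that")
alignment = {0: [2], 1: [3], 2: [1, 4], 3: [0]}  # hand-written, illustrative only
tagged_target = project_emphasis(
    stressed, alignment, ["Das", "habe", "ich", "nie", "gesagt"]
)
print(tagged_target)
```

In a system like the one described, the tagged target text would then be passed to a controllable TTS model that renders the marked words with stress. A fixed alignment cannot handle cases where emphasis shifts to a different word or construction in the target language, which is presumably one reason an LLM is used for the conversion.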

