SD · CL · AS · Aug 10, 2022

Towards Cross-speaker Reading Style Transfer on Audiobook Dataset

arXiv:2208.05359v24 citations · h-index: 52
Originality: Incremental advance
AI Analysis

This paper addresses the problem of generating audiobooks with varied reading styles across different speakers. The advance appears incremental, as it builds on existing style transfer methods.

The paper tackles cross-speaker reading style transfer for audiobook datasets, which lack utterance-level style labels, by proposing a chunk-wise multi-scale model that captures global genre and local prosody while disentangling speaker timbre. The model successfully transfers reading styles to new target speakers, with results demonstrating its potential for multi-speaker audiobook generation.

Cross-speaker style transfer aims to extract the speech style of a given reference speech so that it can be reproduced in the timbre of arbitrary target speakers. Existing methods on this topic have explored utilizing utterance-level style labels to perform style transfer via either global or local scale style representations. However, audiobook datasets are typically characterized by both local prosody and global genre, and are rarely accompanied by utterance-level style labels. Thus, properly transferring the reading style across different speakers remains a challenging task. This paper introduces a chunk-wise multi-scale cross-speaker style model to capture both the global genre and the local prosody in audiobook speech. Moreover, by disentangling speaker timbre and style with the proposed switchable adversarial classifiers, the extracted reading style is made adaptable to the timbre of different speakers. Experimental results confirm that the model manages to transfer a given reading style to new target speakers. With the support of local prosody and global genre type predictors, the potential of the proposed method for multi-speaker audiobook generation is further demonstrated.
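As a rough illustration of the chunk-wise, multi-scale idea in the abstract, the sketch below splits a sequence of acoustic feature frames into fixed-length chunks and derives one local vector per chunk plus one global vector for the whole sequence. It is not the paper's model: plain mean pooling stands in for the learned style encoders, the function name `chunk_multiscale_style` and the `chunk_len` parameter are hypothetical, and the switchable adversarial classifiers used for timbre disentanglement are not modelled at all.

```python
import numpy as np


def chunk_multiscale_style(frames: np.ndarray, chunk_len: int):
    """Return (local, global_) style summaries for a (T, D) frame matrix.

    local   : (n_chunks, D) mean of the valid frames in each chunk
    global_ : (D,) mean over all frames

    Assumption: mean pooling is only a placeholder for learned encoders.
    """
    if frames.ndim != 2 or frames.shape[0] == 0:
        raise ValueError("frames must be a non-empty (T, D) array")
    if chunk_len <= 0:
        raise ValueError("chunk_len must be positive")

    n_frames, dim = frames.shape
    n_chunks = -(-n_frames // chunk_len)  # ceiling division
    pad = n_chunks * chunk_len - n_frames

    # Zero-pad the tail so the sequence divides evenly into chunks.
    padded = np.pad(frames, ((0, pad), (0, 0)))
    chunks = padded.reshape(n_chunks, chunk_len, dim)

    # Mask marks real frames (1) versus padding (0) so padding does not
    # bias the mean of the final chunk.
    mask = np.pad(np.ones(n_frames), (0, pad)).reshape(n_chunks, chunk_len, 1)
    local = (chunks * mask).sum(axis=1) / mask.sum(axis=1)

    global_ = frames.mean(axis=0)
    return local, global_


if __name__ == "__main__":
    demo = np.arange(10, dtype=float).reshape(5, 2)  # 5 frames, 2 features
    local, global_ = chunk_multiscale_style(demo, chunk_len=2)
    print(local.shape, global_.shape)
```

In a real system the local vectors would presumably condition prosody on a per-chunk basis while the global vector would carry genre-level information; how the paper combines the two scales is not specified in the abstract.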

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.
