SDLGAS Dec 18, 2025

Pseudo-Cepstrum: Pitch Modification for Mel-Based Neural Vocoders

arXiv:2512.16519v1
h-index: 20
Originality: Incremental advance
AI Analysis

The method offers a practical solution for speech synthesis users who need pitch control, but it is an incremental advance because it builds on existing vocoder frameworks.

The paper tackles pitch modification for mel-based neural vocoders by introducing a cepstrum-based method that shifts the harmonic structure without retraining the model. The method is validated with objective and subjective metrics.

This paper introduces a cepstrum-based pitch modification method that can be applied to any mel-spectrogram representation. The method is therefore compatible with any mel-based vocoder, with no additional training or changes to the model. It works by directly modifying the cepstrum feature space to shift the harmonic structure to the desired target. The spectrogram magnitude is computed via the pseudo-inverse mel transform, then converted to the cepstrum by applying a DCT. In this domain, the cepstral peak is shifted without having to estimate its position, and the modified mel-spectrogram is recomputed by applying an IDCT and the mel filterbank. These pitch-shifted mel-spectrogram features can be converted to speech with any compatible vocoder. The proposed method is validated experimentally with objective and subjective metrics on various state-of-the-art neural vocoders, and compared against traditional pitch modification methods.
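The pipeline described above (pseudo-inverse mel, DCT, cepstral shift, IDCT, mel filterbank) can be sketched roughly as follows. This is a minimal illustration under several assumptions, not the authors' implementation. The abstract does not say how the peak is moved, so the sketch assumes a resampling of the quefrency axis above a hypothetical `cutoff`, which relocates any peak without locating it. It also assumes a linear-magnitude mel-spectrogram of shape `(n_mels, n_frames)`, a log taken before the DCT, and a hand-built triangular filterbank that will not match any particular vocoder's. All function names and parameters are illustrative.

```python
import numpy as np

def mel_filterbank(sr, n_fft, n_mels):
    """Illustrative triangular mel filterbank, shape (n_mels, n_fft // 2 + 1)."""
    to_mel = lambda f: 2595.0 * np.log10(1.0 + f / 700.0)
    to_hz = lambda m: 700.0 * (10.0 ** (m / 2595.0) - 1.0)
    freqs = np.linspace(0.0, sr / 2.0, n_fft // 2 + 1)
    edges = to_hz(np.linspace(to_mel(0.0), to_mel(sr / 2.0), n_mels + 2))
    fb = np.zeros((n_mels, freqs.size))
    for i in range(n_mels):
        lo, mid, hi = edges[i], edges[i + 1], edges[i + 2]
        rising = (freqs - lo) / (mid - lo)
        falling = (hi - freqs) / (hi - mid)
        fb[i] = np.maximum(0.0, np.minimum(rising, falling))
    return fb

def dct_matrix(n):
    """Orthonormal DCT-II matrix; its transpose is the inverse transform."""
    k = np.arange(n)[:, None]
    m = np.arange(n)[None, :]
    mat = np.sqrt(2.0 / n) * np.cos(np.pi * (m + 0.5) * k / n)
    mat[0] /= np.sqrt(2.0)
    return mat

def warp_cepstrum(cep, ratio, cutoff):
    """Assumed shift rule: resample quefrencies >= cutoff by `ratio`.

    ratio > 1 moves a peak at index k to about k / ratio (higher pitch).
    Quefrencies below `cutoff` (spectral envelope) are left untouched.
    """
    q = np.arange(cep.shape[0], dtype=float)
    high = cep.copy()
    high[:cutoff] = 0.0  # keep envelope coefficients out of the warp
    out = cep.copy()
    for t in range(cep.shape[1]):
        warped = np.interp(q * ratio, q, high[:, t], left=0.0, right=0.0)
        out[cutoff:, t] = warped[cutoff:]
    return out

def shift_pitch_mel(mel, sr, n_fft, ratio, cutoff=32, eps=1e-5):
    """Mel -> pseudo-inverse magnitude -> cepstrum -> warp -> mel."""
    fb = mel_filterbank(sr, n_fft, mel.shape[0])
    # The pseudo-inverse can yield negative values, so floor before the log.
    mag = np.maximum(np.linalg.pinv(fb) @ mel, eps)
    dct = dct_matrix(mag.shape[0])
    cep = dct @ np.log(mag)
    new_mag = np.exp(dct.T @ warp_cepstrum(cep, ratio, cutoff))
    return fb @ new_mag
```

A call such as `shift_pitch_mel(mel, sr=16000, n_fft=512, ratio=1.2)` would return a mel-spectrogram of the same shape, which could then be passed to a vocoder trained on matching features. The sketch does not compensate for the change in peak amplitude caused by resampling, and the mel-to-linear pseudo-inverse recovers only as much harmonic detail as the mel resolution preserves.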

