SD LG AS

Jun 26, 2023

Mono-to-stereo through parametric stereo generation

Joan Serrà, Davide Scaini, Santiago Pascual, Daniel Arteaga, Jordi Pons, Jeroen Breebaart, Giulio Cengarle

arXiv:2306.14647v17.27 citations

h-index: 35

Originality: Incremental advance

AI Analysis

This work addresses the challenge of generating realistic spatial audio from mono signals. The advance is incremental, as it builds on existing parametric stereo and generative modeling techniques.

The paper tackles the problem of converting monophonic audio to stereophonic audio by predicting parametric stereo parameters using nearest neighbor and deep network approaches, and shows that generative models outperform non-generative counterparts in this framework.

Generating a stereophonic presentation from a monophonic audio signal is a challenging open task, especially if the goal is to obtain realistic spatial imaging with a specific panning of sound elements. In this work, we propose to convert mono to stereo by predicting parametric stereo (PS) parameters using both nearest neighbor and deep network approaches. In combination with PS, we also propose to model the task with generative approaches, which makes it possible to synthesize multiple, equally plausible stereo renditions from the same mono signal. To achieve this, we consider both autoregressive and masked token modelling approaches. We provide evidence that the proposed PS-based models outperform a competitive classical decorrelation baseline and that, within a PS prediction framework, modern generative models outshine equivalent non-generative counterparts. Overall, our work positions both PS and generative modelling as strong and appealing methodologies for mono-to-stereo upmixing. A discussion of the limitations of these approaches is also provided.
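
To make the role of the PS parameters concrete, the following is a minimal sketch of how a stereo pair could be synthesized from a mono signal once such parameters are known. It is not the paper's implementation: the abstract does not specify the parameterization, so the sketch assumes two commonly used broadband parameters, an inter-channel intensity difference in dB (`iid_db`) and an inter-channel coherence (`ic`), and it assumes a decorrelated copy of the mono signal with equal power is already available. The function names `ps_upmix` and `measure_iid_ic` are hypothetical. Practical PS systems typically apply such parameters per frequency band and time frame rather than to the whole signal.

```python
import numpy as np


def ps_upmix(mono, decorrelated, iid_db, ic):
    """Mix a mono signal and a decorrelated copy into left/right channels.

    Assumption: `mono` and `decorrelated` are uncorrelated and have equal power.
    iid_db: target left/right level difference in dB.
    ic: target inter-channel coherence in [-1, 1].
    """
    mono = np.asarray(mono, dtype=float)
    decorrelated = np.asarray(decorrelated, dtype=float)

    # Left/right power ratio implied by the level difference.
    ratio = 10.0 ** (iid_db / 10.0)
    # Channel gains, normalized so that g_left**2 + g_right**2 == 2.
    g_left = np.sqrt(2.0 * ratio / (1.0 + ratio))
    g_right = np.sqrt(2.0 / (1.0 + ratio))

    # Rotation angle: under the assumption above, the normalized
    # cross-correlation of the two outputs equals cos(2 * theta).
    theta = 0.5 * np.arccos(np.clip(ic, -1.0, 1.0))

    left = g_left * (np.cos(theta) * mono + np.sin(theta) * decorrelated)
    right = g_right * (np.cos(theta) * mono - np.sin(theta) * decorrelated)
    return left, right


def measure_iid_ic(left, right):
    """Return the measured level difference (dB) and coherence of a stereo pair."""
    p_left = np.mean(left ** 2)
    p_right = np.mean(right ** 2)
    iid_db = 10.0 * np.log10(p_left / p_right)
    ic = np.mean(left * right) / np.sqrt(p_left * p_right)
    return iid_db, ic


# Toy signals: a sine and a cosine over a whole number of periods are
# uncorrelated and have equal power, so they satisfy the assumption above.
t = np.arange(4800) / 48000.0
mono = np.sin(2.0 * np.pi * 440.0 * t)
decorrelated = np.cos(2.0 * np.pi * 440.0 * t)

left, right = ps_upmix(mono, decorrelated, iid_db=6.0, ic=0.5)
measured_iid_db, measured_ic = measure_iid_ic(left, right)
```

In this framing, a mono-to-stereo model only has to predict values such as `iid_db` and `ic` from the mono input. A generative predictor could sample several different parameter sets for the same input, each yielding a different stereo rendition through the same synthesis step.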
