Kevin J Shih

h-index: 29
1 paper

1 Paper

SD Jan 20, 2025
A2SB: Audio-to-Audio Schrodinger Bridges

Zhifeng Kong, Kevin J Shih, Weili Nie et al.

Real-world audio is often degraded by numerous factors. This work presents an audio restoration model tailored for high-resolution music at 44.1 kHz. Our model, Audio-to-Audio Schrödinger Bridges (A2SB), is capable of both bandwidth extension (predicting high-frequency components) and inpainting (re-generating missing segments). Critically, A2SB is end-to-end, requiring no vocoder to predict waveform outputs; it can restore hour-long audio inputs and is trained on permissively licensed music data. A2SB achieves state-of-the-art bandwidth extension and inpainting quality on several out-of-distribution music test sets.