SD | AI | LG | AS | Jan 26, 2025

AnyEnhance: A Unified Generative Model with Prompt-Guidance and Self-Critic for Voice Enhancement

arXiv:2501.15417v3 | 39 citations | h-index: 12 | Has Code | IEEE Transactions on Audio, Speech, and Language Processing
Originality: Highly original
AI Analysis

This paper addresses versatile and efficient voice enhancement for audio processing applications, and it presents a novel method rather than an incremental improvement.

The paper tackles voice enhancement for both speech and singing by introducing AnyEnhance, a unified generative model that handles multiple tasks like denoising and target speaker extraction simultaneously without fine-tuning, achieving superior performance in objective metrics and subjective tests.

We introduce AnyEnhance, a unified generative model for voice enhancement that processes both speech and singing voices. Built on a masked generative model, AnyEnhance supports a wide range of enhancement tasks, including denoising, dereverberation, declipping, super-resolution, and target speaker extraction, all simultaneously and without fine-tuning. AnyEnhance introduces a prompt-guidance mechanism for in-context learning, which allows the model to natively accept a reference speaker's timbre. This mechanism can boost enhancement performance when reference audio is available, and it enables target speaker extraction without altering the underlying architecture. We also introduce a self-critic mechanism into the generative process of masked generative models, yielding higher-quality outputs through iterative self-assessment and refinement. Extensive experiments on various enhancement tasks demonstrate that AnyEnhance outperforms existing methods on both objective metrics and subjective listening tests. Audio demos are publicly available at https://amphionspace.github.io/anyenhance. An open-source implementation is provided at https://github.com/viewfinder-annn/anyenhance-v1-ccf-aatc.
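The abstract does not spell out how the self-critic step works, so the following is only a toy sketch of the general idea behind critic-guided decoding in a masked generative model: fill every masked position, score each token, then re-mask the lowest-scoring ones and repeat. All names here (`MASK`, `remask_schedule`, `iterative_decode`, `toy_predict`, `toy_critic`) are hypothetical, the cosine schedule is an assumption borrowed from common masked-decoding practice, and none of this is taken from the AnyEnhance implementation.

```python
import math
import random

MASK = -1  # hypothetical sentinel id for a masked token position


def remask_schedule(step, total_steps, length):
    """Assumed cosine schedule: how many positions stay masked after `step` (1-based)."""
    return max(0, int(math.floor(length * math.cos(math.pi / 2 * step / total_steps))))


def iterative_decode(length, predict, critic, total_steps=8):
    """Toy critic-guided masked decoding loop.

    predict(tokens) -> list of proposed token ids, one per position
    critic(tokens)  -> list of per-position quality scores (higher is better)
    """
    tokens = [MASK] * length
    for step in range(1, total_steps + 1):
        # Fill only the masked positions with the generator's proposals.
        proposal = predict(tokens)
        tokens = [p if t == MASK else t for t, p in zip(tokens, proposal)]

        n_mask = remask_schedule(step, total_steps, length)
        if n_mask == 0:
            break

        # Self-assessment: re-mask the positions the critic trusts least.
        scores = critic(tokens)
        worst = sorted(range(length), key=lambda i: scores[i])[:n_mask]
        for i in worst:
            tokens[i] = MASK
    return tokens


# Stand-ins for the real networks, for illustration only.
_rng = random.Random(0)


def toy_predict(tokens):
    # Proposes a random token id in [0, 4) for every position.
    return [_rng.randrange(4) for _ in tokens]


def toy_critic(tokens):
    # Pretends that token id 0 is the "clean" token and everything else is poor.
    return [1.0 if t == 0 else 0.0 for t in tokens]


if __name__ == "__main__":
    print(iterative_decode(16, toy_predict, toy_critic, total_steps=8))
```

In a real system the generator and the critic would be neural networks operating on acoustic tokens; the sketch only illustrates the control flow of iterative assessment and refinement.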

