SDAI · Oct 25, 2025

PromptReverb: Multimodal Room Impulse Response Generation Through Latent Rectified Flow Matching

arXiv:2510.22439v2 · 1 citation · h-index: 8
Originality: Incremental advance
AI Analysis

This work addresses the challenge of flexible, high-quality RIR synthesis for applications in virtual reality, architectural acoustics, and audio production. It represents a strong domain-specific gain rather than a foundational advance.

The paper tackles the problem of generating room impulse responses (RIRs) for immersive virtual acoustic environments, addressing both the scarcity of full-band RIR datasets and the limited input modalities of existing models. The resulting RIRs have a mean RT60 error of 8.8%, compared with -37% for baselines, along with improved perceptual quality.
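The summary reports RT60 errors without saying how RT60 is estimated or how the errors are aggregated. As a minimal sketch, assuming a standard Schroeder backward-integration estimate with a T30-style line fit (-5 to -35 dB, extrapolated to a 60 dB decay) and a signed percentage error (which the negative baseline figure hints at), the metric could be computed as follows. The function names and the synthetic test RIR are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def schroeder_edc_db(rir):
    """Energy decay curve (dB, 0 dB at t=0) via Schroeder backward integration."""
    energy = np.cumsum(rir[::-1] ** 2)[::-1]  # energy remaining from each sample onward
    energy = energy / energy[0]
    return 10.0 * np.log10(np.maximum(energy, 1e-12))

def estimate_rt60(rir, fs, start_db=-5.0, end_db=-35.0):
    """T30-style RT60: fit a line to the EDC between start_db and end_db, extrapolate to -60 dB."""
    edc = schroeder_edc_db(rir)
    t = np.arange(len(rir)) / fs
    mask = (edc <= start_db) & (edc >= end_db)
    slope, _ = np.polyfit(t[mask], edc[mask], 1)  # decay rate in dB/s (negative)
    return -60.0 / slope

def signed_rt60_error_pct(rt60_est, rt60_ref):
    """Signed relative error in percent; negative means the estimate is too short."""
    return 100.0 * (rt60_est - rt60_ref) / rt60_ref

# Synthetic check: exponentially decaying noise whose amplitude drops 60 dB at true_rt60.
fs = 48_000
true_rt60 = 0.5
t = np.arange(int(1.5 * fs)) / fs
rng = np.random.default_rng(0)
rir = rng.standard_normal(t.size) * 10.0 ** (-3.0 * t / true_rt60)
rt60_est = estimate_rt60(rir, fs)
err_pct = signed_rt60_error_pct(rt60_est, true_rt60)
```

Averaging signed errors lets over- and underestimates cancel, so a mean absolute error is often reported alongside it; the summary does not state which form the 8.8% figure uses.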

Room impulse response (RIR) generation remains a critical challenge for creating immersive virtual acoustic environments. Current methods suffer from two fundamental limitations: the scarcity of full-band RIR datasets and the inability of existing models to generate acoustically accurate responses from diverse input modalities. We present PromptReverb, a two-stage generative framework that addresses these challenges. Our approach combines a variational autoencoder that upsamples band-limited RIRs to full-band quality (48 kHz) with a conditional diffusion transformer, based on rectified flow matching, that generates RIRs from natural-language descriptions. Empirical evaluation demonstrates that PromptReverb produces RIRs with superior perceptual quality and acoustic accuracy compared to existing methods, achieving a mean RT60 error of 8.8%, compared with -37% for widely used baselines, and yielding more realistic room-acoustic parameters. Our method enables practical applications in virtual reality, architectural acoustics, and audio production where flexible, high-quality RIR synthesis is essential.
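The abstract names rectified flow matching as the generative core of the second stage. The sketch below shows the generic objective under common assumptions: the diffusion transformer is replaced by any callable `velocity_model(z_t, t, cond)`, time runs from Gaussian noise at t = 0 to a clean VAE latent at t = 1, and sampling uses plain Euler steps. All function names, the time convention, and the toy point-mass velocity field are hypothetical illustrations, not the paper's implementation, which conditions its transformer on natural-language descriptions.

```python
import numpy as np

def flow_matching_pair(z_data, noise, t):
    """Interpolate on the straight path noise (t=0) -> data (t=1); return z_t and target velocity."""
    t = t.reshape(-1, *([1] * (z_data.ndim - 1)))  # broadcast per-example t over latent dims
    z_t = (1.0 - t) * noise + t * z_data
    v_target = z_data - noise  # velocity is constant along a straight path
    return z_t, v_target

def flow_matching_loss(velocity_model, z_data, cond, rng):
    """Monte Carlo estimate of the rectified-flow regression loss for one batch."""
    noise = rng.standard_normal(z_data.shape)
    t = rng.uniform(0.0, 1.0, size=z_data.shape[0])
    z_t, v_target = flow_matching_pair(z_data, noise, t)
    v_pred = velocity_model(z_t, t, cond)
    return float(np.mean((v_pred - v_target) ** 2))

def euler_sample(velocity_model, noise, cond, steps=50):
    """Integrate dz/dt = v(z, t, cond) from t=0 (noise) to t=1 with fixed Euler steps."""
    z = noise.copy()
    dt = 1.0 / steps
    for i in range(steps):
        t = np.full(z.shape[0], i * dt)
        z = z + dt * velocity_model(z, t, cond)
    return z

# Hypothetical stand-in for the network: if the data for a given condition is a
# single latent `cond`, the exact velocity field is (cond - z) / (1 - t).
def point_mass_velocity(z, t, cond):
    return (cond - z) / (1.0 - t[:, None])

rng = np.random.default_rng(1)
targets = rng.standard_normal((4, 8))  # pretend VAE latents, one per "prompt"
samples = euler_sample(point_mass_velocity, rng.standard_normal((4, 8)), targets, steps=20)
loss = flow_matching_loss(point_mass_velocity, targets, targets, rng)
```

Because the straight-line path has a constant target velocity `z_data - noise`, a well-trained model can be integrated in relatively few steps; in the point-mass toy, the exact velocity reaches the target latent up to floating-point error and drives the loss to essentially zero. A real system would decode the sampled latent back to a waveform with the first-stage VAE.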
