AS SD SP Mar 27

DiffAU: Diffusion-Based Ambisonics Upscaling

arXiv:2510.00180 70.3 h-index: 46
AI Analysis

Improves spatial audio realism for VR/AR applications by enhancing Ambisonics resolution without requiring high-order hardware.

DiffAU uses diffusion models to upscale first-order Ambisonics to third-order, achieving strong objective and perceptual performance in anechoic multi-speaker settings.

Spatial audio enhances immersion by reproducing 3D sound fields, and Ambisonics offers a scalable format for this purpose. First-order Ambisonics (FOA) allows more hardware-efficient acquisition and storage of sound fields than high-order Ambisonics (HOA), but its low spatial resolution limits realism. This motivates Ambisonics upscaling (AU), which increases the order of Ambisonics signals. In this work, we propose DiffAU, a cascaded AU method that combines recent developments in diffusion models with a novel adaptation to spatial audio to generate third-order Ambisonics from FOA. By learning data distributions, DiffAU provides a principled approach that rapidly and reliably reproduces HOA in various settings. Experiments in anechoic conditions with multiple speakers show strong objective and perceptual performance.
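To make the size of the upscaling task concrete, the sketch below counts the channels at each Ambisonics order and encodes a single plane-wave source into FOA. It is not taken from the paper. It relies on the standard result that a full-sphere signal of order N has (N + 1)^2 channels, and it assumes the common ACN channel ordering with SN3D normalization. The function names `num_channels` and `encode_foa` are hypothetical and chosen only for illustration.

```python
import math


def num_channels(order: int) -> int:
    """Channel count of a full-sphere Ambisonics signal of the given order."""
    if order < 0:
        raise ValueError("order must be non-negative")
    return (order + 1) ** 2


def encode_foa(azimuth_deg: float, elevation_deg: float) -> list[float]:
    """Plane-wave encoding gains for first-order Ambisonics.

    Assumes ACN channel order (W, Y, Z, X) and SN3D normalization;
    other conventions (e.g. FuMa) would use different ordering and scaling.
    """
    az = math.radians(azimuth_deg)
    el = math.radians(elevation_deg)
    return [
        1.0,                          # W: omnidirectional component
        math.sin(az) * math.cos(el),  # Y: left-right
        math.sin(el),                 # Z: up-down
        math.cos(az) * math.cos(el),  # X: front-back
    ]


foa_channels = num_channels(1)    # first-order input
hoa3_channels = num_channels(3)   # third-order target
channels_to_generate = hoa3_channels - foa_channels

front_gains = encode_foa(0.0, 0.0)  # source straight ahead
```

Under these assumptions, an upscaler going from FOA to third order has to produce 12 additional channels from the 4 it is given, which is why a generative model that learns the data distribution is a natural fit for the task.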

