SD · AI · MM · May 27

EigeNet: Geometry-Informed Multi-Modal Learning for Few-shot Novel View RIR Prediction

arXiv:2605.28101 · 70.6 · Has Code
Predicted impact: top 26% in SD · last 90 days
Originality: Incremental advance
AI Analysis

This work addresses the challenging problem of predicting spatially varying Room Impulse Responses from sparse observations for immersive spatial audio rendering.

EigeNet achieves state-of-the-art few-shot novel view RIR prediction and sim-to-real generalization by using a geometry-informed multi-modal framework with a Cross-view Alternate-attention Transformer and an auxiliary multi-task loss.

Predicting spatially varying Room Impulse Responses (RIRs) from sparse observations is a critical but highly challenging inverse problem for immersive spatial audio rendering. In this work, we present EigeNet, a geometry-informed multi-modal framework for few-shot novel view RIR prediction. At its core is a Cross-view Alternate-attention Transformer that iteratively refines local intra-view acoustic structures and global cross-view spatial relationships. We empirically demonstrate that this architecture can make full use of the multi-view, multi-modal context while performing spatial-temporal reasoning for RIR prediction. Inspired by acoustic ray tracing, we design a geometry-informed modulation block to formulate the connection between geometric features and the RIR power spectrum. In addition, an auxiliary loss is introduced to turn single-target waveform prediction into a multi-task learning problem. Through ablation studies, we demonstrate that this design yields consistent performance gains regardless of the underlying backbone, confirming its foundational utility and architecture-agnostic generalizability for the RIR prediction task. Evaluated on both simulated and real-world benchmarks, EigeNet achieves state-of-the-art performance in both few-shot novel view RIR prediction and sim-to-real generalization. Code and checkpoints are available at https://github.com/FEAfeatherTHER/EigeNet.
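The alternation between intra-view and cross-view attention described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names, the tensor layout `(views, tokens, dim)`, and the choice to run cross-view attention per token position are all assumptions made for illustration, and learned projections, multiple heads, normalization, and feed-forward layers are omitted.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(x):
    # Single-head self-attention with no learned projections (a simplification).
    # x has shape (batch, tokens, dim); attention runs over the tokens axis,
    # independently for each batch item.
    d = x.shape[-1]
    scores = x @ np.swapaxes(x, -1, -2) / np.sqrt(d)
    return softmax(scores, axis=-1) @ x

def alternate_attention(x, num_layers=2):
    # Hypothetical alternating scheme; x has shape (views, tokens, dim).
    for _ in range(num_layers):
        # Intra-view step: each view attends only over its own tokens.
        x = x + self_attention(x)
        # Cross-view step (assumed layout): swap axes so that each token
        # position becomes a batch item and attention runs over the views.
        xt = np.swapaxes(x, 0, 1)          # (tokens, views, dim)
        xt = xt + self_attention(xt)
        x = np.swapaxes(xt, 0, 1)          # back to (views, tokens, dim)
    return x

rng = np.random.default_rng(0)
features = rng.standard_normal((4, 16, 8))  # 4 views, 16 tokens, 8 channels
out = alternate_attention(features)
print(out.shape)
```

In this sketch the intra-view step never mixes information between views, so any dependence of one view's output on another view comes entirely from the cross-view step.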
