Echo-Reconstruction: Audio-Augmented 3D Scene Reconstruction
The work addresses poor scene reconstruction around challenging surfaces in AR/VR applications such as virtual conferencing, though it appears incremental because it builds on existing audio-visual techniques.
The paper tackles the reconstruction of 3D scenes that contain reflective and textureless surfaces, which often produce depth discontinuities and holes. It proposes Echo-Reconstruction, an audio-visual method that uses sound reflections to improve geometry and audio reconstruction. The authors report high success rates for material classification and depth estimation, along with considerable visual and audio improvements.
Reflective and textureless surfaces such as windows, mirrors, and walls can be a challenge for object and scene reconstruction. These surfaces are often poorly reconstructed and filled with depth discontinuities and holes, making it difficult to cohesively reconstruct scenes that contain these planar discontinuities. We propose Echo-Reconstruction, an audio-visual method that uses the reflections of sound to aid in geometry and audio reconstruction for virtual conferencing, teleimmersion, and other AR/VR experiences. The mobile phone prototype emits pulsed audio while recording video for RGB-based 3D reconstruction and audio-visual classification. Reflected sound and images from the video are input into our audio (EchoCNN-A) and audio-visual (EchoCNN-AV) convolutional neural networks for surface and sound source detection, depth estimation, and material classification. The inferences from these classifications enhance 3D scene reconstructions containing open spaces and reflective surfaces through depth filtering, inpainting, and placement of unmixed sound sources in the scene. Our prototype, VR demo, and experimental results from real-world and virtual scenes with challenging surfaces and sound indicate high success rates on classification of material, depth estimation, and closed/open surfaces, leading to considerable visual and audio improvement in 3D scenes (see Figure 1).
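As rough intuition for why an emitted pulse and its reflection carry depth information, the sketch below estimates the distance to a surface from the echo delay using the time-of-flight relation d = c * t / 2. This is not the paper's method, which relies on the learned EchoCNN-A and EchoCNN-AV networks; it is a minimal illustration under simplifying assumptions (a single clean echo, no noise, a speed of sound of about 343 m/s). All function names and parameter values are hypothetical.

```python
import math

# Assumption: approximate speed of sound in dry air at about 20 °C.
SPEED_OF_SOUND_M_S = 343.0


def make_pulse(sample_rate, freq_hz, num_samples):
    """Short sine burst with a half-sine window (hypothetical test pulse)."""
    return [
        math.sin(2.0 * math.pi * freq_hz * i / sample_rate)
        * math.sin(math.pi * i / num_samples)
        for i in range(num_samples)
    ]


def echo_delay_samples(recording, pulse, min_lag):
    """Lag (in samples) where the pulse best matches the recording.

    Lags below min_lag are skipped so the direct (unreflected) pulse
    at the start of the recording is not mistaken for the echo.
    """
    best_lag, best_score = min_lag, float("-inf")
    for lag in range(min_lag, len(recording) - len(pulse) + 1):
        # Cross-correlation of the pulse with the recording at this lag.
        score = sum(recording[lag + i] * p for i, p in enumerate(pulse))
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


def echo_distance_m(delay_samples, sample_rate):
    """Distance to the reflector: sound travels there and back, so halve it."""
    round_trip_s = delay_samples / sample_rate
    return SPEED_OF_SOUND_M_S * round_trip_s / 2.0


if __name__ == "__main__":
    sample_rate = 48_000
    pulse = make_pulse(sample_rate, 5_000.0, 48)

    # Synthetic recording: direct pulse at t = 0 plus an attenuated echo
    # arriving 280 samples later (both values chosen for illustration).
    recording = [0.0] * 1_000
    for i, p in enumerate(pulse):
        recording[i] += p
        recording[280 + i] += 0.3 * p

    lag = echo_delay_samples(recording, pulse, min_lag=len(pulse))
    print(f"echo delay: {lag} samples, "
          f"distance: {echo_distance_m(lag, sample_rate):.3f} m")
```

In a real room, multiple overlapping reflections, noise, and material-dependent absorption make a single correlation peak unreliable, which is one plausible motivation for learning depth and material cues from the echoes instead.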