CV · Mar 24

A Synchronized Audio-Visual Multi-View Capture System

Xiangwei Shi, Era Dorta Perez, Ruud de Jong, Ojas Shirekar, Chirag Raman

arXiv:2603.23089 · 46.7 · h-index: 8

Predicted impact top 73% in CV · last 90 days · Originality: Synthesis-oriented

AI Analysis

This work addresses researchers' need to study conversational interaction with precise timing, but it is incremental: it builds on existing multi-view systems by adding audio capabilities.

The paper tackles the lack of audio support and audio-video synchronization in multi-view capture systems by building a system that integrates synchronized audio and video, and shows that its recordings are temporally consistent enough for fine-grained analysis of conversational behavior.

Multi-view capture systems have been an important tool in research for recording human motion under controlled conditions. Most existing systems are specified around video streams and provide little or no support for audio acquisition and rigorous audio-video alignment, despite both being essential for studying conversational interaction where timing at the level of turn-taking, overlap, and prosody matters. In this technical report, we describe an audio-visual multi-view capture system that addresses this gap by treating synchronized audio and synchronized video as first-class signals. The system combines a multi-camera pipeline with multi-channel microphone recording under a unified timing architecture and provides a practical workflow for calibration, acquisition, and quality control that supports repeatable recordings at scale. We quantify synchronization performance in deployment and show that the resulting recordings are temporally consistent enough to support fine-grained analysis and data-driven modeling of conversation behavior.
