SD, Apr 13, 2015

Absolute Geometry Calibration of Distributed Microphone Arrays in an Audio-Visual Sensor Network

arXiv:1504.03128v1, 5 citations
Originality: Incremental advance
AI Analysis

This work addresses the need for precise sensor alignment in joint audio-visual tracking. It is rated an incremental advance because it builds on existing self-localization methods.

The paper tackles the problem of aligning microphone and camera coordinate systems in audio-visual sensor networks. It proposes two calibration techniques based on audio-visual correlates; the joint calibration method achieves an overall error of 0.20 m in reverberant environments.

Joint audio-visual speaker tracking requires that the locations of microphones and cameras are known and that they are given in a common coordinate system. Sensor self-localization algorithms, however, are usually developed separately for either the acoustic or the visual modality and return their positions in a modality-specific coordinate system, often with an unknown rotation, scaling and translation between the two. In this paper we propose two techniques to determine the positions of acoustic sensors in a common coordinate system, based on audio-visual correlates, i.e., events that are localized separately by both microphones and cameras. The first approach maps the output of an acoustic self-calibration algorithm to the visual coordinate system by estimating rotation, scale and translation, while the second solves a joint system of equations with acoustic and visual directions of arrival as input. The evaluation of the two strategies reveals that joint calibration outperforms the mapping approach and achieves an overall calibration error of 0.20 m even in reverberant environments.
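The mapping idea in the abstract, estimating a rotation, scale and translation between the acoustic and the visual coordinate system from events localized in both, can be illustrated with a small sketch. The abstract does not say which estimator the authors use, so the code below substitutes a standard closed-form least-squares fit (an Umeyama-style, SVD-based similarity transform) purely as an illustration. The function name `estimate_similarity`, the argument names and the synthetic data in the usage note are assumptions, not taken from the paper.

```python
import numpy as np


def estimate_similarity(src, dst):
    """Least-squares scale s, rotation R, translation t with dst ~ s * R @ src + t.

    Illustrative Umeyama-style fit; NOT necessarily the estimator used in the paper.
    src, dst: arrays of shape (n_points, dim) holding the same events expressed in
    the acoustic (src) and visual (dst) coordinate systems (hypothetical inputs).
    """
    src = np.asarray(src, dtype=float)
    dst = np.asarray(dst, dtype=float)
    n, dim = src.shape

    # Centre both point sets on their centroids.
    mu_src = src.mean(axis=0)
    mu_dst = dst.mean(axis=0)
    src_c = src - mu_src
    dst_c = dst - mu_dst

    # Cross-covariance between the centred sets, shape (dim, dim).
    cov = dst_c.T @ src_c / n
    U, S, Vt = np.linalg.svd(cov)

    # Flip the last axis if needed so that R is a proper rotation (det = +1).
    D = np.eye(dim)
    if np.linalg.det(U) * np.linalg.det(Vt) < 0:
        D[-1, -1] = -1.0
    R = U @ D @ Vt

    # Scale from the singular values and the variance of the source points.
    var_src = (src_c ** 2).sum() / n
    s = np.trace(np.diag(S) @ D) / var_src

    # Translation that aligns the centroids after rotation and scaling.
    t = mu_dst - s * R @ mu_src
    return s, R, t
```

As a usage note, acoustic positions stored row-wise in an array `pts` would be mapped into the visual frame with `s * pts @ R.T + t`. With noise-free synthetic correspondences the fit recovers the generating transform up to floating-point error; with real, noisy audio-visual correlates it only gives the least-squares optimum.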

