Yoav Schechner

2 Papers

CV · Jul 15, 2024
Success Probability in Multi-View Imaging

Vadim Holodovsky, Masada Tzabari, Yoav Schechner et al.

Platforms such as robots, security cameras, drones and satellites are used in multi-view imaging for three-dimensional (3D) recovery by stereoscopy or tomography. Each camera in the setup has a field of view (FOV). Multi-view analysis requires overlap of the FOVs of all cameras, or a significant subset of them. However, the success of such methods is not guaranteed, because the FOVs may not sufficiently overlap. The reason is that the pointing of a camera from a mount or platform has some randomness (noise), due to imprecise platform control, which is typical of mechanical systems, and particularly of moving systems such as satellites. So, success is probabilistic. This paper creates a framework to analyze this aspect. This is critical for setting limitations on the capabilities of imaging systems, such as resolution (pixel footprint), FOV, the size of domains that can be captured, and efficiency. The framework uses the fact that imprecise pointing can be mitigated by self-calibration, provided that there is sufficient overlap between pairs of views and sufficient visual similarity of views. We show an example considering the design of a formation of nanosatellites that seek 3D reconstruction of clouds.

SD · Aug 12, 2014
Co-Localization of Audio Sources in Images Using Binaural Features and Locally-Linear Regression

Antoine Deleforge, Radu Horaud, Yoav Schechner et al.

This paper addresses the problem of localizing audio sources using binaural measurements. We propose a supervised formulation that simultaneously localizes multiple sources at different locations. The approach is intrinsically efficient because, contrary to prior work, it relies neither on source separation nor on monaural segregation. The method starts with a training stage that establishes a locally-linear Gaussian regression model between the directional coordinates of all the sources and the auditory features extracted from binaural measurements. While fixed-length wide-spectrum sounds (white noise) are used for training to reliably estimate the model parameters, we show that the testing (localization) can be extended to variable-length sparse-spectrum sounds (such as speech), thus enabling a wide range of realistic applications. Indeed, we demonstrate that the method can be used for audio-visual fusion, namely to map speech signals onto images and hence to spatially align the audio and visual modalities, thus making it possible to discriminate between speaking and non-speaking faces. We release a novel corpus of real-room recordings that allows quantitative evaluation of the co-localization method in the presence of one or two sound sources. Experiments demonstrate increased accuracy and speed relative to several state-of-the-art methods.