Seeing Things in Random-Dot Videos
This work addresses the challenge of interpreting noisy dynamic visual data, such as that produced by ultrasound imaging. The contribution is incremental, as it builds on existing a contrario frameworks for modeling human perception.
The paper tackles the problem of detecting and grouping objects in random-dot videos, where per-frame information is sparse and noisy, by proposing a new algorithm based on temporal integration followed by spatial statistical tests. In psychophysical experiments, the algorithm's performance is strikingly similar to human perception, and it has only two parameters: the time integration and the visual angle of the candidate shapes.
Humans possess an intricate and powerful visual system with which to perceive and understand the surrounding world. Human perception can effortlessly detect and correctly group features in visual data, and can even interpret random-dot videos induced by imaging natural dynamic scenes with highly noisy sensors, as in ultrasound imaging. Remarkably, this happens even if perception completely fails when the same information is presented frame by frame rather than in a video sequence. We study this surprising property of dynamic perception, with the first goal of proposing a new detection and spatio-temporal grouping algorithm for such signals when, per frame, the information on objects is both random and sparse and is embedded in random noise. The algorithm is based on a succession of temporal integration and spatial statistical tests of unlikeliness, following the a contrario framework. The algorithm not only manages to handle such signals; its performance is also strikingly similar to the perception of human observers, as witnessed by a series of psychophysical experiments on image and video data. This leads us to see in it a simple computational Gestalt model of human perception with only two parameters: the time integration and the visual angle for candidate shapes to be detected.
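The two-stage idea described above (temporal integration, then a spatial test of unlikeliness) can be illustrated with a minimal Python sketch. This is not the authors' implementation. It assumes dots given as (x, y) points in the unit square, axis-aligned rectangles as candidate shapes, a background model of uniformly distributed dots, and the standard a contrario "number of false alarms" (NFA) defined as the number of tests times a binomial tail probability. The names `binomial_tail`, `integrate`, `nfa`, and all numeric values are hypothetical choices made for illustration.

```python
import math
import random


def binomial_tail(n, k, p):
    """Return P[X >= k] for X ~ Binomial(n, p), assuming 0 < p < 1."""
    if k <= 0:
        return 1.0
    total = 0.0
    for i in range(k, n + 1):
        # Work in log space so that large n does not overflow.
        log_term = (math.lgamma(n + 1) - math.lgamma(i + 1)
                    - math.lgamma(n - i + 1)
                    + i * math.log(p) + (n - i) * math.log(1.0 - p))
        total += math.exp(log_term)
    return min(total, 1.0)


def integrate(frames, t, window):
    """Temporal integration: pool the dots of the `window` frames ending at index t."""
    start = max(0, t - window + 1)
    return [dot for frame in frames[start:t + 1] for dot in frame]


def nfa(dots, rect, n_tests):
    """A contrario score of a rectangle; values below 1 count as detections.

    Assumption: under the background model, each dot falls in `rect`
    independently with probability equal to the rectangle's area.
    """
    x0, y0, x1, y1 = rect
    p = (x1 - x0) * (y1 - y0)
    k = sum(1 for (x, y) in dots if x0 <= x <= x1 and y0 <= y <= y1)
    return n_tests * binomial_tail(len(dots), k, p)


if __name__ == "__main__":
    rng = random.Random(0)
    target = (0.4, 0.4, 0.5, 0.5)  # hypothetical object location
    frames = []
    for _ in range(10):
        noise = [(rng.random(), rng.random()) for _ in range(50)]
        signal = [(rng.uniform(0.4, 0.5), rng.uniform(0.4, 0.5)) for _ in range(2)]
        frames.append(noise + signal)
    n_tests = 100  # e.g. a 10 x 10 grid of candidate cells
    print("single-frame NFA:", nfa(frames[-1], target, n_tests))
    print("10-frame NFA:", nfa(integrate(frames, 9, 10), target, n_tests))
```

With only two signal dots per frame, the single-frame count in the target cell is usually too close to the noise level to be significant, while the count pooled over ten frames is far above what the background model predicts. This mirrors the observation that the structure is visible in the video but not in individual frames.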