Unsupervised Multiple-Object Tracking with a Dynamical Variational Autoencoder
This addresses the problem of tracking multiple objects without labeled data for researchers in computer vision, though it appears incremental as it builds on existing probabilistic models.
The paper tackles unsupervised multi-object tracking by proposing DVAE-UMOT, a model based on a dynamical variational autoencoder, which competes with and surpasses two state-of-the-art probabilistic models in performance.
In this paper, we present an unsupervised probabilistic model and associated estimation algorithm for multi-object tracking (MOT) based on a dynamical variational autoencoder (DVAE), called DVAE-UMOT. The DVAE is a latent-variable deep generative model that can be seen as an extension of the variational autoencoder for the modeling of temporal sequences. It is included in DVAE-UMOT to model the objects' dynamics, after being pre-trained on an unlabeled synthetic dataset of single-object trajectories. Then the distributions and parameters of DVAE-UMOT are estimated on each multi-object sequence to track using the principles of variational inference: Definition of an approximate posterior distribution of the latent variables and maximization of the corresponding evidence lower bound of the data likehood function. DVAE-UMOT is shown experimentally to compete well with and even surpass the performance of two state-of-the-art probabilistic MOT models. Code and data are publicly available.