Differentiable and Stable Long-Range Tracking of Multiple Posterior Modes
This addresses the problem of robust multimodal tracking in AI/robotics, offering a novel solution with strong empirical gains.
The paper tackles the problem of tracking multiple posterior modes in high-dimensional observations like images by developing a discriminative particle filter that learns particle-based representations of uncertainty via deep neural network encoders. The result is a method that achieves dramatic improvements in accuracy and stability on challenging tracking and robot localization problems.
Particle filters flexibly represent multiple posterior modes nonparametrically, via a collection of weighted samples, but have classically been applied to tracking problems with known dynamics and observation likelihoods. Such generative models may be inaccurate or unavailable for high-dimensional observations like images. We instead leverage training data to discriminatively learn particle-based representations of uncertainty in latent object states, conditioned on arbitrary observations via deep neural network encoders. While prior discriminative particle filters have used heuristic relaxations of discrete particle resampling, or biased learning by truncating gradients at resampling steps, we achieve unbiased and low-variance gradient estimates by representing posteriors as continuous mixture densities. Our theory and experiments expose dramatic failures of existing reparameterization-based estimators for mixture gradients, an issue we address via an importance-sampling gradient estimator. Unlike standard recurrent neural networks, our mixture density particle filter represents multimodal uncertainty in continuous latent states, improving accuracy and robustness. On a range of challenging tracking and robot localization problems, our approach achieves dramatic improvements in accuracy, while also showing much greater stability across multiple training runs.