Deep Forward and Inverse Perceptual Models for Tracking and Prediction
This work aims to improve visual perception for tracking and prediction in robotics. It is an incremental advance that combines deep networks with existing frameworks.
The paper learns forward models that generate high-dimensional images from state and inverse models that estimate state from images in robotics. It reports that the deep perceptual model outperforms standard deconvolutional methods and GANs, producing clear, photo-realistic images, and validates the models on a real robotic system.
We consider the problems of learning forward models that map state to high-dimensional images and inverse models that map high-dimensional images to state in robotics. Specifically, we present a perceptual model for generating video frames from state with deep networks, and provide a framework for its use in tracking and prediction tasks. We show that our proposed model greatly outperforms standard deconvolutional methods and GANs for image generation, producing clear, photo-realistic images. We also develop a convolutional neural network model for state estimation and compare its results against an Extended Kalman Filter for estimating robot trajectories. We validate all models on a real robotic system.
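For readers unfamiliar with the Extended Kalman Filter used as the comparison baseline, the following is a minimal textbook sketch of one predict/update cycle, not the paper's implementation. The abstract does not specify the state, motion model, or measurement model, so everything here is an assumption chosen for illustration: a scalar joint angle as state, a constant-velocity motion model, a hypothetical measurement `sin(angle)`, and the names `ekf_step` and `run_demo`.

```python
import math


def ekf_step(x, P, u, z, dt=0.1, Q=1e-4, R=1e-2):
    """One predict/update cycle of a scalar Extended Kalman Filter.

    x, P : prior state estimate (joint angle, rad) and its variance
    u    : commanded angular velocity (rad/s)
    z    : measurement, modelled here (as an assumption) as sin(angle)
    Q, R : process and measurement noise variances (illustrative values)
    """
    # Predict: the motion model x' = x + u*dt is linear, so its Jacobian is 1.
    x_pred = x + u * dt
    P_pred = P + Q

    # Update: linearise the measurement model h(x) = sin(x) around x_pred.
    H = math.cos(x_pred)
    S = H * P_pred * H + R           # innovation variance
    K = P_pred * H / S               # Kalman gain
    x_new = x_pred + K * (z - math.sin(x_pred))
    P_new = (1.0 - K * H) * P_pred
    return x_new, P_new


def run_demo(steps=50):
    """Track a joint rotating at constant velocity from a wrong initial guess."""
    true_x, u, dt = 0.0, 0.5, 0.1
    x, P = 0.3, 1.0                  # deliberately wrong initial estimate
    for _ in range(steps):
        true_x += u * dt
        z = math.sin(true_x)         # noise omitted to keep the demo deterministic
        x, P = ekf_step(x, P, u, z, dt)
    return true_x, x, P
```

Calling `run_demo()` returns the true final angle, the filter's estimate, and the estimate's variance. In a learned-perception setting, a network's state estimate from an image could play the role of `z`, but that pairing is a design choice assumed here rather than something the abstract states.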