PreCNet: Next-Frame Video Prediction Based on Predictive Coding
This work addresses video prediction for applications like autonomous driving, though it is incremental as it adapts an existing neuroscience model to a known task.
The authors tackled next-frame video prediction by adapting a neuroscience predictive coding model into a deep learning framework (PreCNet), achieving state-of-the-art performance on a benchmark with improvements in MSE, PSNR, and SSIM, especially when using a larger training set.
Predictive coding, currently a highly influential theory in neuroscience, has not been widely adopted in machine learning yet. In this work, we transform the seminal model of Rao and Ballard (1999) into a modern deep learning framework while remaining maximally faithful to the original schema. The resulting network we propose (PreCNet) is tested on a widely used next frame video prediction benchmark, which consists of images from an urban environment recorded from a car-mounted camera, and achieves state-of-the-art performance. Performance on all measures (MSE, PSNR, SSIM) was further improved when a larger training set (2M images from BDD100k), pointing to the limitations of the KITTI training set. This work demonstrates that an architecture carefully based in a neuroscience model, without being explicitly tailored to the task at hand, can exhibit exceptional performance.