PredNet and Predictive Coding: A Critical Review
This critical review is aimed at researchers in neuroscience-inspired AI and self-supervised learning. It identifies limitations of existing models to guide future improvements, although its contribution is incremental because it builds on prior PredNet extensions.
The paper critically analyzes PredNet, a deep predictive coding network for video, by evaluating both its adherence to predictive coding theory and its performance as a self-supervised model. It shows that PredNet does not fully follow predictive coding principles, and that top-down conditioning on the action class improves performance on synthetic data but not on a more complex real-world dataset.
PredNet, a deep predictive coding network developed by Lotter et al., combines a biologically inspired architecture based on the propagation of prediction error with self-supervised representation learning in video. While the architecture has drawn considerable attention and various extensions of the model exist, a critical analysis is lacking. We fill this gap by evaluating PredNet both as an implementation of the predictive coding theory and as a self-supervised video prediction model, using a challenging video action classification dataset. We design an extended model to test whether conditioning future frame predictions on the action class of the video improves model performance. We show that PredNet does not yet completely follow the principles of predictive coding. The proposed top-down conditioning leads to a performance gain on synthetic data, but does not scale up to the more complex real-world action classification dataset. Our analysis is aimed at guiding future research on similar architectures based on the predictive coding theory.
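
To make the two mechanisms named above concrete, the following is a minimal NumPy sketch, not the implementation evaluated in this work. The error computation uses the split rectified form described by Lotter et al. for PredNet, in which positive and negative prediction errors are kept in separate channels. The conditioning step is an assumption made purely for illustration: it appends a one-hot label map to the top-layer state, which may differ from the mechanism actually used. The function names `prediction_error` and `condition_on_class` are hypothetical.

```python
import numpy as np


def prediction_error(target, prediction):
    """Split rectified error for arrays shaped (channels, height, width).

    Positive and negative errors are rectified separately and stacked
    along the channel axis, so the output has twice as many channels.
    """
    positive = np.maximum(target - prediction, 0.0)
    negative = np.maximum(prediction - target, 0.0)
    return np.concatenate([positive, negative], axis=0)


def condition_on_class(top_state, class_id, num_classes):
    """Hypothetical top-down conditioning (an assumption, not the paper's method).

    Appends num_classes constant feature maps to the top-layer state;
    only the map at index class_id is filled with ones.
    """
    _, height, width = top_state.shape
    label_maps = np.zeros((num_classes, height, width), dtype=top_state.dtype)
    label_maps[class_id] = 1.0
    return np.concatenate([top_state, label_maps], axis=0)


if __name__ == "__main__":
    frame = np.random.rand(3, 8, 8)
    predicted = np.random.rand(3, 8, 8)
    error = prediction_error(frame, predicted)
    conditioned = condition_on_class(np.zeros((16, 2, 2)), class_id=5, num_classes=10)
    print(error.shape, conditioned.shape)
```

In this sketch the error is zero only where the prediction matches the target exactly, and the label maps give the top layer access to the action class at every spatial position.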