CVAug 15, 2022Code
Pyramidal Predictive Network: A Model for Visual-frame Prediction Based on Predictive Coding TheoryChaofan Ling, Junpei Zhong, Weihua Li
Visual-frame prediction is a pixel-dense prediction task that infers future frames from past frames. Lacking of appearance details, low prediction accuracy and high computational overhead are still major problems with current models or methods. In this paper, we propose a novel neural network model inspired by the well-known predictive coding theory to deal with the problems. Predictive coding provides an interesting and reliable computational framework, which will be combined with other theories such as the cerebral cortex at different level oscillates at different frequencies, to design an efficient and reliable predictive network model for visual-frame prediction. Specifically, the model is composed of a series of recurrent and convolutional units forming the top-down and bottom-up streams, respectively. The update frequency of neural units on each of the layer decreases with the increasing of network levels, which results in neurons of higher-level can capture information in longer time dimensions. According to the experimental results, this model shows better compactness and comparable predictive performance with existing works, implying lower computational cost and higher prediction accuracy. Code is available at https://github.com/Ling-CF/PPNet.
CVDec 22, 2022Code
Predictive Coding Based Multiscale Network with Encoder-Decoder LSTM for Video PredictionChaofan Ling, Junpei Zhong, Weihua Li
We present a multi-scale predictive coding model for future video frames prediction. Drawing inspiration on the ``Predictive Coding" theories in cognitive science, it is updated by a combination of bottom-up and top-down information flows, which can enhance the interaction between different network levels. However, traditional predictive coding models only predict what is happening hierarchically rather than predicting the future. To address the problem, our model employs a multi-scale approach (Coarse to Fine), where the higher level neurons generate coarser predictions (lower resolution), while the lower level generate finer predictions (higher resolution). In terms of network architecture, we directly incorporate the encoder-decoder network within the LSTM module and share the final encoded high-level semantic information across different network levels. This enables comprehensive interaction between the current input and the historical states of LSTM compared with the traditional Encoder-LSTM-Decoder architecture, thus learning more believable temporal and spatial dependencies. Furthermore, to tackle the instability in adversarial training and mitigate the accumulation of prediction errors in long-term prediction, we propose several improvements to the training strategy. Our approach achieves good performance on datasets such as KTH, Moving MNIST and Caltech Pedestrian. Code is available at https://github.com/Ling-CF/MSPN.
CVJan 13, 2023
Anti-aliasing Predictive Coding Network for Future Video Frame PredictionChaofan Ling, Weihua Li, Junpei Zhong
We introduce here a predictive coding based model that aims to generate accurate and sharp future frames. Inspired by the predictive coding hypothesis and related works, the total model is updated through a combination of bottom-up and top-down information flows, which can enhance the interaction between different network levels. Most importantly, We propose and improve several artifacts to ensure that the neural networks generate clear and natural frames. Different inputs are no longer simply concatenated or added, they are calculated in a modulated manner to avoid being roughly fused. The downsampling and upsampling modules have been redesigned to ensure that the network can more easily construct images from Fourier features of low-frequency inputs. Additionally, the training strategies are also explored and improved to generate believable results and alleviate inconsistency between the input predicted frames and ground truth. Our proposals achieve results that better balance pixel accuracy and visualization effect.