Predictive Coding for Dynamic Vision: Development of Functional Hierarchy in a Multiple Spatio-Temporal Scales RNN Model
This work addresses the challenge of dynamic vision processing for AI systems, though it appears incremental as it builds on existing predictive coding frameworks.
The authors tackled the problem of generating and recognizing dynamic visual patterns by developing a predictive multiple spatio-temporal scales RNN (P-MSTRNN) model, which was trained on whole-body human movement patterns and achieved robust generation and recognition through prediction-error minimization.
This paper presents a novel recurrent neural network model, the predictive multiple spatio-temporal scales RNN (P-MSTRNN), which can both generate and recognize dynamic visual patterns within the predictive coding framework. The model is characterized by multiple spatio-temporal scales imposed on its neural unit dynamics, through which an adequate spatio-temporal hierarchy develops via learning from exemplars. The model was evaluated in an experiment in which it learned a set of whole-body human movement patterns generated according to a hierarchically defined movement syntax. Analysis of the trained model clarifies what types of spatio-temporal hierarchy develop in the dynamic neural activity, as well as how robust generation and recognition of movement patterns can be achieved using the error minimization principle.
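To make the idea of timescales "imposed on neural unit dynamics" concrete, the following is a minimal sketch and not the authors' implementation. It assumes the leaky-integrator update commonly used in multiple-timescale RNNs, where a time constant tau controls how quickly a unit's internal state follows its input. The abstract does not give the model equations, so the update rule, the function names (leaky_step, run_unit), the tau values, and the step input are all illustrative assumptions.

```python
import math

def leaky_step(u_prev, drive, tau):
    """One leaky-integrator update (assumed form): larger tau means slower change."""
    return (1.0 - 1.0 / tau) * u_prev + (1.0 / tau) * drive

def run_unit(drives, tau, u0=0.0):
    """Return the tanh activations of a single unit over a sequence of drives."""
    u = u0
    activations = []
    for d in drives:
        u = leaky_step(u, d, tau)
        activations.append(math.tanh(u))
    return activations

# Hypothetical input: a constant drive held on for 10 time steps.
drives = [1.0] * 10
fast = run_unit(drives, tau=2.0)   # small time constant: follows the input quickly
slow = run_unit(drives, tau=20.0)  # large time constant: integrates slowly

for t, (f, s) in enumerate(zip(fast, slow)):
    print(f"t={t}  fast={f:.3f}  slow={s:.3f}")
```

Under these assumptions the fast unit approaches its steady state within a few steps while the slow unit is still far from it after ten, which is the kind of separation that would let slow units encode longer-range structure and fast units encode moment-to-moment detail.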