Exploiting Spatio-Temporal Structure with Recurrent Winner-Take-All Networks
This work addresses video analysis for computer vision applications, but it appears incremental as it builds on existing methods like Deep Predictive Coding Networks and Winner-Take-All Autoencoders.
The authors tackle unsupervised feature learning in multi-dimensional time series by proposing a convolutional recurrent neural network with Winner-Take-All dropout, which achieves better results than comparable methods such as Deep Predictive Coding Networks on object recognition with temporal context in videos.
We propose a convolutional recurrent neural network with Winner-Take-All dropout for high-dimensional unsupervised feature learning in multi-dimensional time series. We apply the proposed method to object recognition with temporal context in videos and obtain better results than comparable methods in the literature, including the Deep Predictive Coding Networks previously proposed by Chalasani and Principe. Our contributions can be summarized as a scalable reinterpretation of the Deep Predictive Coding Networks trained end-to-end with backpropagation through time, an extension of the previously proposed Winner-Take-All Autoencoders to sequences in time, and a new technique for initializing and regularizing convolutional-recurrent neural networks.
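To make the Winner-Take-All idea concrete, the following is a minimal sketch of spatial winner-take-all sparsity applied frame by frame to a sequence of feature maps. It assumes the rule used in Winner-Take-All Autoencoders, where only the largest activation in each feature map is kept and the rest are set to zero. The function names `spatial_wta` and `wta_over_sequence`, the nested-list data layout, and the choice to apply the rule independently at each time step are illustrative assumptions, not details taken from the abstract.

```python
def spatial_wta(feature_map):
    """Keep only the largest activation in a 2-D feature map; zero the rest.

    Assumption: ties are broken by keeping the first maximum encountered.
    """
    best = max(max(row) for row in feature_map)
    kept = False
    out = []
    for row in feature_map:
        new_row = []
        for value in row:
            if value == best and not kept:
                new_row.append(value)  # the single winner survives
                kept = True
            else:
                new_row.append(0.0)    # every other activation is suppressed
        out.append(new_row)
    return out


def wta_over_sequence(frames):
    """Apply spatial_wta to every channel of every time step.

    Hypothetical layout: frames[t][c] is the 2-D map of channel c at time t.
    """
    return [[spatial_wta(fmap) for fmap in frame] for frame in frames]


# Two time steps, one channel each, with 2x2 feature maps.
frames = [
    [[[0.1, 0.9], [0.3, 0.2]]],
    [[[0.5, 0.4], [0.7, 0.6]]],
]
sparse = wta_over_sequence(frames)
```

In this sketch each frame keeps exactly one nonzero activation per channel, so the representation stays sparse across the whole sequence. How the actual model combines this sparsity with the recurrent state is not specified in the abstract and is left out here.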