Predictive Sparse Manifold Transform
This work addresses unsupervised prediction of visual stimuli and offers a biologically plausible framework, but it appears incremental because it builds on existing sparse coding and manifold learning techniques.
The paper tackles the problem of predicting future frames in natural videos by introducing the Predictive Sparse Manifold Transform (PSMT), which combines sparse coding and manifold learning to create a dynamic embedding space and achieves better prediction performance than two baseline methods that use a static embedding space.
We present Predictive Sparse Manifold Transform (PSMT), a minimalistic, interpretable, and biologically plausible framework for learning and predicting natural dynamics. PSMT comprises two layers: the first, a sparse coding layer, represents the input sequence as sparse coefficients over an overcomplete dictionary; the second, a manifold learning layer, learns a geometric embedding space that captures topological similarity and dynamic temporal linearity in the sparse coefficients. We apply PSMT to a natural video dataset and evaluate its reconstruction performance with respect to contextual variability, the number of sparse coding basis functions, and the number of training samples. We then interpret the dynamic topological organization in the embedding space. Next, we use PSMT to predict future frames and compare it with two baseline methods that use a static embedding space. We demonstrate that PSMT, with its dynamic embedding space, achieves better prediction performance than the static baselines. Our work establishes that PSMT is an efficient unsupervised generative framework for predicting future visual stimuli.
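The two-layer structure described above can be illustrated with a minimal sketch. The abstract does not give the objective functions or the solver, so everything below is an assumption made for illustration: ISTA stands in for the sparse coding layer, the embedding matrix P is chosen to minimize the second-order temporal difference of the embedded coefficients under a whitening constraint (one common way to express temporal linearity), and prediction is done by linear extrapolation in the embedding space followed by a pseudo-inverse decode. The function names (`ista`, `learn_embedding`, `predict_next`), the random dictionary, and the synthetic input sequence are all hypothetical and are not taken from the paper.

```python
import numpy as np

def ista(X, Phi, lam=0.1, n_iter=100):
    """Sparse codes A for 0.5*||X - Phi A||_F^2 + lam*||A||_1 via ISTA (assumed solver)."""
    L = np.linalg.norm(Phi, 2) ** 2          # Lipschitz constant of the smooth term
    A = np.zeros((Phi.shape[1], X.shape[1]))
    for _ in range(n_iter):
        G = A - (Phi.T @ (Phi @ A - X)) / L  # gradient step
        A = np.sign(G) * np.maximum(np.abs(G) - lam / L, 0.0)  # soft threshold
    return A

def second_difference(T):
    """T x (T-2) operator D with (A @ D)[:, t] = A[:, t] - 2*A[:, t+1] + A[:, t+2]."""
    D = np.zeros((T, T - 2))
    for t in range(T - 2):
        D[t, t], D[t + 1, t], D[t + 2, t] = 1.0, -2.0, 1.0
    return D

def learn_embedding(A, dim, eps=1e-6):
    """P minimizing ||P A D||_F^2 subject to P V P^T = I, with V = A A^T / T + eps*I."""
    T = A.shape[1]
    V = A @ A.T / T + eps * np.eye(A.shape[0])
    w, U = np.linalg.eigh(V)
    W = U @ np.diag(w ** -0.5) @ U.T         # whitening matrix V^(-1/2)
    AD = W @ A @ second_difference(T)
    _, evecs = np.linalg.eigh(AD @ AD.T)     # eigenvalues in ascending order
    return evecs[:, :dim].T @ W              # keep the directions with least curvature

def predict_next(A, P, Phi):
    """Extrapolate linearly in the embedding, then decode back to pixel space."""
    beta = P @ A
    beta_next = 2.0 * beta[:, -1] - beta[:, -2]   # temporal linearity assumption
    alpha_next = np.linalg.pinv(P) @ beta_next    # assumed decoder: pseudo-inverse of P
    return Phi @ alpha_next

# Synthetic stand-in for a video: 50 frames of 16-dimensional smooth signals.
rng = np.random.default_rng(0)
T, n_pix, n_basis, dim = 50, 16, 32, 8
time = np.linspace(0.0, 2.0 * np.pi, T)
X = np.sin(np.outer(rng.uniform(0.5, 2.0, n_pix), time) + rng.uniform(0, np.pi, (n_pix, 1)))
Phi = rng.standard_normal((n_pix, n_basis))
Phi /= np.linalg.norm(Phi, axis=0)           # unit-norm, overcomplete dictionary

A = ista(X, Phi)                             # layer 1: sparse coefficients
P = learn_embedding(A, dim)                  # layer 2: embedding matrix
next_frame = predict_next(A, P, Phi)         # predicted frame T+1
```

In this sketch the dictionary is random rather than learned, so it only shows how the pieces connect. A closer reproduction would learn the dictionary from video patches and use whatever embedding objective and decoder the full paper specifies.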