Compositional Factorization of Visual Scenes with Convolutional Sparse Coding and Resonator Networks
This work addresses scene parsing for computer vision. Its contribution is incremental, however, because it combines two existing methods: convolutional sparse coding and resonator networks.
The authors tackle visual scene analysis and recognition by encoding the sparse latent features of an image into a high-dimensional vector and then factorizing that vector to parse scene content. Combining sparse coding with the resonator network increases the capacity of the distributed representation and reduces collisions in the combinatorial search space, and the resonator network factorizes the resulting vectors quickly and accurately.
We propose a system for visual scene analysis and recognition based on encoding the sparse, latent feature representation of an image into a high-dimensional vector that is subsequently factorized to parse scene content. The sparse feature representation is learned from image statistics via convolutional sparse coding, while scene parsing is performed by a resonator network. The integration of sparse coding with the resonator network increases the capacity of distributed representations and reduces collisions in the combinatorial search space during factorization. We find that for this problem the resonator network is capable of fast and accurate vector factorization, and we develop a confidence-based metric that assists in tracking the convergence of the resonator network.
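The factorization step described above can be illustrated with a minimal sketch of a resonator network in NumPy. It is not the authors' implementation. It assumes random bipolar codebooks, binding by elementwise multiplication (which is its own inverse for bipolar vectors), and a toy problem with three factors; the convolutional sparse coding front end is omitted. The names `resonator_factorize` and `bipolar_sign` and all sizes are hypothetical, and the returned confidence (the cosine similarity between each final estimate and its best-matching codebook entry) is only an illustrative stand-in, since the abstract does not define the paper's confidence-based metric.

```python
import numpy as np


def bipolar_sign(x):
    """Map values to {-1, +1}; zeros go to +1 so states stay bipolar."""
    return np.where(x >= 0, 1, -1)


def resonator_factorize(s, codebooks, max_iters=100):
    """Factor the bound vector s into one entry per codebook.

    codebooks: list of (dim, entries) arrays with values in {-1, +1}.
    Returns (indices, confidences, iterations).
    """
    # Start each factor estimate as the superposition of all its candidates.
    estimates = [bipolar_sign(cb.sum(axis=1)) for cb in codebooks]
    iterations = 0
    for _ in range(max_iters):
        iterations += 1
        previous = [e.copy() for e in estimates]
        for f, cb in enumerate(codebooks):
            # Unbind the other factors' current estimates from s.
            unbound = s.copy()
            for g, e in enumerate(estimates):
                if g != f:
                    unbound = unbound * e
            # Project onto the codebook and clean up with the sign function.
            similarities = cb.T @ unbound
            estimates[f] = bipolar_sign(cb @ similarities)
        # Stop once no estimate changed during a full sweep.
        if all(np.array_equal(p, e) for p, e in zip(previous, estimates)):
            break
    dim = s.shape[0]
    indices, confidences = [], []
    for cb, e in zip(codebooks, estimates):
        cosines = (cb.T @ e) / dim  # cosine similarity for bipolar vectors
        best = int(np.argmax(cosines))
        indices.append(best)
        confidences.append(float(cosines[best]))  # hypothetical confidence
    return indices, confidences, iterations


# Toy problem (assumed sizes): 3 factors, 7 candidates each, dimension 2048.
rng = np.random.default_rng(0)
dim, num_factors, entries = 2048, 3, 7
codebooks = [rng.choice([-1, 1], size=(dim, entries)) for _ in range(num_factors)]
true_indices = [2, 5, 0]

# Bind one entry from each codebook into a single composite vector.
s = np.ones(dim, dtype=int)
for cb, i in zip(codebooks, true_indices):
    s = s * cb[:, i]

recovered, confidences, iterations = resonator_factorize(s, codebooks)
print(recovered, confidences, iterations)
```

In this sketch the search space has only 7^3 = 343 combinations, far below what a 2048-dimensional vector can support, so the estimates settle within a few sweeps. Each sweep updates one factor at a time using the latest estimates of the others, which is a design choice of this example rather than something the abstract specifies.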