Inference for Generative Capsule Models
This work addresses inference in generative capsule models for computer vision, namely recovering object transformations and part-to-object assignments, though it appears incremental as it builds on existing capsule-network methods.
The authors address the problem of inferring object transformations and part assignments in capsule networks by specifying a generative model and deriving a variational inference algorithm. On the constellations data, the one setting where a comparison is possible, their method significantly outperforms stacked capsule autoencoders.
Capsule networks (see, e.g., Hinton et al., 2018) aim to encode knowledge of and reason about the relationship between an object and its parts. In this paper we specify a \emph{generative} model for such data, and derive a variational algorithm for inferring the transformation of each object and the assignments of observed parts to the objects. We apply this model to (i) data generated from multiple geometric objects like squares and triangles ("constellations"), and (ii) data from a parts-based model of faces. Recent work by Kosiorek et al. (2019) has used amortized inference via stacked capsule autoencoders (SCAEs) to tackle this problem. Our results show that we significantly outperform SCAEs where we can make comparisons (on the constellations data).
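To make the "constellations" setting concrete, the following is a minimal sketch of how such a scene might be generated: each object is a fixed template of parts, moved by a random similarity transform, with small observation noise, and the parts are then shuffled so that their object membership is hidden. The template coordinates, parameter ranges, noise level, and all function names here are assumptions chosen for illustration; this is not the paper's generative model or its variational algorithm.

```python
import math
import random

# Hypothetical part templates in each object's own reference frame
# (coordinates are illustrative, not taken from the paper).
TEMPLATES = {
    "square": [(-1.0, -1.0), (1.0, -1.0), (1.0, 1.0), (-1.0, 1.0)],
    "triangle": [(0.0, 1.0), (-1.0, -1.0), (1.0, -1.0)],
}


def sample_similarity(rng):
    """Draw a random 2D similarity transform: (scale, angle, tx, ty)."""
    scale = rng.uniform(0.5, 2.0)
    theta = rng.uniform(0.0, 2.0 * math.pi)
    tx, ty = rng.uniform(-5.0, 5.0), rng.uniform(-5.0, 5.0)
    return scale, theta, tx, ty


def apply_similarity(transform, point):
    """Scale and rotate a point about the origin, then translate it."""
    scale, theta, tx, ty = transform
    x, y = point
    c, s = math.cos(theta), math.sin(theta)
    return (scale * (c * x - s * y) + tx, scale * (s * x + c * y) + ty)


def sample_constellation(rng, noise_std=0.05):
    """Generate one scene: noisy transformed parts in shuffled order, plus labels."""
    scene = []
    for obj_id, template in enumerate(TEMPLATES.values()):
        transform = sample_similarity(rng)  # one transform per object
        for part in template:
            x, y = apply_similarity(transform, part)
            noisy = (x + rng.gauss(0.0, noise_std), y + rng.gauss(0.0, noise_std))
            scene.append((noisy, obj_id))
    rng.shuffle(scene)  # hide which part came from which object
    points = [p for p, _ in scene]
    labels = [label for _, label in scene]
    return points, labels


rng = random.Random(0)
points, labels = sample_constellation(rng)
print(len(points), sorted(labels))  # 7 [0, 0, 0, 0, 1, 1, 1]
```

In this toy setup, the inference task described above corresponds to recovering each object's transform and the hidden `labels` from `points` alone.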