Search for Concepts: Discovering Visual Concepts Using Direct Optimization
This work addresses a key bottleneck in computer vision for enabling symbolic reasoning, though the contribution appears incremental in that it integrates direct optimization elements into existing practice.
The paper tackles the problem of unsupervised image decomposition into objects by proposing direct optimization instead of amortized inference, showing it improves generalizability, reduces missed decompositions, and requires less data.
Finding an unsupervised decomposition of an image into individual objects is a key step to leverage compositionality and to perform symbolic reasoning. Traditionally, this problem is solved using amortized inference, which does not generalize beyond the scope of the training data, may sometimes miss correct decompositions, and requires large amounts of training data. We propose finding a decomposition using direct, unamortized optimization, via a combination of a gradient-based optimization for differentiable object properties and global search for non-differentiable properties. We show that using direct optimization is more generalizable, misses fewer correct decompositions, and typically requires less data than methods based on amortized inference. This highlights a weakness of the current prevalent practice of using amortized inference that can potentially be improved by integrating more direct optimization elements.
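The split between gradient-based optimization and global search described in the abstract can be illustrated with a minimal sketch. This is a toy example under stated assumptions, not the paper's implementation: the 1D "image", the `SHAPES` templates, and the function names `mask`, `render`, `fit_continuous`, and `decompose` are all hypothetical, and exhaustive enumeration stands in for whatever global search the authors use. Shape kind and integer position are treated as the non-differentiable properties, and brightness and background as the differentiable ones.

```python
WIDTH = 12  # length of the toy 1D "image" (assumption for illustration)

# Hypothetical shape templates: the discrete, non-differentiable property.
SHAPES = {
    "box": [1.0, 1.0, 1.0],
    "peak": [0.5, 1.0, 0.5],
}


def mask(kind, pos):
    """Place a shape template at integer position `pos` on an empty canvas."""
    canvas = [0.0] * WIDTH
    for i, value in enumerate(SHAPES[kind]):
        canvas[pos + i] = value
    return canvas


def render(kind, pos, brightness, background):
    """Render one object; differentiable in brightness and background only."""
    return [brightness * m + background for m in mask(kind, pos)]


def fit_continuous(target, kind, pos, steps=500, lr=0.2):
    """Gradient descent on the mean squared error for the continuous properties."""
    m = mask(kind, pos)
    n = len(target)
    a, b = 0.0, 0.0  # brightness, background
    for _ in range(steps):
        residual = [a * mi + b - ti for mi, ti in zip(m, target)]
        grad_a = 2.0 * sum(r * mi for r, mi in zip(residual, m)) / n
        grad_b = 2.0 * sum(residual) / n
        a -= lr * grad_a
        b -= lr * grad_b
    loss = sum((a * mi + b - ti) ** 2 for mi, ti in zip(m, target)) / n
    return loss, a, b


def decompose(target):
    """Exhaustive search over discrete properties, with an inner gradient fit."""
    best = None
    for kind in SHAPES:
        for pos in range(WIDTH - len(SHAPES[kind]) + 1):
            loss, a, b = fit_continuous(target, kind, pos)
            if best is None or loss < best[0]:
                best = (loss, kind, pos, a, b)
    return best


if __name__ == "__main__":
    target = render("peak", 4, 2.0, 0.5)
    print(decompose(target))
```

No model is trained here: every image is explained from scratch by optimization, which is the sense in which the search is unamortized. On the target above, the search should recover the "peak" shape at position 4 with brightness close to 2.0 and background close to 0.5. In a realistic setting the discrete space is far too large to enumerate, so the outer loop would need a more scalable global search.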