Learning Canonical Transformations
This work addresses how neural networks can learn generalizable geometric transformations, a foundational problem for improving generalization in computer vision models.
This paper explores inductive biases that help neural networks learn canonical geometric transformations, such as translation and rotation, in pixel space. The authors find that high training set diversity is sufficient for translation to extrapolate to unseen shapes and scales, and that an iterative training scheme achieves significant extrapolation of rotation in time.
Humans understand a set of canonical geometric transformations (such as translation and rotation) that support generalization by being untethered to any specific object. We explore inductive biases that help a neural network model learn these transformations in pixel space in a way that can generalize out-of-domain. Specifically, we find that high training set diversity is sufficient for the extrapolation of translation to unseen shapes and scales, and that an iterative training scheme achieves significant extrapolation of rotation in time.
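To make the setup concrete, the following is a minimal sketch of how (input, target) pairs for a pixel-space translation task might be generated, and how a one-step transformation can be applied repeatedly. It is an illustration under stated assumptions, not the paper's actual data pipeline or training scheme: the helper names (`random_shape`, `translate`, `make_pairs`, `rollout`), the use of random rectangles as the source of shape diversity, the 16x16 image size, and the zero-filled border are all hypothetical choices.

```python
import random


def random_shape(size, rng):
    """Return a size x size binary image containing one random filled rectangle.

    Assumption: random rectangles stand in for a "diverse" set of shapes.
    """
    img = [[0] * size for _ in range(size)]
    h = rng.randint(1, size // 2)
    w = rng.randint(1, size // 2)
    top = rng.randint(0, size - h)
    left = rng.randint(0, size - w)
    for r in range(top, top + h):
        for c in range(left, left + w):
            img[r][c] = 1
    return img


def translate(img, dy, dx):
    """Shift a square image by (dy, dx) pixels; vacated pixels are set to 0."""
    size = len(img)
    out = [[0] * size for _ in range(size)]
    for r in range(size):
        for c in range(size):
            nr, nc = r + dy, c + dx
            if 0 <= nr < size and 0 <= nc < size:
                out[nr][nc] = img[r][c]
    return out


def make_pairs(n, size=16, dy=0, dx=1, seed=0):
    """Build n (source, target) pairs sharing one fixed translation."""
    rng = random.Random(seed)
    pairs = []
    for _ in range(n):
        src = random_shape(size, rng)
        pairs.append((src, translate(src, dy, dx)))
    return pairs


def rollout(step, img, k):
    """Apply a one-step transformation k times in sequence.

    Assumption: a learned model would replace `step`; here the exact
    translation is used as a stand-in.
    """
    for _ in range(k):
        img = step(img)
    return img


pairs = make_pairs(5)
shifted_three = rollout(lambda im: translate(im, 0, 1), pairs[0][0], 3)
```

In this sketch, a model trained on `pairs` would be evaluated on shapes and scales absent from the training set, and `rollout` shows the sense in which a single learned step could be composed over several time steps.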