"Mental Rotation" by Optimizing Transforming Distance
This paper addresses the challenge of invariance in computer-vision recognition systems with a novel approach inspired by human mental rotation, though the contribution appears incremental because it builds on existing relational models.
The paper tackles the problem of recognizing objects under transformations by proposing a transforming distance, which actively transforms pairs of examples to maximize their similarity while respecting learned constraints, achieving improved nearest-neighbor performance on the Toronto Face Database and NORB.
The human visual system is able to recognize objects despite transformations that can drastically alter their appearance. To this end, much effort has been devoted to the invariance properties of recognition systems. Invariance can be engineered (e.g. convolutional nets), or learned from data explicitly (e.g. temporal coherence) or implicitly (e.g. by data augmentation). One idea that has not, to date, been explored is the integration of latent variables which permit a search over a learned space of transformations. Motivated by evidence that people mentally simulate transformations in space while comparing examples, so-called "mental rotation", we propose a transforming distance. Here, a trained relational model actively transforms pairs of examples so that they are maximally similar in some feature space yet respect the learned transformational constraints. We apply our method to nearest-neighbour problems on the Toronto Face Database and NORB.
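The toy example below illustrates the core idea of a transforming distance. It searches over a latent transformation applied to one member of the pair, keeps the transformation that makes the two examples most similar, and penalizes implausible transformations. This is a minimal sketch, not the authors' method. The paper uses a trained relational model to define the space of transformations. Here, a hand-picked family of cyclic shifts of a 1-D signal (`shift`) and a linear penalty on shift size stand in for that learned model and its constraints, and raw values serve as the feature space. The names `transforming_distance`, `nearest_neighbour`, `max_shift`, and `penalty` are hypothetical.

```python
import numpy as np


def shift(x: np.ndarray, k: int) -> np.ndarray:
    """Hypothetical transformation family: cyclic shift of a 1-D signal by k steps.

    Stands in for the learned relational model used in the paper.
    """
    return np.roll(x, k)


def transforming_distance(x: np.ndarray, y: np.ndarray,
                          max_shift: int = 3, penalty: float = 0.1) -> float:
    """Search over latent shifts k and return the smallest penalized distance.

    Only y is transformed. The term penalty * |k| is an assumed stand-in for
    the 'learned transformational constraints': larger shifts cost more.
    """
    best = np.inf
    for k in range(-max_shift, max_shift + 1):
        # Squared distance after transforming y, plus the cost of the transformation.
        cost = float(np.sum((shift(y, k) - x) ** 2)) + penalty * abs(k)
        best = min(best, cost)
    return best


def nearest_neighbour(query, train_x, train_y, **kwargs):
    """1-NN classification using the transforming distance instead of Euclidean."""
    dists = [transforming_distance(query, t, **kwargs) for t in train_x]
    return train_y[int(np.argmin(dists))]


# Toy data: class A is a two-sample bump, class B a single spike at the end.
a = np.array([0, 0, 1, 1, 0, 0, 0, 0], dtype=float)
b = np.array([0, 0, 0, 0, 0, 0, 0, 1], dtype=float)
query = np.roll(a, 3)  # class-A bump moved by 3 samples

print(nearest_neighbour(query, [a, b], ["A", "B"]))  # prints A
```

With these values, plain Euclidean distance assigns the shifted bump to class B (squared distance 3 versus 4), while the transforming distance recovers the shift of 3 and assigns it to class A. The exhaustive grid over shifts keeps the sketch short and exact. Optimizing continuous latent variables of a learned model would be closer to what the title describes.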