CV | Jun 15, 2018

The Toybox Dataset of Egocentric Visual Object Transformations

arXiv:1806.06034v3 | 12 citations
Originality: Synthesis-oriented
AI Analysis

This dataset fills a gap for object recognition researchers by providing more naturalistic, multi-view data, though the contribution is incremental because it builds on existing multi-view datasets.

The authors address the limited distributions of object instances and views in existing datasets by introducing Toybox, a new video dataset of egocentric visual object transformations. They demonstrate its use through neural network experiments that examine how training on different instance and view distributions affects recognition performance, and how viewpoint-dependent object concepts are represented within a trained network.

In object recognition research, many commonly used datasets (e.g., ImageNet and similar) contain relatively sparse distributions of object instances and views, e.g., one might see a thousand different pictures of a thousand different giraffes, mostly taken from a few conventionally photographed angles. These distributional properties constrain the types of computational experiments that are able to be conducted with such datasets, and also do not reflect naturalistic patterns of embodied visual experience. As a contribution to the small (but growing) number of multi-view object datasets that have been created to bridge this gap, we introduce a new video dataset called Toybox that contains egocentric (i.e., first-person perspective) videos of common household objects and toys being manually manipulated to undergo structured transformations, such as rotation, translation, and zooming. To illustrate potential uses of Toybox, we also present initial neural network experiments that examine 1) how training on different distributions of object instances and views affects recognition performance, and 2) how viewpoint-dependent object concepts are represented within the hidden layers of a trained network.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.
