CV, RO · Nov 10, 2020

A Self-supervised Learning System for Object Detection in Videos Using Random Walks on Graphs

arXiv:2011.05459v3 · 1 citation
AI Analysis

This work addresses unsupervised object detection for robotics and computer vision applications. It is incremental, however, because it builds on existing self-supervised learning and clustering techniques.

The paper tackles the problem of detecting novel object categories in videos by proposing a self-supervised system that uses random walks on graphs to sample triplets for training a siamese network, resulting in improved clustering accuracy on three public datasets.

This paper presents a new self-supervised system for learning to detect novel and previously unseen categories of objects in images. The proposed system receives as input several unlabeled videos of scenes containing various objects. The frames of the videos are segmented into objects using depth information, and the segments are tracked along each video. The system then constructs a weighted graph that connects sequences based on the similarities between the objects that they contain. The similarity between two sequences of objects is measured by using generic visual features, after automatically re-arranging the frames in the two sequences to align the viewpoints of the objects. The graph is used to sample triplets of similar and dissimilar examples by performing random walks. The triplet examples are finally used to train a siamese neural network that projects the generic visual features into a low-dimensional manifold. Experiments on three public datasets, YCB-Video, CORe50 and RGBD-Object, show that the projected low-dimensional features improve the accuracy of clustering unknown objects into novel categories, and outperform several recent unsupervised clustering techniques.
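The triplet-sampling step described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the function names (`random_walk`, `sample_triplet`), the toy graph, the walk length, and the rule for choosing positives and negatives are all assumptions made for illustration. Nodes stand for tracked object sequences and edge weights for their visual similarity. The positive is taken to be the node that short weighted walks from the anchor reach most often, and the negative is drawn from nodes those walks never reach.

```python
import random


def random_walk(graph, start, steps, rng):
    """Walk up to `steps` edges from `start`, picking each neighbour
    with probability proportional to its (positive) edge weight."""
    node = start
    for _ in range(steps):
        neighbours = list(graph[node])
        if not neighbours:
            break
        weights = [graph[node][n] for n in neighbours]
        node = rng.choices(neighbours, weights=weights, k=1)[0]
    return node


def sample_triplet(graph, anchor, steps=2, walks=50, rng=None):
    """Return (anchor, positive, negative), or None if no triplet can be formed.

    Assumed rule (hypothetical, for illustration only): the positive is the
    most frequent end point of short walks; the negative is any node that
    those walks never reached.
    """
    rng = rng or random.Random()
    visits = {}
    for _ in range(walks):
        end = random_walk(graph, anchor, steps, rng)
        if end != anchor:
            visits[end] = visits.get(end, 0) + 1
    if not visits:
        return None
    positive = max(visits, key=visits.get)
    candidates = [n for n in graph if n != anchor and n not in visits]
    if not candidates:
        return None
    return anchor, positive, rng.choice(candidates)


# Toy similarity graph: two groups of sequences with strong internal edges.
graph = {
    "a": {"b": 0.9, "c": 0.8},
    "b": {"a": 0.9, "c": 0.9},
    "c": {"a": 0.8, "b": 0.9},
    "x": {"y": 0.9, "z": 0.7},
    "y": {"x": 0.9, "z": 0.8},
    "z": {"x": 0.7, "y": 0.8},
}

triplet = sample_triplet(graph, "a", rng=random.Random(0))
print(triplet)
```

Triplets produced this way would then feed a triplet loss that pulls the anchor and positive embeddings together and pushes the negative away, which is the role the abstract assigns to the siamese network.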

Code Implementations: 1 repo