CV · Sep 4, 2015

Object Recognition from Short Videos for Robotic Perception

arXiv:1509.01602v1 · 8 citations
Originality: Incremental advance
AI Analysis

This work addresses robotic perception by improving object recognition accuracy in short videos. The advance is incremental, since it builds on existing LSTM methods.

The paper tackles object recognition in short videos with convolutional LSTM networks that exploit motion dependencies across frames, and it reports new state-of-the-art results on the Washington RGBD Object and Scenes datasets.

Deep neural networks have become the primary learning technique for object recognition. Videos, unlike still images, are temporally coherent, which makes the application of deep networks non-trivial. Here, we investigate how motion can aid object recognition in short videos. Our approach is based on Long Short-Term Memory (LSTM) deep networks. Unlike previous applications of LSTMs, we implement each gate as a convolution. We show that convolution-based LSTM models are capable of learning motion dependencies and are able to improve the recognition accuracy when more frames in a sequence are available. We evaluate our approach on the Washington RGBD Object dataset and on the Washington RGBD Scenes dataset. Our approach outperforms deep nets applied to still images and sets a new state of the art in this domain.
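
To make "each gate as a convolution" concrete, here is a minimal sketch of one recurrent step of a convolutional LSTM in plain Python. It is an illustration under simplifying assumptions, not the paper's architecture: feature maps have a single channel, kernels are odd-sized with zero padding, there are no peephole connections, and the names `conv2d_same`, `gate`, `conv_lstm_step`, and the `params` layout are hypothetical.

```python
import math

def conv2d_same(x, k):
    """Zero-padded 2D cross-correlation of a single-channel map x with an odd-sized kernel k."""
    h, w = len(x), len(x[0])
    kh, kw = len(k), len(k[0])
    ph, pw = kh // 2, kw // 2
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            s = 0.0
            for a in range(kh):
                for b in range(kw):
                    y, z = i + a - ph, j + b - pw
                    if 0 <= y < h and 0 <= z < w:  # outside the map counts as zero
                        s += x[y][z] * k[a][b]
            out[i][j] = s
    return out

def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

def gate(x, h, wx, wh, bias, act):
    """One gate: act(conv(x, wx) + conv(h, wh) + bias), applied per pixel."""
    cx, ch = conv2d_same(x, wx), conv2d_same(h, wh)
    return [[act(cx[r][q] + ch[r][q] + bias) for q in range(len(x[0]))]
            for r in range(len(x))]

def conv_lstm_step(x, h, c, params):
    """One time step. params maps "i", "f", "o", "g" to (wx, wh, bias) (assumed layout)."""
    i_g = gate(x, h, *params["i"], sigmoid)    # input gate
    f_g = gate(x, h, *params["f"], sigmoid)    # forget gate
    o_g = gate(x, h, *params["o"], sigmoid)    # output gate
    g = gate(x, h, *params["g"], math.tanh)    # candidate cell update
    rows, cols = len(x), len(x[0])
    c_new = [[f_g[r][q] * c[r][q] + i_g[r][q] * g[r][q] for q in range(cols)]
             for r in range(rows)]
    h_new = [[o_g[r][q] * math.tanh(c_new[r][q]) for q in range(cols)]
             for r in range(rows)]
    return h_new, c_new

def run_sequence(frames, params):
    """Feed a list of equally sized frames through the cell; return the final hidden map."""
    rows, cols = len(frames[0]), len(frames[0][0])
    h = [[0.0] * cols for _ in range(rows)]
    c = [[0.0] * cols for _ in range(rows)]
    for x in frames:
        h, c = conv_lstm_step(x, h, c, params)
    return h
```

Because the hidden and cell states stay two-dimensional, the recurrence keeps the spatial layout of each frame, which is what lets the gates respond to local changes between consecutive frames. In a recognition pipeline the final hidden map would typically feed a classifier; that stage is omitted here.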

