CVJun 12, 2013

Recurrent Convolutional Neural Networks for Scene Parsing

arXiv:1306.2795v1153 citations
Originality Incremental advance
AI Analysis

This addresses scene parsing for computer vision applications, offering an incremental improvement in efficiency and accuracy.

The authors tackled scene parsing by proposing a recurrent convolutional neural network that captures long-range dependencies without segmentation methods or task-specific features, achieving state-of-the-art performance on the Stanford Background and SIFT Flow datasets while remaining fast at test time.

Scene parsing is a technique that consist on giving a label to all pixels in an image according to the class they belong to. To ensure a good visual coherence and a high class accuracy, it is essential for a scene parser to capture image long range dependencies. In a feed-forward architecture, this can be simply achieved by considering a sufficiently large input context patch, around each pixel to be labeled. We propose an approach consisting of a recurrent convolutional neural network which allows us to consider a large input context, while limiting the capacity of the model. Contrary to most standard approaches, our method does not rely on any segmentation methods, nor any task-specific features. The system is trained in an end-to-end manner over raw pixels, and models complex spatial dependencies with low inference cost. As the context size increases with the built-in recurrence, the system identifies and corrects its own errors. Our approach yields state-of-the-art performance on both the Stanford Background Dataset and the SIFT Flow Dataset, while remaining very fast at test time.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes