CV · LG · Sep 21, 2016

PixelNet: Towards a General Pixel-level Architecture

Aayush Bansal, Xinlei Chen, Bryan Russell, Abhinav Gupta, Deva Ramanan

arXiv:1609.06694v1 · 16.763 citations

Originality: Highly original

AI Analysis

The paper addresses the need for a versatile computer vision architecture that can handle multiple pixel-level tasks efficiently, offering a single general approach in place of task-specific models.

The paper tackles the problem of designing a general pixel-level prediction architecture for tasks such as edge detection, surface normal estimation, and semantic segmentation. It reports state-of-the-art results on PASCAL-Context (semantic segmentation), NYUDv2 (surface normal estimation), and BSDS (edge detection) without contextual post-processing.

We explore architectures for general pixel-level prediction problems, from low-level edge detection to mid-level surface normal estimation to high-level semantic segmentation. Convolutional predictors, such as the fully-convolutional network (FCN), have achieved remarkable success by exploiting the spatial redundancy of neighboring pixels through convolutional processing. Though such approaches are computationally efficient, we point out that they are not statistically efficient during learning, precisely because spatial redundancy limits the information learned from neighboring pixels. We demonstrate that (1) stratified sampling allows us to add diversity during batch updates and (2) sampled multi-scale features allow us to explore more nonlinear predictors (multiple fully-connected layers followed by ReLU) that improve overall accuracy. Finally, our objective is to show how a single architecture can achieve performance better than (or comparable to) that of architectures designed for a particular task. Interestingly, our single architecture produces state-of-the-art results for semantic segmentation on PASCAL-Context, surface normal estimation on the NYUDv2 dataset, and edge detection on BSDS without contextual post-processing.
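The two ideas in the abstract, stratified pixel sampling and sampled multi-scale features feeding a nonlinear predictor, can be illustrated with a minimal NumPy sketch. It rests on several assumptions that the abstract does not spell out. Stratification is taken to mean drawing an equal, fixed number of random pixels from each image in the batch. Multi-scale features are gathered hypercolumn-style, by bilinearly interpolating several convolutional feature maps at each sampled location and concatenating the results. The predictor is a small MLP with ReLU between fully-connected layers. All function names, feature-map shapes, and layer sizes below are hypothetical, random arrays stand in for real CNN activations, and this is not the authors' implementation.

```python
import numpy as np

def stratified_pixel_sample(batch_size, height, width, pixels_per_image, rng):
    """Draw the same number of distinct random pixels from every image.
    Returns int rows (image_index, y, x), shape (batch_size * pixels_per_image, 3)."""
    samples = []
    for b in range(batch_size):
        flat = rng.choice(height * width, size=pixels_per_image, replace=False)
        ys, xs = np.divmod(flat, width)
        samples.append(np.stack([np.full(pixels_per_image, b), ys, xs], axis=1))
    return np.concatenate(samples, axis=0)

def bilinear_sample(fmap, b, y, x):
    """Bilinearly interpolate fmap (B, C, h, w) at continuous coords; returns (N, C)."""
    _, _, h, w = fmap.shape
    y = np.clip(y, 0, h - 1)
    x = np.clip(x, 0, w - 1)
    y0, x0 = np.floor(y).astype(int), np.floor(x).astype(int)
    y1, x1 = np.minimum(y0 + 1, h - 1), np.minimum(x0 + 1, w - 1)
    wy, wx = (y - y0)[:, None], (x - x0)[:, None]
    # Advanced indices split by a slice put the N axis first: each result is (N, C).
    top = fmap[b, :, y0, x0] * (1 - wx) + fmap[b, :, y0, x1] * wx
    bottom = fmap[b, :, y1, x0] * (1 - wx) + fmap[b, :, y1, x1] * wx
    return top * (1 - wy) + bottom * wy

def hypercolumn(feature_maps, samples, height, width):
    """Concatenate features from every layer at each sampled pixel."""
    b, ys, xs = samples[:, 0], samples[:, 1], samples[:, 2]
    cols = []
    for fmap in feature_maps:
        _, _, h, w = fmap.shape
        # Map pixel centres from input resolution to this layer's resolution.
        fy = (ys + 0.5) * h / height - 0.5
        fx = (xs + 0.5) * w / width - 0.5
        cols.append(bilinear_sample(fmap, b, fy, fx))
    return np.concatenate(cols, axis=1)

def mlp(features, weights, biases):
    """Fully-connected layers with ReLU between them (no ReLU on the output)."""
    h = features
    for i, (W, c) in enumerate(zip(weights, biases)):
        h = h @ W + c
        if i < len(weights) - 1:
            h = np.maximum(h, 0.0)
    return h

# Usage with random stand-ins for three conv layers at decreasing resolution.
rng = np.random.default_rng(0)
B, H, W = 2, 32, 32
feature_maps = [
    rng.standard_normal((B, 8, 32, 32)),
    rng.standard_normal((B, 16, 16, 16)),
    rng.standard_normal((B, 32, 4, 4)),
]
samples = stratified_pixel_sample(B, H, W, pixels_per_image=100, rng=rng)
feats = hypercolumn(feature_maps, samples, H, W)  # one row per sampled pixel
dims = [feats.shape[1], 64, 64, 5]                # hypothetical MLP sizes, 5 outputs
weights = [rng.standard_normal((i, o)) * 0.1 for i, o in zip(dims[:-1], dims[1:])]
biases = [np.zeros(o) for o in dims[1:]]
logits = mlp(feats, weights, biases)
```

Because only the sampled rows enter the MLP, the loss for a batch is computed on a few hundred diverse pixels drawn from every image rather than on every, largely redundant, pixel of a few images. That reduced cost is also what makes a deeper nonlinear per-pixel predictor affordable in this sketch.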

