FlatteNet: A Simple Versatile Framework for Dense Pixelwise Prediction
FlatteNet provides a versatile and efficient alternative framework for tasks such as human pose estimation, semantic segmentation, and object detection, though the contribution is incremental because it builds on existing Fully Convolutional Networks (FCNs).
The paper tackles the challenge of reduced feature resolution in Fully Convolutional Networks for dense pixelwise prediction by introducing a lightweight Flattening Module, achieving competitive results on benchmarks like MPII, PASCAL-Context, and PASCAL VOC.
In this paper, we focus on devising a versatile framework for dense pixelwise prediction, whose goal is to assign a discrete or continuous label to each pixel of an image. It is well known that the reduced feature resolution caused by repeated subsampling operations poses a serious challenge to Fully Convolutional Network (FCN) based models. In contrast to commonly used strategies, such as dilated convolution and encoder-decoder structures, we introduce the Flattening Module to produce high-resolution predictions without either removing any subsampling operations or building a complicated decoder module. In addition, the Flattening Module is lightweight and can be easily combined with any existing FCN, allowing the model builder to trade off among model size, computational cost and accuracy by simply choosing different backbone networks. We empirically demonstrate the effectiveness of the proposed Flattening Module through competitive results in human pose estimation on MPII, semantic segmentation on PASCAL-Context and object detection on PASCAL VOC. We hope that the proposed approach can serve as a simple and strong alternative to the currently dominant dense pixelwise prediction frameworks.
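To make the resolution problem concrete, the following minimal sketch computes how far repeated subsampling shrinks a feature map. It is an illustration only, not part of the proposed method: the helper name `feature_map_size`, the 256x256 input and the five stride-2 stages are assumptions chosen for the example, and each stage is assumed to be padded so that the output size is the input size divided by the stride, rounded up.

```python
def feature_map_size(input_size, strides):
    """Spatial size along one axis after a sequence of subsampling operations.

    Assumes each operation is padded so that output = ceil(input / stride).
    """
    size = input_size
    for stride in strides:
        size = -(-size // stride)  # ceiling division
    return size


# Hypothetical backbone with five stride-2 stages (overall output stride 32).
strides = [2, 2, 2, 2, 2]
height = feature_map_size(256, strides)
width = feature_map_size(256, strides)

print(height, width)                     # 8 8
print((256 * 256) // (height * width))   # 1024 input pixels per feature location
```

Under these assumptions each location in the final feature map has to account for 1024 input pixels, which is the gap that dilated convolutions, decoders, or a module such as the one proposed here must close to obtain per-pixel labels.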