Fully Convolutional Neural Networks for Crowd Segmentation
This work addresses crowd segmentation for computer vision applications. Its contribution is incremental: it adapts existing FCNN ideas to this domain and introduces new datasets.
The paper tackles crowd segmentation with a fast fully convolutional neural network (FCNN) that replaces fully connected layers with 1x1 convolutions. This lets the network output segmentation maps directly, with translation invariance and lower computation cost than patch-by-patch scanning. It also introduces a multi-stage deep learning approach that integrates appearance and motion cues, evaluated on two new large datasets with 235 and 11 scenes.
In this paper, we propose a fast fully convolutional neural network (FCNN) for crowd segmentation. By replacing the fully connected layers in a CNN with 1x1 convolution kernels, FCNN takes whole images as inputs and directly outputs segmentation maps in one pass of forward propagation. It has the property of translation invariance, like patch-by-patch scanning, but with much lower computation cost. Once FCNN is learned, it can process input images of any size without warping them to a standard size. These attractive properties make it extendable to other general image segmentation problems. Based on FCNN, a multi-stage deep learning approach is proposed to integrate appearance and motion cues for crowd segmentation. Both appearance filters and motion filters are pretrained stage-by-stage and then jointly optimized. Different combination methods are investigated. We evaluate the effectiveness of our approach and provide a component-wise analysis on two crowd segmentation datasets created by us, which include image frames from 235 and 11 scenes, respectively. They are currently the largest crowd segmentation datasets and will be released to the public.
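
The equivalence between a fully connected layer applied at every location and a single 1x1 convolution can be illustrated with a minimal NumPy sketch. This is not the paper's implementation: the function names `fc_layer` and `conv1x1`, the channel counts, and the feature-map sizes are all hypothetical, chosen only to show that the same weights give the same per-location outputs in one dense pass, for inputs of different spatial sizes.

```python
import numpy as np

def fc_layer(features, weights, bias):
    """Fully connected layer on one location's feature vector.
    features: (C,), weights: (K, C), bias: (K,) -> (K,)"""
    return weights @ features + bias

def conv1x1(feature_map, weights, bias):
    """1x1 convolution over a whole feature map, reusing the same weights.
    feature_map: (C, H, W) -> (K, H, W)"""
    return np.einsum("kc,chw->khw", weights, feature_map) + bias[:, None, None]

rng = np.random.default_rng(0)
C, K = 8, 2  # hypothetical: 8 input channels, 2 output classes
weights = rng.standard_normal((K, C))
bias = rng.standard_normal(K)

# The same learned weights handle feature maps of different spatial sizes.
for H, W in [(5, 7), (12, 9)]:
    fmap = rng.standard_normal((C, H, W))

    # One dense pass over the whole map.
    dense = conv1x1(fmap, weights, bias)

    # Location-by-location scanning with the fully connected layer.
    scanned = np.array(
        [[fc_layer(fmap[:, i, j], weights, bias) for j in range(W)]
         for i in range(H)]
    ).transpose(2, 0, 1)  # (H, W, K) -> (K, H, W)

    assert dense.shape == (K, H, W)
    assert np.allclose(dense, scanned)
```

The scanning loop calls the layer once per location, whereas the 1x1 convolution produces the full map in a single vectorized operation. In this toy setting, that is where the lower computation cost relative to patch-by-patch scanning comes from.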