CV, Apr 14, 2018

Beyond Trade-off: Accelerate FCN-based Face Detector with Higher Accuracy

Guanglu Song, Yu Liu, Ming Jiang, Yujie Wang, Junjie Yan, Biao Leng

arXiv:1804.05197v29.934 citations

Originality: Incremental advance

AI Analysis

This work addresses the need for faster and more accurate face detection systems. The advance is incremental, since it builds on existing FCN-based methods such as Faster-RCNN and SSD.

The paper tackles the problem of accelerating fully convolutional neural network (FCN)-based face detectors while improving accuracy, proposing a method that decomposes the search space into scale and spatial directions to reduce false alarms and computational redundancy. Experiments show that the method accelerates the RPN detector by about 4x and achieves state-of-the-art results on FDDB, AFW, and MALF benchmarks.

Fully convolutional neural networks (FCNs) have dominated face detection for several years thanks to their inherent capability for sliding-window search with shared kernels, which eliminates redundant computation, and most recent state-of-the-art methods such as Faster-RCNN, SSD, YOLO and FPN use an FCN as their backbone. This raises a question: can we find a universal strategy that further accelerates FCNs while improving accuracy, and thereby accelerates all recent FCN-based methods? To analyze this, we decompose the face search space into two orthogonal directions, 'scale' and 'spatial'. Only a few coordinates in the space spanned by these two base vectors indicate foreground. If the FCN could ignore most of the other points, the search space and the number of false alarms would be significantly reduced. Based on this idea, a novel method named scale estimation and spatial attention proposal ($S^2AP$) is proposed to attend to specific scales and valid locations in the image pyramid. Furthermore, we adopt a masked-convolution operation based on the attention result to accelerate FCN computation. Experiments show that the FCN-based method RPN can be accelerated by about $4\times$ with the help of $S^2AP$ and masked-FCN, while also achieving state-of-the-art results on the FDDB, AFW and MALF face detection benchmarks.
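
The masked-convolution idea in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the function names (`dense_conv2d`, `masked_conv2d`), the single-channel NumPy setting, and the hand-made rectangular attention mask are all assumptions chosen for illustration. It only shows that evaluating the kernel at attended locations gives the same responses there as a dense pass, while touching far fewer positions.

```python
import numpy as np

def dense_conv2d(x, k):
    """Valid-mode 2D cross-correlation evaluated at every output position."""
    kh, kw = k.shape
    out_h, out_w = x.shape[0] - kh + 1, x.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = np.sum(x[i:i + kh, j:j + kw] * k)
    return out

def masked_conv2d(x, k, mask):
    """Same operation, evaluated only where mask is True.

    Positions outside the mask are skipped and left at zero.
    Returns the output map and the number of positions evaluated.
    """
    kh, kw = k.shape
    out = np.zeros(mask.shape)
    evaluated = 0
    for i, j in zip(*np.nonzero(mask)):
        out[i, j] = np.sum(x[i:i + kh, j:j + kw] * k)
        evaluated += 1
    return out, evaluated

rng = np.random.default_rng(0)
image = rng.standard_normal((32, 32))
kernel = rng.standard_normal((3, 3))

# Hypothetical attention result: a single 6x6 region is marked as
# possibly containing a face; everything else is ignored.
mask = np.zeros((30, 30), dtype=bool)
mask[10:16, 12:18] = True

dense = dense_conv2d(image, kernel)
sparse, evaluated = masked_conv2d(image, kernel, mask)

print(f"evaluated {evaluated} of {mask.size} output positions")
```

In this toy setting the saving comes purely from skipping positions. A practical version would need to gather the attended patches into a batch so that the skipped work translates into real speedup on the target hardware.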
