CV · Mar 13, 2021

An Efficient Multitask Neural Network for Face Alignment, Head Pose Estimation and Face Tracking

arXiv:2103.07615v3 · 26 citations
Originality: Incremental advance
AI Analysis

This work addresses efficiency in face analysis for mobile and real-time applications, but it is incremental, since it builds on existing multitask and lightweight network approaches.

The paper tackles the challenge of keeping face-related algorithms both accurate and efficient on mobile devices. It proposes ATPN, an efficient multitask neural network for face alignment, head pose estimation, and face tracking, which achieves better performance with fewer parameters and lower computational complexity on benchmark datasets.

While Convolutional Neural Networks (CNNs) have significantly boosted the performance of face-related algorithms, maintaining accuracy and efficiency simultaneously in practical use remains challenging. State-of-the-art methods employ deeper networks for better performance, which makes them less practical for mobile applications because of their larger parameter counts and higher computational complexity. We therefore propose an efficient multitask neural network, the Alignment & Tracking & Pose Network (ATPN), for face alignment, face tracking and head pose estimation. Specifically, to achieve better face alignment performance with fewer layers, we introduce a shortcut connection between shallow-layer and deep-layer features. We find that shallow-layer features correspond closely to facial boundaries, which provide structural information about the face that is crucial for face alignment. Moreover, we generate a cheap heatmap from the face alignment result and fuse it with the features to improve the performance of the other two tasks. With this heatmap, the network can use both the geometric information of the landmarks and appearance information for head pose estimation. The heatmap also provides attention cues for face tracking. The face tracking task also removes the need to run face detection on every frame, which significantly boosts real-time capability for video-based tasks. We experimentally validate ATPN on four benchmark datasets: WFLW, 300VW, WIDER Face and 300W-LP. The experimental results demonstrate that it achieves better performance with far fewer parameters and lower computational complexity than other lightweight models.
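The abstract does not say how the "cheap heatmap" is built or how it is fused with the features. A minimal sketch, assuming a Gaussian blob is rendered at each predicted landmark (overlaps merged with max) and the result is fused by appending it as one extra feature channel. The names `landmark_heatmap` and `fuse_with_features` and the `sigma` parameter are hypothetical, not ATPN's actual implementation; feature maps are plain nested lists (`features[channel][y][x]`) to keep the example dependency-free.

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical sketch: ATPN's exact heatmap construction and fusion are not
# described in the abstract; this assumes Gaussian blobs + channel concat.
Map2D = List[List[float]]


def landmark_heatmap(
    landmarks: Sequence[Tuple[float, float]],
    height: int,
    width: int,
    sigma: float = 1.5,
) -> Map2D:
    """Render one heatmap channel with a Gaussian blob at each (x, y) landmark.

    Overlapping blobs are merged with max() so values never exceed 1.0.
    Landmarks are assumed to already be in heatmap pixel coordinates.
    """
    heatmap = [[0.0] * width for _ in range(height)]
    radius = int(math.ceil(3 * sigma))  # skip negligible tails beyond 3 sigma
    for lx, ly in landmarks:
        cx, cy = int(round(lx)), int(round(ly))
        # Only visit pixels inside the image and within the blob's radius.
        for y in range(max(0, cy - radius), min(height, cy + radius + 1)):
            for x in range(max(0, cx - radius), min(width, cx + radius + 1)):
                d2 = (x - lx) ** 2 + (y - ly) ** 2
                value = math.exp(-d2 / (2 * sigma ** 2))
                if value > heatmap[y][x]:
                    heatmap[y][x] = value
    return heatmap


def fuse_with_features(features: List[Map2D], heatmap: Map2D) -> List[Map2D]:
    """Append the heatmap as an extra channel (channel-wise concatenation)."""
    for channel in features:
        if len(channel) != len(heatmap) or len(channel[0]) != len(heatmap[0]):
            raise ValueError("heatmap must match the feature map's spatial size")
    return features + [heatmap]


if __name__ == "__main__":
    feats = [[[0.5] * 8 for _ in range(8)] for _ in range(2)]  # 2 channels, 8x8
    hm = landmark_heatmap([(2.0, 3.0), (5.0, 5.0)], height=8, width=8, sigma=1.0)
    fused = fuse_with_features(feats, hm)
    print(len(fused))  # 3 channels after fusion
    print(hm[3][2])    # peak at landmark (x=2, y=3)
```

Merging overlapping blobs with max rather than summing keeps every value in [0, 1], and concatenation leaves it to later layers to learn how much weight the geometric cue gets relative to appearance features. The paper's actual fusion (for example, multiplicative attention for the tracking branch) may differ.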
