CVAILGSep 21, 2021

LOTR: Face Landmark Localization Using Localization Transformer

arXiv:2109.10057v318 citations
Originality Highly original
AI Analysis

This work addresses the problem of accurate facial landmark localization for computer vision applications, representing an incremental improvement with a novel method for a known bottleneck.

The paper tackles facial landmark localization by proposing LOTR, a Transformer-based direct coordinate regression network with a smooth-Wing loss, achieving superior results on the JD landmark dataset and promising performance on the WFLW dataset compared to existing methods.

This paper presents a novel Transformer-based facial landmark localization network named Localization Transformer (LOTR). The proposed framework is a direct coordinate regression approach leveraging a Transformer network to better utilize the spatial information in the feature map. An LOTR model consists of three main modules: 1) a visual backbone that converts an input image into a feature map, 2) a Transformer module that improves the feature representation from the visual backbone, and 3) a landmark prediction head that directly predicts the landmark coordinates from the Transformer's representation. Given cropped-and-aligned face images, the proposed LOTR can be trained end-to-end without requiring any post-processing steps. This paper also introduces the smooth-Wing loss function, which addresses the gradient discontinuity of the Wing loss, leading to better convergence than standard loss functions such as L1, L2, and Wing loss. Experimental results on the JD landmark dataset provided by the First Grand Challenge of 106-Point Facial Landmark Localization indicate the superiority of LOTR over the existing methods on the leaderboard and two recent heatmap-based approaches. On the WFLW dataset, the proposed LOTR framework demonstrates promising results compared with several state-of-the-art methods. Additionally, we report the improvement in state-of-the-art face recognition performance when using our proposed LOTRs for face alignment.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes