LocalEyenet: Deep Attention framework for Localization of Eyes
This work addresses a domain-specific problem for human-machine interfaces by improving facial landmark detection for gaze-driven systems, though it appears incremental as it builds on existing methods like stacked hourglass backbones.
The paper tackles the problem of eye region localization for gaze detection by proposing LocalEyenet, a deep coarse-to-fine architecture with self-attention and deep layer aggregation, which shows good generalization in cross-dataset evaluation and real-time performance.
Human-machine interfaces have become a necessity for making modern machines more autonomous and efficient. Gaze-driven human intervention is an effective and convenient way to build such an interface and reduce human error. Facial landmark detection is crucial for designing a robust gaze detection system. Regression-based methods achieve good spatial localization of the landmarks corresponding to different parts of the face, but there is still scope for improvement, which we address by incorporating attention. In this paper, we propose LocalEyenet, a deep coarse-to-fine architecture for localizing only the eye regions that can be trained end-to-end. The model architecture, built on a stacked hourglass backbone, learns self-attention in the feature maps, which helps preserve global as well as local spatial dependencies in the face image. We incorporate deep layer aggregation in each hourglass to minimize the loss of attention over the depth of the architecture. Our model shows good generalization ability in cross-dataset evaluation and in real-time localization of eyes.
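To make the idea of self-attention over a feature map concrete, the following is a minimal NumPy sketch of generic scaled dot-product attention applied across spatial positions. It is an illustration only, not the LocalEyenet implementation: the function name `spatial_self_attention`, the projection matrices `w_q`, `w_k`, `w_v`, the residual connection, and the tensor sizes are all assumptions made for this example, and the weights are random rather than learned.

```python
import numpy as np


def spatial_self_attention(x, w_q, w_k, w_v):
    """Generic self-attention over the spatial positions of a feature map.

    x   : (H, W, C) feature map
    w_q : (C, D) query projection (hypothetical, random here)
    w_k : (C, D) key projection (hypothetical, random here)
    w_v : (C, C) value projection, kept at C channels for the residual sum
    Returns the attended feature map (H, W, C) and the attention
    weights (H*W, H*W).
    """
    h, w, c = x.shape
    tokens = x.reshape(h * w, c)          # one token per spatial position

    q = tokens @ w_q
    k = tokens @ w_k
    v = tokens @ w_v

    # Every position scores every other position, so long-range (global)
    # dependencies are captured alongside local ones.
    scores = q @ k.T / np.sqrt(q.shape[-1])
    scores = scores - scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)

    # Residual connection (an assumption of this sketch) keeps the
    # original local features next to the attended ones.
    out = tokens + weights @ v
    return out.reshape(h, w, c), weights


# Usage with an arbitrary 4x4 feature map of 8 channels.
rng = np.random.default_rng(0)
x = rng.normal(size=(4, 4, 8))
w_q = rng.normal(size=(8, 4))
w_k = rng.normal(size=(8, 4))
w_v = rng.normal(size=(8, 8))
out, weights = spatial_self_attention(x, w_q, w_k, w_v)
```

Each row of `weights` is a distribution over all 16 spatial positions, which is the sense in which attention lets one location draw on both nearby and distant parts of the face image. In a trained network the projections would be learned, typically as 1x1 convolutions.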