CV | Apr 22, 2025

GADS: A Super Lightweight Model for Head Pose Estimation

arXiv:2504.15751v1 | 2 citations | h-index: 13
Originality: Incremental advance
AI Analysis

This work addresses the need for efficient head pose estimation in resource-constrained environments such as edge devices, and represents an incremental improvement in model efficiency.

The authors tackled the problem of head pose estimation for edge devices by proposing GADS, a lightweight model that is 7.5x smaller and 25x faster than the current lightest state-of-the-art model.

In human-computer interaction, head pose estimation strongly influences how well applications function. Although facial landmarks are valuable for this task, existing landmark-based methods prioritize precision over simplicity and model size, which limits their deployment on edge devices and in compute-poor environments. To bridge this gap, we propose Grouped Attention Deep Sets (GADS), a novel architecture based on the Deep Sets framework. Grouping landmarks into regions and processing them with small Deep Set layers reduces computational complexity, and a multi-head attention mechanism extracts and combines inter-group information. The resulting model is 7.5× smaller and runs 25× faster than the current lightest state-of-the-art model, and it is 4321× smaller than the best-performing model. We introduce two variants, vanilla GADS and Hybrid-GADS (landmarks + RGB), and evaluate them on three benchmark datasets: AFLW2000, BIWI, and 300W-LP. We envision our architecture as a robust baseline for resource-constrained head pose estimation.
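The abstract describes the architecture only at a high level: landmarks are split into regional groups, each group is encoded by a small permutation-invariant Deep Set (rho(sum_i phi(x_i))), and multi-head attention mixes the resulting group embeddings. The sketch below is a minimal, untrained NumPy illustration of that idea as we read it, not the authors' implementation. Everything specific in it is an assumption: the 68-point iBUG grouping in `GROUPS`, the widths `D` and `H`, the shared Deep Set weights across groups, the residual connection, the mean pooling, and the linear head that outputs (yaw, pitch, roll).

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical grouping of the 68-point iBUG landmark scheme into facial regions
# (assumption for illustration; the paper's actual grouping may differ).
GROUPS = {
    "jaw": list(range(0, 17)),
    "brows": list(range(17, 27)),
    "nose": list(range(27, 36)),
    "eyes": list(range(36, 48)),
    "mouth": list(range(48, 68)),
}

D = 16  # hypothetical group-embedding width
H = 4   # hypothetical number of attention heads


def init(shape):
    return rng.normal(0.0, 0.1, size=shape)


params = {
    "phi_w": init((2, D)), "phi_b": np.zeros(D),  # per-landmark encoder phi
    "rho_w": init((D, D)), "rho_b": np.zeros(D),  # per-group decoder rho
    "wq": init((D, D)), "wk": init((D, D)), "wv": init((D, D)), "wo": init((D, D)),
    "out_w": init((D, 3)), "out_b": np.zeros(3),  # (yaw, pitch, roll) head
}


def relu(x):
    return np.maximum(x, 0.0)


def deep_set(points, p):
    """Permutation-invariant Deep Set over one group: rho(sum_i phi(x_i))."""
    h = relu(points @ p["phi_w"] + p["phi_b"])      # (n_points, D)
    pooled = h.sum(axis=0)                           # sum ignores point order
    return relu(pooled @ p["rho_w"] + p["rho_b"])    # (D,)


def multi_head_attention(x, p, heads):
    """Scaled dot-product self-attention across group embeddings x: (n, D)."""
    n, d = x.shape
    dh = d // heads
    q = (x @ p["wq"]).reshape(n, heads, dh).transpose(1, 0, 2)  # (heads, n, dh)
    k = (x @ p["wk"]).reshape(n, heads, dh).transpose(1, 0, 2)
    v = (x @ p["wv"]).reshape(n, heads, dh).transpose(1, 0, 2)
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(dh)             # (heads, n, n)
    scores -= scores.max(axis=-1, keepdims=True)                # stable softmax
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    out = (weights @ v).transpose(1, 0, 2).reshape(n, d)        # concat heads
    return out @ p["wo"]


def gads_sketch(landmarks, p=params):
    """landmarks: (68, 2) array -> (3,) pose angles from an untrained toy model."""
    group_emb = np.stack([deep_set(landmarks[idx], p) for idx in GROUPS.values()])
    mixed = multi_head_attention(group_emb, p, H) + group_emb   # residual
    pooled = mixed.mean(axis=0)                                  # (D,)
    return pooled @ p["out_w"] + p["out_b"]
```

Because each group is pooled with a sum, reordering the landmarks inside a group leaves the output unchanged, while the attention step is where information flows between regions. With random weights the outputs are meaningless as angles; the sketch only shows the data flow and why per-group Deep Sets keep the parameter count small.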
