CVAIHCLGSPAug 7, 2024

FOVAL: Calibration-Free and Subject-Invariant Fixation Depth Estimation Across Diverse Eye-Tracking Datasets

arXiv:2408.03591v22 citationsh-index: 7
Originality Incremental advance
AI Analysis

This addresses the scalability and usability limitations in extended reality, robotics, and human-computer interaction by eliminating calibration needs.

The paper tackled the problem of fixation depth estimation without user-specific calibration by introducing FOVAL, which achieved a mean absolute error of 9.1 cm across diverse datasets and demonstrated strong generalization.

Accurate fixation depth estimation is essential for applications in extended reality (XR), robotics, and human-computer interaction. However, current methods heavily depend on user-specific calibration, which limits their scalability and usability. We introduce FOVAL, a robust calibration-free approach that combines spatiotemporal sequence modelling via Long Short-Term Memory (LSTM) networks with subject-invariant feature engineering and normalisation. Compared to Transformers, Temporal Convolutional Networks (TCNs), and CNNs, FOVAL achieves superior performance, particularly in scenarios with limited and noisy gaze data. Evaluations across three benchmark datasets using Leave-One-Out Cross-Validation (LOOCV) and cross-dataset validation show a mean absolute error (MAE) of 9.1 cm and strong generalisation without calibration. We further analyse inter-subject variability and domain shifts, providing insight into model robustness and adaptation. FOVAL's scalability and accuracy make it highly suitable for real-world deployment.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes