Smartphone-based Eye Tracking System using Edge Intelligence and Model Optimisation
This work addresses real-time interactive applications such as games, VR, and AR on smartphones, but its contribution is incremental: it builds on existing CNN and RNN methods and adds optimisations.
The paper tackles two problems: the low accuracy of smartphone-based eye tracking on video stimuli, and the computational constraints of smartphones. It develops CNN+LSTM and CNN+GRU models, which achieve average RMSEs of 0.955 cm and 1.091 cm, respectively, and applies model quantisation, which reduces inference time by 21.72% and 19.50%, respectively, on edge devices.
A significant limitation of current smartphone-based eye-tracking algorithms is their low accuracy on video-type visual stimuli, as they are typically trained on static images. In addition, the growing demand for real-time interactive applications such as games, VR, and AR on smartphones requires overcoming resource constraints such as limited computational power, battery life, and network bandwidth. We therefore developed two new smartphone eye-tracking techniques for video-type visuals by combining Convolutional Neural Networks (CNN) with two different Recurrent Neural Networks (RNN), namely Long Short-Term Memory (LSTM) and Gated Recurrent Unit (GRU). Our CNN+LSTM and CNN+GRU models achieved an average Root Mean Square Error (RMSE) of 0.955 cm and 1.091 cm, respectively. To address the computational constraints of smartphones, we developed an edge intelligence architecture to enhance the performance of smartphone-based eye tracking. We applied optimisation methods such as quantisation and pruning to the deep learning models to improve energy, CPU, and memory usage on edge devices, focusing on real-time processing. With model quantisation, the inference time of the CNN+LSTM and CNN+GRU models was reduced by 21.72% and 19.50%, respectively, on edge devices.
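The two headline metrics, gaze error in centimetres and relative reduction in inference time, can be illustrated with a minimal sketch. It rests on assumptions not stated in the abstract: that the RMSE is taken over the Euclidean distance between predicted and true on-screen gaze points, and that the reduction is measured relative to the unquantised model's inference time. The function names `rmse_cm` and `percent_reduction` and the sample values are hypothetical and are not taken from the paper.

```python
import math


def rmse_cm(predicted, actual):
    """Root mean square of Euclidean gaze errors, in cm.

    Assumption: each gaze point is an (x, y) pair in centimetres on the
    screen plane, and the error of a sample is the straight-line distance
    between the predicted and the true point.
    """
    if len(predicted) != len(actual) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [
        (px - ax) ** 2 + (py - ay) ** 2
        for (px, py), (ax, ay) in zip(predicted, actual)
    ]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def percent_reduction(baseline, optimised):
    """Relative reduction of `optimised` with respect to `baseline`, in percent."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return 100.0 * (baseline - optimised) / baseline


if __name__ == "__main__":
    # Hypothetical gaze points (cm) for a short run of video frames.
    predicted = [(1.0, 2.0), (2.5, 3.0), (4.0, 6.0)]
    actual = [(1.2, 2.1), (2.0, 3.5), (3.5, 5.0)]
    print(f"RMSE: {rmse_cm(predicted, actual):.3f} cm")

    # Hypothetical per-frame inference times (ms) before and after quantisation.
    print(f"Inference time reduction: {percent_reduction(40.0, 32.0):.2f}%")
```

Under this definition, a reported reduction of 21.72% would mean that the quantised model needs roughly 78% of the original inference time per prediction.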