John Quarles

HC
h-index: 22
10 papers
88 citations
Novelty: 43%
AI Score: 48

10 Papers

HC Feb 5, 2023
LiteVR: Interpretable and Lightweight Cybersickness Detection using Explainable AI

Ripan Kumar Kundu, Rifatul Islam, John Quarles et al.

Cybersickness is a common ailment associated with virtual reality (VR) user experiences. Several automated methods based on machine learning (ML) and deep learning (DL) exist to detect cybersickness. However, most of these cybersickness detection methods are perceived as computationally intensive and black-box methods. Thus, those techniques are neither trustworthy nor practical for deploying on standalone energy-constrained VR head-mounted devices (HMDs). In this work, we present an explainable artificial intelligence (XAI)-based framework, LiteVR, for cybersickness detection, explaining the model's outcome and reducing the feature dimensions and overall computational costs. First, we develop three cybersickness DL models based on long short-term memory (LSTM), gated recurrent unit (GRU), and multilayer perceptron (MLP). Then, we employ a post-hoc explanation method, such as SHapley Additive exPlanations (SHAP), to explain the results and extract the most dominant features of cybersickness. Finally, we retrain the DL models with the reduced number of features. Our results show that eye-tracking features are the most dominant for cybersickness detection. Furthermore, based on the XAI-based feature ranking and dimensionality reduction, we significantly reduce the model's size by up to 4.3x, training time by up to 5.6x, and its inference time by up to 3.8x, with higher cybersickness detection accuracy and low regression error (i.e., on the Fast Motion Sickness Scale (FMS)). Our proposed lite LSTM model obtained an accuracy of 94% in classifying cybersickness and regressing (i.e., FMS 1-10) with a Root Mean Square Error (RMSE) of 0.30, which outperforms the state-of-the-art. Our proposed LiteVR framework can help researchers and practitioners analyze, detect, and deploy their DL-based cybersickness detection models in standalone VR HMDs.
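The explain-then-reduce step described in this abstract can be illustrated with a minimal sketch. It assumes that per-sample attribution values (for example, SHAP values) have already been computed, and that features are ranked by mean absolute attribution, which is a common convention but is not confirmed by the abstract. The feature names, numbers, and the helper functions `rank_features` and `select_top_k` are hypothetical.

```python
from statistics import fmean


def rank_features(attributions, names):
    """Rank features by mean absolute attribution, largest first.

    `attributions` is a list of rows (one per sample), each holding one
    attribution value per feature. Assumption: global importance is the
    mean of absolute per-sample attributions.
    """
    scores = [
        fmean(abs(row[j]) for row in attributions)
        for j in range(len(names))
    ]
    return sorted(zip(names, scores), key=lambda pair: pair[1], reverse=True)


def select_top_k(samples, names, ranking, k):
    """Keep only the k highest-ranked feature columns of `samples`."""
    keep = [name for name, _ in ranking[:k]]
    idx = [names.index(name) for name in keep]
    reduced = [[row[j] for j in idx] for row in samples]
    return keep, reduced


# Hypothetical data: three samples, three features.
names = ["pupil_diameter", "head_yaw", "heart_rate"]
attributions = [
    [0.40, -0.10, 0.05],
    [-0.30, 0.20, 0.05],
    [0.50, -0.10, -0.05],
]
samples = [[3.1, 0.2, 71.0], [2.9, -0.4, 75.0], [3.4, 0.1, 69.0]]

ranking = rank_features(attributions, names)
kept, reduced = select_top_k(samples, names, ranking, k=2)
```

A model retrained on `reduced` sees fewer input dimensions, which is the mechanism behind the reported savings in size, training time, and inference time.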

25.6 CV Mar 26 Code
AG-EgoPose: Leveraging Action-Guided Motion and Kinematic Joint Encoding for Egocentric 3D Pose Estimation

Md Mushfiqur Azam, John Quarles, Kevin Desai

Egocentric 3D human pose estimation remains challenging due to severe perspective distortion, limited body visibility, and complex camera motion inherent in first-person viewpoints. Existing methods typically rely on single-frame analysis or limited temporal fusion, which fails to effectively leverage the rich motion context available in egocentric videos. We introduce AG-EgoPose, a novel dual-stream framework that integrates short- and long-range motion context with fine-grained spatial cues for robust pose estimation from fisheye camera input. Our framework features two parallel streams. A spatial stream uses a weight-sharing ResNet-18 encoder-decoder to generate 2D joint heatmaps and corresponding joint-specific spatial feature tokens. Simultaneously, a temporal stream uses a ResNet-50 backbone to extract visual features, which are then processed by an action recognition backbone to capture the motion dynamics. These complementary representations are fused and refined in a transformer decoder with learnable joint tokens, which allows for the joint-level integration of spatial and temporal evidence while maintaining anatomical constraints. Experiments on real-world datasets demonstrate that AG-EgoPose achieves state-of-the-art performance in both quantitative and qualitative metrics. Code is available at: https://github.com/Mushfiq5647/AG-EgoPose.

HC Sep 10, 2024
Mazed and Confused: A Dataset of Cybersickness, Working Memory, Mental Load, Physical Load, and Attention During a Real Walking Task in VR

Jyotirmay Nag Setu, Joshua M Le, Ripan Kumar Kundu et al.

Virtual Reality (VR) is quickly establishing itself in various industries, including training, education, medicine, and entertainment, in which users are frequently required to carry out multiple complex cognitive and physical activities. However, the relationship between cognitive activities, physical activities, and familiar feelings of cybersickness is not well understood and thus can be unpredictable for developers. Researchers have previously provided labeled datasets for predicting cybersickness while users are stationary, but there have been few labeled datasets on cybersickness while users are physically walking. Thus, from 39 participants, we collected head orientation, head position, eye tracking, images, physiological readings from external sensors, and the self-reported cybersickness severity, physical load, and mental load in VR. Throughout the data collection, participants navigated mazes via real walking and performed tasks challenging their attention and working memory. To demonstrate the dataset's utility, we conducted a case study of training classifiers in which we achieved 95% accuracy for cybersickness severity classification. The noteworthy performance of the straightforward classifiers makes this dataset ideal for future researchers to develop cybersickness detection and reduction models. To better understand the features that helped with classification, we performed SHAP (SHapley Additive exPlanations) analysis, highlighting the importance of eye tracking and physiological measures for cybersickness prediction while walking. This open dataset can allow future researchers to study the connection between cybersickness and cognitive loads and develop prediction models. This dataset will empower future VR developers to design efficient and effective Virtual Environments by improving cognitive load management and minimizing cybersickness.

24.0 LG May 21
MambaGaze: Bidirectional Mamba with Explicit Missing Data Modeling for Cognitive Load Assessment from Eye-Gaze Tracking Data

Amir Mousavi, Mohammad Sadegh Sirjani, Erfan Nourbakhsh et al.

Real-time cognitive load assessment from eye-tracking signals could enable adaptive human-centered AI in safety-critical applications such as driver vigilance monitoring or automated flight deck assistance, yet two challenges persist: handling frequent data missingness from blinks and tracking failures, and efficiently modeling long-range temporal dependencies. We propose MambaGaze, a framework that addresses these challenges through 1) XMD encoding, which augments raw features with observation masks and time-deltas to explicitly model data uncertainty, and 2) bidirectional Mamba-2, which captures temporal dependencies with linear computational complexity. Experiments on CLARE and CL-Drive datasets under leave-one-subject-out evaluation show that MambaGaze achieves 76.8% and 73.1% accuracy, respectively, outperforming CNN, Transformer, ResNet, and VGG baselines by 4-12 percentage points. Edge deployment benchmarks on NVIDIA Jetson platforms demonstrate real-time inference at 43-68 FPS with power consumption below 7.5W, confirming feasibility for wearable cognitive load monitoring.
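The idea of augmenting raw features with observation masks and time-deltas can be sketched for a single gaze channel as follows. This is an illustration under stated assumptions, not the paper's implementation: missing samples are represented as `None` and filled with 0.0, and the time-delta follows the convention popularized by GRU-D (time elapsed since the last observed sample). The function name `xmd_encode` is hypothetical.

```python
def xmd_encode(timestamps, values, fill=0.0):
    """Return (value, mask, delta) triples for one signal channel.

    mask  is 1.0 where the sample was observed, 0.0 where it is missing.
    delta is the time since the most recent observed sample, following
    the GRU-D convention (an assumption; the paper's exact definition
    may differ). The first step always has delta 0.0.
    """
    encoded = []
    delta = 0.0
    for i, (t, v) in enumerate(zip(timestamps, values)):
        if i == 0:
            delta = 0.0
        else:
            gap = t - timestamps[i - 1]
            prev_observed = values[i - 1] is not None
            # Reset the clock after an observed sample, else accumulate.
            delta = gap if prev_observed else gap + delta
        observed = v is not None
        encoded.append((v if observed else fill, 1.0 if observed else 0.0, delta))
    return encoded


# Hypothetical gaze samples with a two-step dropout (e.g. a blink).
timestamps = [0.0, 0.5, 1.0, 1.5]
values = [2.0, None, None, 3.0]
encoded = xmd_encode(timestamps, values)
# encoded == [(2.0, 1.0, 0.0), (0.0, 0.0, 0.5), (0.0, 0.0, 1.0), (3.0, 1.0, 1.5)]
```

The mask lets a downstream model distinguish a filled-in 0.0 from a genuine reading, and the growing delta tells it how stale the last real observation is.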

35.8 LG May 21
CogAdapt: Transferring Clinical ECG Foundation Models to Wearable Cognitive Load Assessment via Lead Adaptation

Amir Mousavi, Mohammad Sadegh Sirjani, Erfan Nourbakhsh et al.

Real-time cognitive load assessment is essential for adaptive human-computer interaction but remains challenging due to limited labeled data and poor cross-subject generalization. Recent ECG foundation models pre-trained on millions of clinical recordings offer rich representations, but cannot be directly applied to wearable devices due to sensor configuration mismatch and task differences. In this paper, we propose CogAdapt, a framework that adapts clinical ECG foundation models to wearable cognitive load assessment. CogAdapt introduces LeadBridge, a learnable adapter that transforms 3-lead wearable signals into anatomically consistent 12-lead representations, and ProFine, a progressive fine-tuning strategy that gradually unfreezes encoder layers while preventing catastrophic forgetting. Evaluations on two public datasets (CLARE and CL-Drive) under leave-one-subject-out cross-validation show that CogAdapt substantially outperforms baselines trained from scratch, achieving macro-F1 scores of 0.626 and 0.768. These results demonstrate the promise of foundation model adaptation for subject-independent cognitive load assessment from wearable sensors.
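The evaluation protocol named in this abstract, leave-one-subject-out cross-validation scored with macro-F1, can be sketched in plain Python. The subject IDs and labels below are hypothetical, and the helpers `loso_splits` and `macro_f1` are illustrative stand-ins rather than the authors' code.

```python
def loso_splits(subject_ids):
    """Yield (held_out_subject, train_indices, test_indices) per subject."""
    for held_out in sorted(set(subject_ids)):
        train = [i for i, s in enumerate(subject_ids) if s != held_out]
        test = [i for i, s in enumerate(subject_ids) if s == held_out]
        yield held_out, train, test


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores."""
    labels = sorted(set(y_true) | set(y_pred))
    f1_scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        denom = 2 * tp + fp + fn
        # A class with no true or predicted samples contributes 0.0.
        f1_scores.append(2 * tp / denom if denom else 0.0)
    return sum(f1_scores) / len(f1_scores)


# Hypothetical recording-to-subject assignment.
subject_ids = ["s1", "s1", "s2", "s3", "s3"]
splits = list(loso_splits(subject_ids))

# Hypothetical binary labels: per-class F1 is 2/3 and 4/5.
score = macro_f1([0, 0, 1, 1], [0, 1, 1, 1])
```

Because every recording from the held-out subject is excluded from training, the resulting score reflects subject-independent performance, and macro averaging keeps a minority class from being masked by a dominant one.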

CV Oct 12, 2025
Towards Cybersickness Severity Classification from VR Gameplay Videos Using Transfer Learning and Temporal Modeling

Jyotirmay Nag Setu, Kevin Desai, John Quarles

With the rapid advancement of virtual reality (VR) technology, its adoption across domains such as healthcare, education, and entertainment has grown significantly. However, the persistent issue of cybersickness, marked by symptoms resembling motion sickness, continues to hinder widespread acceptance of VR. While recent research has explored multimodal deep learning approaches leveraging data from integrated VR sensors like eye and head tracking, there remains limited investigation into the use of video-based features for predicting cybersickness. In this study, we address this gap by utilizing transfer learning to extract high-level visual features from VR gameplay videos using the InceptionV3 model pretrained on the ImageNet dataset. These features are then passed to a Long Short-Term Memory (LSTM) network to capture the temporal dynamics of the VR experience and predict cybersickness severity over time. Our approach effectively leverages the time-series nature of video data, achieving a 68.4% classification accuracy for cybersickness severity. This surpasses the performance of existing models trained solely on video data, providing a practical tool for VR developers to evaluate and mitigate cybersickness in virtual environments. Furthermore, this work lays the foundation for future research on video-based temporal modeling for enhancing user comfort in VR applications.

CV Nov 4, 2024
PMPNet: Pixel Movement Prediction Network for Monocular Depth Estimation in Dynamic Scenes

Kebin Peng, John Quarles, Kevin Desai

In this paper, we propose a novel method for monocular depth estimation in dynamic scenes. We first theoretically explore the arbitrariness of objects' movement trajectories in dynamic scenes. To overcome this arbitrariness, we assume that points move along a straight line over short distances and then formulate this assumption as a triangular constraint loss in two-dimensional Euclidean space. To overcome the depth inconsistency problem around edges, we propose a deformable support window module that learns features from objects of different shapes, making depth values more accurate around edge areas. The proposed model is trained and tested on two outdoor datasets, KITTI and Make3D, as well as an indoor dataset, NYU Depth V2. The quantitative and qualitative results reported on these datasets demonstrate the success of our proposed model when compared against other approaches. Ablation study results on the KITTI dataset also validate the effectiveness of the proposed pixel movement prediction module as well as the deformable support window module.

HC Feb 9, 2022
Auditory Feedback for Standing Balance Improvement in Virtual Reality

M. Rasel Mahmud, Michael Stewart, Alberto Cordova et al.

Virtual Reality (VR) users often experience postural instability, i.e., balance problems, which could be a major barrier to universal usability and accessibility for all, especially for persons with balance impairments. Prior research has confirmed the imbalance effect, but minimal research has been conducted to reduce this effect. We recruited 42 participants (with balance impairments: 21, without balance impairments: 21) to investigate the impact of several auditory techniques on balance in VR, specifically spatial audio, static rest frame audio, rhythmic audio, and audio mapped to the center of pressure (CoP). Participants performed two types of tasks - standing visual exploration and standing reach and grasp. Within-subject results showed that each auditory technique improved balance in VR for both persons with and without balance impairments. Spatial and CoP audio improved balance significantly more than other auditory conditions. The techniques presented in this research could be used in future virtual environments to improve standing balance and help push VR closer to universal usability.

HC Oct 1, 2021
DiVRsify: Break the Cycle and Develop VR for Everyone

Tabitha C. Peck, Kyla McMullen, John Quarles

Virtual reality technology is biased. It excludes approximately 95% of the world's population by being primarily designed for male, western, educated, industrial, rich, and democratic populations. This bias may be due to the lack of diversity in virtual reality researchers, research participants, developers, and end users, fueling a noninclusive research, development, and usability cycle. The objective of this paper is to highlight the minimal virtual reality research involving understudied populations with respect to dimensions of diversity, such as gender, race, culture, ethnicity, age, disability, and neurodivergence. Specifically, we highlight numerous differences in virtual reality usability between underrepresented groups and commonly studied populations. These differences illustrate the lack of generalizability of prior virtual reality research. Lastly, we present a call to action that aims, over time, to break the cycle and enable virtual reality for everyone.

HC Aug 14, 2021
VR Sickness Prediction from Integrated HMD's Sensors using Multimodal Deep Fusion Network

Rifatul Islam, Kevin Desai, John Quarles

Virtual Reality (VR) sickness, commonly known as cybersickness, is one of the major problems for the comfortable use of VR systems. Researchers have proposed different approaches for predicting cybersickness from bio-physiological data (e.g., heart rate, breathing rate, electroencephalogram). However, collecting bio-physiological data often requires external sensors, limiting locomotion and 3D-object manipulation during the VR experience. Limited research has been done to predict cybersickness from the data readily available from the integrated sensors in head-mounted displays (HMDs) (e.g., head-tracking, eye-tracking, motion features), allowing free locomotion and 3D-object manipulation. This research proposes a novel deep fusion network to predict cybersickness severity from heterogeneous data readily available from the integrated HMD sensors. We extracted 1755 stereoscopic videos, eye-tracking, and head-tracking data along with the corresponding self-reported cybersickness severity collected from 30 participants during their VR gameplay. We applied several deep fusion approaches with the heterogeneous data collected from the participants. Our results suggest that cybersickness can be predicted with an accuracy of 87.77% and a root-mean-square error of 0.51 when using only eye-tracking and head-tracking data. We concluded that eye-tracking and head-tracking data are well suited for a standalone cybersickness prediction framework.