Ali Kargarandehkordi

CV
h-index 13
4 papers
14 citations
Novelty 23%
AI Score 33

4 Papers

LG · May 29
Adaptive data selection improves wearable prediction under low baseline performance

Ali Kargarandehkordi

Adaptive sensing strategies that selectively sample data are increasingly used in wearable health systems to improve prediction performance under limited data budgets, yet their benefits across individuals remain poorly understood. Here, we evaluate adaptive selection of time windows for model training under fixed measurement budgets across multiple sensing modalities, including heart rate, activity, and ecological momentary assessment (EMA), in a longitudinal wearable dataset. We quantify performance gains relative to random sampling using both area under the receiver operating characteristic curve (AUROC) and F1 score. Adaptive strategies yield substantial improvements in AUROC for participants with low baseline performance (with gains up to 0.7), while offering limited or negative gains for participants with strong baselines. Across modalities, adaptive gain is strongly inversely correlated with baseline performance (Pearson r = -0.67; Spearman ρ = -0.62). At the participant level, most individuals benefit in AUROC (60-80% across modalities), although improvements in F1 are smaller and less consistent. These findings show that adaptive sensing is not uniformly beneficial, but instead provides the greatest value in underperforming settings. Our results support selective deployment strategies that tailor adaptive sensing based on baseline performance to improve efficiency in wearable health monitoring.
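The gain-versus-baseline analysis described in this abstract can be illustrated with a minimal sketch. It assumes that adaptive gain is the adaptive AUROC minus the random-sampling (baseline) AUROC for each participant; the five AUROC values below are hypothetical placeholders, not results from the paper, and the helper names are chosen here for illustration.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg_rank
        i = j + 1
    return result

def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))

# Hypothetical per-participant AUROC values (illustration only).
baseline_auroc = [0.35, 0.50, 0.62, 0.75, 0.90]  # random sampling
adaptive_auroc = [0.80, 0.78, 0.74, 0.79, 0.88]  # adaptive selection

# Assumed definition: gain = adaptive AUROC - baseline AUROC.
gain = [a - b for a, b in zip(adaptive_auroc, baseline_auroc)]

r = pearson(baseline_auroc, gain)
rho = spearman(baseline_auroc, gain)
share_benefiting = sum(g > 0 for g in gain) / len(gain)
print(f"Pearson r = {r:.2f}, Spearman rho = {rho:.2f}, "
      f"share with positive gain = {share_benefiting:.0%}")
```

A negative correlation in such an analysis means that participants with weaker baselines tend to gain more, which is the pattern the abstract reports.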

CV · Mar 19, 2023
Computer Vision Estimation of Emotion Reaction Intensity in the Wild

Yang Qian, Ali Kargarandehkordi, Onur Cezmi Mutlu et al.

Emotions play an essential role in human communication. Developing computer vision models for automatic recognition of emotion expression can aid in a variety of domains, including robotics, digital behavioral healthcare, and media analytics. There are three types of emotional representations which are traditionally modeled in affective computing research: Action Units, Valence Arousal (VA), and Categorical Emotions. As part of an effort to move beyond these representations towards more fine-grained labels, we describe our submission to the newly introduced Emotional Reaction Intensity (ERI) Estimation challenge in the 5th competition for Affective Behavior Analysis in-the-Wild (ABAW). We developed four deep neural networks trained in the visual domain and a multimodal model trained with both visual and audio features to predict emotion reaction intensity. Our best performing model on the Hume-Reaction dataset achieved an average Pearson correlation coefficient of 0.4080 on the test set using a pre-trained ResNet50 model. This work provides a first step towards the development of production-grade models which predict emotion reaction intensities rather than discrete emotion categories.

CV · Sep 21, 2023
Personalization of Affective Models to Enable Neuropsychiatric Digital Precision Health Interventions: A Feasibility Study

Ali Kargarandehkordi, Matti Kaisti, Peter Washington

Mobile digital therapeutics for autism spectrum disorder (ASD) often target emotion recognition and evocation, which is a challenge for children with ASD. While such mobile applications often use computer vision machine learning (ML) models to guide the adaptive nature of the digital intervention, a single model is usually deployed and applied to all children. Here, we explore the potential of model personalization, or training a single emotion recognition model per person, to improve the performance of these underlying emotion recognition models used to guide digital health therapies for children with ASD. We conducted experiments on the Emognition dataset, a video dataset of human subjects evoking a series of emotions. For a subset of 10 individuals in the dataset with a sufficient representation of at least two ground truth emotion labels, we trained a personalized version of three classical ML models on a set of 51 features extracted from each video frame. We measured the importance of each facial feature for all personalized models and observed differing ranked lists of top features across subjects, motivating the need for model personalization. We then compared the personalized models against a generalized model trained using data from all 10 participants. The mean F1-scores achieved by the personalized models were 90.48%, 92.66%, and 86.40% for the three model types, respectively. By contrast, the mean F1-scores reached by non-personalized models trained on different human subjects and evaluated using the same test set were 88.55%, 91.78%, and 80.42%, respectively. The personalized models outperformed the generalized models for 7 out of 10 participants. PCA analyses on the remaining 3 participants revealed relatively small facial configuration differences between emotion labels within each subject, suggesting that personalized ML will fail when the variation among data points within a subject's data is too low.
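The personalized-versus-generalized comparison can be sketched as follows. This is an illustrative toy, not the authors' pipeline: the abstract does not name its three classical ML models here, so a nearest-centroid classifier is swapped in as a stand-in, the data are synthetic two-feature "frames" rather than the 51 Emognition features, and the simple first-half/second-half split and all function names are assumptions made for this example.

```python
import random

def centroid_fit(X, y):
    """Fit a nearest-centroid model: {label: mean feature vector}."""
    sums, counts = {}, {}
    for row, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(row))
        for i, v in enumerate(row):
            acc[i] += v
        counts[label] = counts.get(label, 0) + 1
    return {lab: [v / counts[lab] for v in acc] for lab, acc in sums.items()}

def centroid_predict(model, X):
    """Assign each row the label of its closest centroid."""
    def dist(a, b):
        return sum((p - q) ** 2 for p, q in zip(a, b))
    return [min(model, key=lambda lab: dist(model[lab], row)) for row in X]

def f1_binary(y_true, y_pred, positive=1):
    """F1 score for the positive class; 0.0 if it is undefined."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0

def make_subject(rng, offset, n=40):
    """Synthetic frames: two emotions, shifted by a subject-specific offset."""
    X, y = [], []
    for k in range(n):
        label = k % 2
        center = offset + (1.0 if label else -1.0)
        X.append([center + rng.gauss(0, 0.2), rng.gauss(0, 0.2)])
        y.append(label)
    return X, y

rng = random.Random(0)
subjects = {offset: make_subject(rng, offset) for offset in (-2.0, 0.0, 2.0)}

# Split each subject's frames: first half for training, second half for testing.
train = {s: (X[:20], y[:20]) for s, (X, y) in subjects.items()}
test = {s: (X[20:], y[20:]) for s, (X, y) in subjects.items()}

# Generalized model: one model trained on the pooled training frames.
pooled_X = [row for X, _ in train.values() for row in X]
pooled_y = [lab for _, y in train.values() for lab in y]
general_model = centroid_fit(pooled_X, pooled_y)

results = {}
for s in subjects:
    personal_model = centroid_fit(*train[s])  # one model per subject
    X_test, y_test = test[s]
    results[s] = (
        f1_binary(y_test, centroid_predict(personal_model, X_test)),
        f1_binary(y_test, centroid_predict(general_model, X_test)),
    )
    print(f"subject offset {s:+.0f}: personalized F1 = {results[s][0]:.2f}, "
          f"generalized F1 = {results[s][1]:.2f}")
```

In this toy setup the generalized model struggles on subjects whose feature distribution is shifted away from the pooled data, which is one intuition for why per-subject models can help. It does not reproduce the failure case the abstract describes, where within-subject variation between emotion labels is too small for a personalized model to exploit.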

CV · Feb 14, 2024
Advancing Human Action Recognition with Foundation Models trained on Unlabeled Public Videos

Yang Qian, Yinan Sun, Ali Kargarandehkordi et al.

The increasing variety and quantity of tagged multimedia content on a variety of online platforms offer a unique opportunity to advance the field of human action recognition. In this study, we utilize 283,582 unique, unlabeled TikTok video clips, categorized into 386 hashtags, to train a domain-specific foundation model for action recognition. We employ VideoMAE V2, an advanced model integrating Masked Autoencoders (MAE) with Vision Transformers (ViT), pre-trained on this diverse collection of unstructured videos. Our model, fine-tuned on established action recognition benchmarks such as UCF101 and HMDB51, achieves state-of-the-art results: 99.05% on UCF101, 86.08% on HMDB51, 85.51% on Kinetics-400, and 74.27% on Something-Something V2 using the ViT-giant backbone. These results highlight the potential of using unstructured and unlabeled videos as a valuable source of diverse and dynamic content for training foundation models. Our investigation confirms that while initial increases in pre-training data volume significantly enhance model performance, the gains diminish as the dataset size continues to expand. Our findings emphasize two critical axioms in self-supervised learning for computer vision: (1) additional pre-training data can yield diminishing benefits for some datasets and (2) quality is more important than quantity in self-supervised learning, especially when building foundation models.
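The masked-autoencoder pre-training mentioned in this abstract relies on hiding most video patches from the encoder. The sketch below illustrates tube masking, the scheme used in the VideoMAE family, in which the same spatial patches are hidden in every temporal slice. It is a simplified illustration rather than the authors' code: the 14x14 patch grid, 8 temporal slices, 0.9 mask ratio, and the function name `tube_mask` are all assumptions chosen for this example.

```python
import random

def tube_mask(num_frames, grid_h, grid_w, mask_ratio, rng):
    """Return a num_frames x (grid_h * grid_w) boolean mask (True = masked).

    The same spatial positions are masked in every temporal slice, so a
    masked patch cannot be recovered by copying it from a neighboring frame.
    """
    num_patches = grid_h * grid_w
    num_masked = int(mask_ratio * num_patches)
    masked = set(rng.sample(range(num_patches), num_masked))
    frame_mask = [i in masked for i in range(num_patches)]
    # Repeat the identical spatial mask along the time axis.
    return [list(frame_mask) for _ in range(num_frames)]

rng = random.Random(0)
# Assumed sizes for illustration: 8 temporal slices of 14 x 14 patches.
mask = tube_mask(num_frames=8, grid_h=14, grid_w=14, mask_ratio=0.9, rng=rng)

visible_per_frame = sum(not m for m in mask[0])
print(f"visible patches per temporal slice: {visible_per_frame} of {14 * 14}")
```

With a high mask ratio the encoder processes only a small fraction of the tokens, which is part of what makes pre-training on a large unlabeled video collection computationally feasible.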