Hanna Drimalla

LG
h-index: 8
Papers: 5
Citations: 18
Novelty: 24%
AI Score: 38

5 Papers

LG · Jun 15, 2023
Towards Interpretability in Audio and Visual Affective Machine Learning: A Review

David S. Johnson, Olya Hakobyan, Hanna Drimalla

Machine learning is frequently used in affective computing, but presents challenges due to the opacity of state-of-the-art machine learning methods. Because of the impact affective machine learning systems may have on an individual's life, it is important that models be made transparent to detect and mitigate biased decision making. In this regard, affective machine learning could benefit from the recent advancements in explainable artificial intelligence (XAI) research. We perform a structured literature review to examine the use of interpretability in the context of affective machine learning. We focus on studies using audio, visual, or audiovisual data for model training and identify 29 research articles. Our findings show an emergence of the use of interpretability methods in the last five years. However, their use is currently limited regarding the range of methods used, the depth of evaluations, and the consideration of use-cases. We outline the main gaps in the research and provide recommendations for researchers who aim to implement interpretable methods for affective machine learning.

57.2 · SI · May 5
Sorry for the late reply: Response times and reciprocity in WhatsApp and Instagram chats

Florian Martin, Olya Hakobyan, Hanna Drimalla

Chat communication is often fast-paced, creating the expectation of quick replies. While the timing of exchanges is known to foster closeness and enjoyment, it remains largely unexplored whether chat partners with strong ties reciprocate each other's response times. Using 3.4 million messages from 889 chats across 97 donations of anonymous WhatsApp and Instagram chats, we analyzed response times, their balance between chat partners, and its stability over time. To our knowledge, this is the first study to examine response speed as an expression of reciprocity, bridging a key aspect of online communication with a fundamental principle of social interactions. We found that around 70% of WhatsApp and 44% of Instagram messages were answered within five minutes, confirming the fast pace of instant messaging. Overall, the response speed between chat partners was similar. The response speed similarity was evident both in the overall response-time distributions of chat partners assessed with Jensen-Shannon distance and in the steep regression slopes (0.786 for WhatsApp and 0.796 for Instagram) linking one person's probability of responding within five minutes to the partner's corresponding probability. Importantly, the dispersion of response time similarity over months showed that this balance persists over time. Our results position response time balance as a marker of reciprocity in computer-mediated communication, offering a new way to quantitatively study this fundamental principle of social interaction. We suggest using response speed balance as a complementary metric in the analysis of relationship dynamics, such as the strengthening or weakening of social ties.
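
The two similarity measures named in the abstract can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the bin edges, the base-2 logarithm, and the function names `histogram`, `jensen_shannon_distance`, and `share_within` are all assumptions made for the example.

```python
import math

def histogram(response_times_s, bin_edges_s):
    """Normalized histogram of response times (seconds) over half-open bins [lo, hi)."""
    counts = [0] * (len(bin_edges_s) - 1)
    for t in response_times_s:
        for i in range(len(counts)):
            if bin_edges_s[i] <= t < bin_edges_s[i + 1]:
                counts[i] += 1
                break
    total = sum(counts)
    if total == 0:
        raise ValueError("no response times fall inside the bins")
    return [c / total for c in counts]

def kl_divergence(p, q):
    """Kullback-Leibler divergence in bits; terms with p_i == 0 contribute nothing."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def jensen_shannon_distance(p, q):
    """Square root of the Jensen-Shannon divergence; 0 = identical, 1 = disjoint (base 2)."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    jsd = 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)
    return math.sqrt(max(jsd, 0.0))  # guard against tiny negative rounding errors

def share_within(response_times_s, threshold_s=300):
    """Fraction of responses sent within threshold_s seconds (default: five minutes)."""
    return sum(t <= threshold_s for t in response_times_s) / len(response_times_s)

# Hypothetical response times (seconds) for two chat partners.
partner_a = [20, 45, 90, 400, 1200]
partner_b = [30, 60, 250, 500, 3000]
edges = [0, 60, 300, 3600, 86400]  # assumed bins: <1 min, <5 min, <1 h, <1 day

distance = jensen_shannon_distance(histogram(partner_a, edges), histogram(partner_b, edges))
print(round(distance, 3), share_within(partner_a), share_within(partner_b))
```

A distance near 0 indicates that both partners reply on similar time scales. The per-person five-minute shares are the kind of quantity that could enter a regression like the one described above.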

LG · Mar 11, 2025
Generalization of Video-Based Heart Rate Estimation Methods To Low Illumination and Elevated Heart Rates

Bhargav Acharya, William Saakyan, Barbara Hammer et al.

Heart rate is a physiological signal that provides information about an individual's health and affective state. Remote photoplethysmography (rPPG) allows the estimation of this signal from video recordings of a person's face. Classical rPPG methods make use of signal processing techniques, while recent rPPG methods utilize deep learning networks. Methods are typically evaluated on datasets collected in well-lit environments with participants at resting heart rates. However, little investigation has been done on how well these methods adapt to variations in illumination and heart rate. In this work, we systematically evaluate representative state-of-the-art methods for remote heart rate estimation. Specifically, we evaluate four classical methods and four deep learning-based rPPG estimation methods in terms of their generalization ability to changing scenarios, including low lighting conditions and elevated heart rates. For a thorough evaluation of existing approaches, we collected a novel dataset called CHILL, which systematically varies heart rate and lighting conditions. The dataset consists of recordings from 45 participants in four different scenarios. The video data was collected under two different lighting conditions (high and low) and normal and elevated heart rates. In addition, we selected two public datasets to conduct within- and cross-dataset evaluations of the rPPG methods. Our experimental results indicate that classical methods are not significantly impacted by low-light conditions. Meanwhile, some deep learning methods were found to be more robust to changes in lighting conditions but encountered challenges in estimating high heart rates. The cross-dataset evaluation revealed that the selected deep learning methods underperformed when influencing factors such as elevated heart rates and low lighting conditions were not present in the training set.
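
A common final step in rPPG pipelines is to read the heart rate off the dominant frequency of the recovered pulse signal. The sketch below illustrates that step only, under stated assumptions: the pulse signal has already been extracted from the video, the search band of 42 to 240 bpm and the 1 bpm grid are arbitrary choices, and `estimate_heart_rate_bpm` is a hypothetical name, not a function from any of the evaluated methods.

```python
import math

def estimate_heart_rate_bpm(pulse_signal, fps, min_bpm=42, max_bpm=240):
    """Return the bpm value in [min_bpm, max_bpm] with the largest spectral power."""
    n = len(pulse_signal)
    mean = sum(pulse_signal) / n
    centered = [x - mean for x in pulse_signal]  # remove the DC offset

    best_bpm, best_power = None, -1.0
    for bpm in range(min_bpm, max_bpm + 1):
        freq_hz = bpm / 60.0
        # Project the signal onto a cosine and a sine at this frequency.
        real = sum(x * math.cos(2 * math.pi * freq_hz * k / fps) for k, x in enumerate(centered))
        imag = sum(x * math.sin(2 * math.pi * freq_hz * k / fps) for k, x in enumerate(centered))
        power = real * real + imag * imag
        if power > best_power:
            best_bpm, best_power = bpm, power
    return float(best_bpm)

# Synthetic 10-second pulse at 1.5 Hz (90 bpm), sampled at an assumed 30 frames per second.
fps = 30
signal = [math.sin(2 * math.pi * 1.5 * k / fps) for k in range(10 * fps)]
print(estimate_heart_rate_bpm(signal, fps))
```

Restricting the search band matters for the elevated-heart-rate scenario: an upper limit set too low would make high heart rates impossible to recover, regardless of signal quality.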

7.6 · HC · Apr 1
Video-based Social Interaction Behavior Analysis with the Simulated Interaction Task for Children (Kids-SIT)

Rituja Pardhi, Matthias Norden, William Saakyan et al.

Accurately quantifying children's social interaction behavior is part of understanding their cognitive and emotional development, as well as mental health conditions. Kids-SIT is a web-based tool designed to computationally analyze children's behaviors by engaging them in a standardized video conversation scenario while their responses are video recorded. In a pre-registered study with 21 healthy children, we evaluated the potential of the Kids-SIT as an accessible paradigm for automated analysis of children's social interaction behavior. We assessed their subjective impression, as well as verbal and non-verbal responses during the Kids-SIT. Verbal content was analyzed using the LIWC tool. Three socially relevant non-verbal behaviors (gaze deviation, smiling, and nodding) were manually annotated and automatically extracted using three computational methods. We examined how well these methods capture naturalistic social interaction patterns of healthy children. We also conducted an exploratory classification of healthy children (n=21) and those with social anxiety disorder (SAD; n=11) using automated behavioral features. The semantic analysis of the children's verbal responses and their post-hoc impressions indicated that the Kids-SIT successfully elicited natural social interaction behavior. Children's non-verbal behavior also showed a similar pattern: they looked at their interaction partner for most of the time, and more so while listening than while speaking. Smiling and gazing toward the partner occurred more frequently during the person-directed liked and disliked parts than during the picture-description phase. These non-verbal behavior patterns were captured both by manual annotations and by the computational analysis methods. In the exploratory analysis with a clinical sample, automatically extracted features enabled above-chance differentiation between children with and without SAD (AUC=0.74).

CV · Sep 19, 2025
Improving Autism Detection with Multimodal Behavioral Analysis

William Saakyan, Matthias Norden, Lola Eversmann et al.

Due to the complex and resource-intensive nature of diagnosing Autism Spectrum Condition (ASC), several computer-aided diagnostic support methods have been proposed to detect autism by analyzing behavioral cues in patient video data. While these models show promising results on some datasets, they struggle with poor gaze feature performance and lack of real-world generalizability. To tackle these challenges, we analyze a standardized video dataset comprising 168 participants with ASC (46% female) and 157 non-autistic participants (46% female), making it, to our knowledge, the largest and most balanced dataset available. We conduct a multimodal analysis of facial expressions, voice prosody, head motion, heart rate variability (HRV), and gaze behavior. To address the limitations of prior gaze models, we introduce novel statistical descriptors that quantify variability in eye gaze angles, improving gaze-based classification accuracy from 64% to 69% and aligning computational findings with clinical research on gaze aversion in ASC. Using late fusion, we achieve a classification accuracy of 74%, demonstrating the effectiveness of integrating behavioral markers across multiple modalities. Our findings highlight the potential for scalable, video-based screening tools to support autism assessment.
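
Late fusion combines the outputs of separately trained per-modality classifiers rather than their raw features. The abstract does not specify the fusion rule, so the sketch below assumes one simple variant, a weighted average of per-modality probabilities followed by a threshold. The function name `late_fusion`, the weights, and the example probabilities are all hypothetical.

```python
def late_fusion(modality_probs, weights=None, threshold=0.5):
    """Fuse per-modality probabilities for the positive class by weighted averaging.

    modality_probs: dict mapping modality name -> probability in [0, 1]
    weights: optional dict mapping modality name -> non-negative weight
             (defaults to equal weights)
    Returns (fused_probability, predicted_label).
    """
    names = sorted(modality_probs)
    if weights is None:
        weights = {name: 1.0 for name in names}
    total_weight = sum(weights[name] for name in names)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    fused = sum(weights[name] * modality_probs[name] for name in names) / total_weight
    return fused, fused >= threshold

# Hypothetical outputs of five single-modality classifiers for one participant.
probs = {
    "facial_expressions": 0.62,
    "voice_prosody": 0.55,
    "head_motion": 0.48,
    "hrv": 0.51,
    "gaze": 0.70,
}
fused_prob, label = late_fusion(probs)
print(round(fused_prob, 3), label)
```

Keeping the modalities separate until the final step means a weak or missing modality can be down-weighted or dropped without retraining the others, which is one plausible reason to prefer this design for video data of varying quality.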