Jens Madsen

4 papers

3 citations

Novelty 25%

AI Score 37

Ranked #114,402 of 201,326 authors (top 57%); #36,767 in CV (top 62%)

4 Papers

CV | May 24

From Affect to Complex Behavior: Advancing Multimodal Human-Centered AI at the 10th ABAW Workshop & Competition

Dimitrios Kollias, Panagiotis Tzirakis, Alan Cowen et al.

The 10th Affective & Behavior Analysis in-the-Wild (ABAW) Workshop and Competition, held at CVPR 2026, continues to advance research on the modelling, analysis, and understanding of human affect and behavior in real-world, unconstrained environments. The workshop maintains its dual structure, comprising both a competition and a paper track. The ABAW Competition introduces a diverse set of challenges targeting key aspects of affective and behavioral understanding, including continuous affect (valence-arousal) estimation, discrete affect (expression and action unit) recognition, and more complex behavior analysis tasks such as emotional mimicry intensity estimation, ambivalence/hesitancy recognition, and fine-grained violence detection. These challenges are built upon large-scale in-the-wild datasets, providing comprehensive benchmarks for state-of-the-art approaches. In parallel, the paper track presents a wide range of contributions spanning pose, motion & behavior estimation; affect modelling & multimodal learning; benchmarks, datasets & evaluation protocols; and fairness, robustness & deployment. Overall, the 10th ABAW Workshop and Competition continues to serve as a key platform for benchmarking, collaboration, and innovation, shaping the development of next-generation multimodal, human-centered AI systems.

CV | Sep 19, 2024 | Code

Real-time estimation of overt attention from dynamic features of the face using deep-learning

Aimar Silvan Ortubay, Lucas C. Parra, Jens Madsen

Students often drift in and out of focus during class. Effective teachers recognize this and re-engage them when necessary. With the shift to remote learning, teachers have lost the visual feedback needed to adapt to varying student engagement. We propose using readily available front-facing video to infer attention levels based on movements of the eyes, head, and face. We train a deep learning model to predict a measure of attention based on overt eye movements. Specifically, we measure Inter-Subject Correlation of eye movements in ten-second intervals while students watch the same educational videos. In 3 different experiments (N=83) we show that the trained model predicts this objective metric of attention on unseen data with $R^2$=0.38, and on unseen subjects with $R^2$=0.26-0.30. The deep network relies mostly on a student's eye movements, but to some extent also on movements of the brows, cheeks, and head. In contrast to Inter-Subject Correlation of the eyes, the model can estimate attentional engagement from individual students' movements without needing reference data from an attentive group. This enables a much broader set of online applications. The solution is lightweight and can operate on the client side, which mitigates some of the privacy concerns associated with online attention monitoring. GitHub implementation is available at https://github.com/asortubay/timeISC
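The abstract above describes Inter-Subject Correlation (ISC) of eye movements measured in ten-second intervals. The following is a minimal sketch of one common way to compute a windowed ISC, not the authors' implementation. It assumes a leave-one-out formulation (each viewer's trace is correlated with the mean trace of all other viewers) and a single gaze coordinate per viewer. The function name windowed_isc, the 30 Hz sampling rate, and the synthetic data are all hypothetical.

```python
import numpy as np

def windowed_isc(gaze, fs, window_s=10.0):
    """Leave-one-out inter-subject correlation in fixed windows.

    gaze: array of shape (n_subjects, n_samples), one gaze coordinate
          per subject, all recorded while watching the same video.
    fs:   sampling rate in Hz (assumed equal for all subjects).
    Returns an array of shape (n_subjects, n_windows).
    """
    gaze = np.asarray(gaze, dtype=float)
    n_subjects, n_samples = gaze.shape
    if n_subjects < 2:
        raise ValueError("ISC needs at least two subjects")
    win = int(round(window_s * fs))
    n_windows = n_samples // win  # a trailing partial window is dropped
    isc = np.full((n_subjects, n_windows), np.nan)
    total = gaze.sum(axis=0)
    for w in range(n_windows):
        sl = slice(w * win, (w + 1) * win)
        for s in range(n_subjects):
            own = gaze[s, sl]
            # Mean trace of every subject except the current one.
            others = (total[sl] - own) / (n_subjects - 1)
            # Correlation is undefined for a constant trace; leave NaN.
            if own.std() > 0 and others.std() > 0:
                isc[s, w] = np.corrcoef(own, others)[0, 1]
    return isc

# Synthetic illustration: four viewers follow a shared pattern,
# one viewer's gaze is unrelated noise.
rng = np.random.default_rng(0)
fs = 30
t = np.arange(30 * fs) / fs
shared = np.sin(2 * np.pi * 0.5 * t)
attentive = shared + 0.1 * rng.standard_normal((4, t.size))
distracted = rng.standard_normal((1, t.size))
isc = windowed_isc(np.vstack([attentive, distracted]), fs=fs)
print(isc.mean(axis=1))  # per-viewer mean ISC across windows
```

In this toy setup the viewers who follow the shared pattern score a higher ISC than the unrelated one. The sketch also shows why ISC needs a reference group of viewers, which is the dependency the abstract says the trained model avoids.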

AI | May 4

The 2026 ACII Dyadic Conversations (DaiKon) Workshop & Challenge

Panagiotis Tzirakis, Alice Baird, Jeffrey Brooks et al.

The 2026 ACII Dyadic Conversations (ACII-DaiKon) Workshop & Challenge introduces a benchmark for modeling interpersonal affect and social dynamics in dyadic conversations. Although conversational affect modeling has advanced rapidly, most benchmarks remain speaker-centric and underrepresent coupled, time-evolving processes between partners, including directional influence, conversational timing coordination, and rapport development. To address this gap, ACII-DaiKon presents three coordinated sub-challenges built on a shared dataset: (1) directional interpersonal influence prediction, (2) turn-taking prediction (next-speaker and time-to-next-speech), and (3) rapport trajectory prediction across full interactions. The challenge is built on the Hume-DaiKon dataset, comprising 945 dyadic conversations (743.4 hours of audiovisual data) collected under naturalistic conditions across five languages. The benchmark supports multimodal modeling, temporal reasoning, and cross-context generalization through fixed train/validation/test splits, standardized metrics, and released baseline systems. Evaluation uses Concordance Correlation Coefficient (CCC), Pearson correlation, Macro-F1, and Mean Absolute Error (MAE) depending on the sub-challenge. Baseline experiments establish initial reference performance, with best test results of 0.40 CCC and 0.50 Pearson for influence prediction, 0.66 Macro-F1 and 1.50 s MAE for turn-taking, and 0.68 CCC and 0.70 Pearson for rapport trajectory modeling. These results indicate that while current methods capture coarse dyadic patterns, robust modeling of directional dependence and long-horizon interpersonal dynamics remains challenging. The workshop provides a shared platform for rigorous comparison and cross-disciplinary discussion on data validity, evaluation protocols, and culturally aware modeling for dyadic interaction.
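The abstract above reports both CCC and Pearson correlation. As a hedged illustration of how the two differ, here is a small sketch of Lin's Concordance Correlation Coefficient in its standard form (population variances). Whether the challenge's official scorer uses exactly this variant is an assumption, and the function name ccc and the toy values are hypothetical.

```python
import numpy as np

def ccc(y_true, y_pred):
    """Lin's concordance correlation coefficient.

    CCC = 2 * cov(t, p) / (var(t) + var(p) + (mean(t) - mean(p))**2),
    using population (ddof=0) variance and covariance.
    """
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    mean_t, mean_p = y_true.mean(), y_pred.mean()
    cov = np.mean((y_true - mean_t) * (y_pred - mean_p))
    return 2.0 * cov / (y_true.var() + y_pred.var() + (mean_t - mean_p) ** 2)

# Toy values: predictions track the target perfectly but with a
# constant offset of +1.
target = np.array([0.0, 1.0, 2.0, 3.0])
shifted = target + 1.0

pearson = np.corrcoef(target, shifted)[0, 1]
print(pearson)               # Pearson ignores the constant offset
print(ccc(target, shifted))  # CCC is lowered by the offset
```

Unlike Pearson correlation, CCC penalizes differences in mean and scale between predictions and targets. This is one reason the same system can score differently on the two metrics, as in the baseline results quoted above.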

CY | Dec 2, 2016

Predicting Changes in Affective States using Neural Networks

Stina Lyck Carstensen, Jens Madsen, Jan Larsen

Knowledge of patients' affective states could prove crucial for health-care professionals in both diagnosis and treatment; however, this requires patients to report how they feel. In practice, the sampling rate of affective states needs to be kept low to ensure that patients can rest. Furthermore, traditional methods of measuring affective states are not always possible, e.g. when patients are incapable of verbal communication. In this study we explore the prediction of people's self-reported affective state from multiple physiological signals. We compare different neural network (NN) setups with different multiple linear regression (MLR) setups for predicting changes in affective states. The results showed that NN and MLR predicted the change in affective states with accuracies of 91.88% and 89.10%, respectively.