LaScA: Language-Conditioned Scalable Modelling of Affective Dynamics
This work addresses interpretable affect modelling for human-centered AI and offers a transparent alternative to black-box methods, though the contribution appears incremental because it builds on existing handcrafted features and language models.
The paper tackles the challenge of predicting affect in unconstrained environments by proposing a framework that uses language models as semantic context conditioners over handcrafted affect descriptors. On the Aff-Wild2 and SEWA datasets, it reports consistent accuracy improvements in Valence and Arousal change prediction over handcrafted-only and deep-embedding baselines.
Predicting affect in unconstrained environments remains a fundamental challenge in human-centered AI. While deep neural embeddings dominate contemporary approaches, they often lack interpretability and limit expert-driven refinement. We propose a novel framework that uses Language Models (LMs) as semantic context conditioners over handcrafted affect descriptors to model changes in Valence and Arousal. Our approach begins with interpretable facial geometry and acoustic features derived from structured domain knowledge. These features are transformed into symbolic natural-language descriptions encoding their affective implications. A pretrained LM processes these descriptions to generate semantic context embeddings that act as high-level priors over affective dynamics. Unlike end-to-end black-box pipelines, our framework preserves feature transparency while leveraging the contextual abstraction capabilities of LMs. We evaluate the proposed method on the Aff-Wild2 and SEWA datasets for affect change prediction. Experimental results show consistent improvements in accuracy for both Valence and Arousal compared to handcrafted-only and deep-embedding baselines. Our findings demonstrate that semantic conditioning enables interpretable affect modelling without sacrificing predictive performance, offering a transparent and computationally efficient alternative to fully end-to-end architectures.
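
The pipeline described in the abstract (handcrafted features, then symbolic descriptions, then LM context embeddings used as conditioning) can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the paper: the feature names, the 0.5 threshold, the description templates, and the use of simple concatenation as the conditioning step. In particular, `toy_text_embedding` is a hashed bag-of-words stand-in for a pretrained LM encoder, used only so the example runs without external dependencies.

```python
import hashlib
import math

# Hypothetical feature names and description templates (illustrative only).
# Each entry maps a feature to (description when high, description when low).
RULES = {
    "lip_corner_raise": ("raised lip corners, often associated with positive valence",
                         "neutral lip corners"),
    "brow_lowering": ("lowered brows, often associated with negative valence",
                      "relaxed brows"),
    "pitch_variability": ("highly variable pitch, often associated with high arousal",
                          "steady pitch"),
}


def describe(features, threshold=0.5):
    """Turn named handcrafted features (assumed scaled to [0, 1]) into text."""
    parts = []
    for name in sorted(features):
        high, low = RULES[name]
        parts.append(high if features[name] >= threshold else low)
    return "The person shows " + "; ".join(parts) + "."


def toy_text_embedding(text, dim=8):
    """Stand-in for a pretrained LM encoder: hashed bag-of-words, L2-normalised."""
    vec = [0.0] * dim
    for token in text.lower().split():
        digest = hashlib.md5(token.encode("utf-8")).digest()
        vec[digest[0] % dim] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def condition(features, dim=8):
    """Concatenate raw features with the context embedding (assumed fusion step)."""
    raw = [features[name] for name in sorted(features)]
    context = toy_text_embedding(describe(features), dim)
    return raw + context


feats = {"lip_corner_raise": 0.8, "brow_lowering": 0.1, "pitch_variability": 0.7}
print(describe(feats))
print(len(condition(feats)))
```

In this sketch the raw feature values stay in the first positions of the conditioned vector, so a downstream Valence or Arousal predictor could still be inspected feature by feature. A real implementation would replace the stand-in encoder with an actual pretrained LM and may fuse the two signals differently.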