CV MM Feb 14, 2025

Interpretable Concept-based Deep Learning Framework for Multimodal Human Behavior Modeling

arXiv:2502.10145v16.22 citations h-index: 2

Originality: Highly original

AI Analysis

This work addresses the need for interpretable and trustworthy AI systems in human-centered applications, particularly affective computing, where explainability is a critical component of responsible AI.

The authors address interpretable affective computing by proposing the Attention-Guided Concept Model (AGCM), a framework that provides learnable conceptual explanations and is validated on Facial Expression Recognition benchmark datasets. The framework is also shown to generalize to more complex real-world human behavior understanding applications.

In the contemporary era of intelligent connectivity, Affective Computing (AC), which enables systems to recognize, interpret, and respond to human behavior states, has become an integral part of many AI systems. As one of the most critical components of responsible AI and trustworthiness in all human-centered systems, explainability has been a major concern in AC. In particular, the recently released EU General Data Protection Regulation requires high-risk AI systems to be sufficiently interpretable, including the biometric-based systems and emotion recognition systems widely used in the affective computing field. Existing explainability methods often compromise between interpretability and performance. Most of them focus only on highlighting key network parameters without offering meaningful, domain-specific explanations to stakeholders. They also face challenges in effectively co-learning and explaining insights from multimodal data sources. To address these limitations, we propose a novel and generalizable framework, the Attention-Guided Concept Model (AGCM), which provides learnable conceptual explanations by identifying which concepts lead to the predictions and where they are observed. AGCM is extendable to any spatial and temporal signals through multimodal concept alignment and co-learning, empowering stakeholders with deeper insights into the model's decision-making process. We validate the efficiency of AGCM on well-established Facial Expression Recognition benchmark datasets while also demonstrating its generalizability on more complex real-world human behavior understanding applications.
