CV · Mar 4, 2024

A New Perspective on Smiling and Laughter Detection: Intensity Levels Matter

arXiv:2403.02112v1 · 7 citations · h-index: 42 · ACII
Originality: Incremental advance
AI Analysis

This work addresses the need for more nuanced detection in human-agent interaction systems, though it is incremental as it builds on existing multimodal methods with a focus on intensity analysis.

The paper tackles the problem of classifying smiles and laughs as distinct entities using a deep learning-based multimodal system, finding that fusion of audio and vision models improves generalization on unseen data and that intensity levels reveal a complex relationship not captured by binary or single-category approaches.

Smile and laugh detection systems have attracted considerable attention over the past decade, contributing to the improvement of human-agent interaction systems. Very few, however, have treated these expressions as distinct, even though no prior work clearly establishes whether they belong to the same category. In this work, we present a deep learning-based multimodal smile and laugh classification system that treats them as two different entities. We compare audio-based and vision-based models as well as a fusion approach. We show that, as expected, fusion leads to better generalization on unseen data. We also present an in-depth analysis of how these models behave across smile and laugh intensity levels. This analysis shows that the relationship between smiles and laughs might not be as simple as a binary distinction, or as grouping them into a single category, so a more complex approach should be taken when dealing with them. We also tackle the problem of limited resources by showing that transfer learning allows the models to improve the detection of confusing intensity levels.
