Unsupervised Audio-Visual Subspace Alignment for High-Stakes Deception Detection
This research offers an unsupervised solution to the problem of detecting high-stakes deception in real-world medical, social work, and legal settings, where labeled data is scarce.
The paper introduces the first unsupervised multimodal transfer learning approach for detecting high-stakes deception in videos. The proposed subspace-alignment (SA) method adapts audio-visual representations from low-stakes scenarios to high-stakes situations. It outperforms both models without SA and human ability, and it performs comparably to a number of existing supervised models.
Automated systems that detect deception in high-stakes situations can enhance societal well-being across medical, social work, and legal domains. Existing models for detecting high-stakes deception in videos have been supervised, but the labeled datasets needed to train them can rarely be collected for most real-world applications. To address this problem, we propose the first multimodal unsupervised transfer learning approach that detects real-world, high-stakes deception in videos without using high-stakes labels. Our subspace-alignment (SA) approach adapts audio-visual representations of deception in lab-controlled, low-stakes scenarios to detect deception in real-world, high-stakes situations. Our best unsupervised SA models outperform models without SA, outperform human ability, and perform comparably to a number of existing supervised models. Our research demonstrates the potential of subspace-based transfer learning for modeling high-stakes deception and other social behaviors in real-world contexts where labeled behavioral data are scarce.
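The abstract names subspace alignment without describing its mechanics. The following is a minimal sketch, assuming the classic unsupervised formulation of Fernando et al. (2013), which may differ from this paper's actual pipeline. Each domain is reduced to its top-d PCA basis (Ps for the low-stakes source, Pt for the high-stakes target), and the source basis is mapped onto the target basis by the alignment matrix M = Ps^T Pt. A classifier trained on aligned source features is then applied to target features without using any target labels. Several details are assumptions for illustration only: the early fusion of hypothetical audio and visual feature arrays by concatenation, the nearest-centroid classifier (swapped in to keep the example dependency-light), the choice of d, and the random placeholder data, which carries no real signal.

```python
import numpy as np


def pca_basis(X: np.ndarray, d: int) -> np.ndarray:
    """Top-d principal directions of X (n x D), returned as a D x d matrix."""
    Xc = X - X.mean(axis=0)
    # Rows of Vt are the principal axes of the centered data, sorted by variance.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Vt[:d].T


def subspace_align(Xs: np.ndarray, Xt: np.ndarray, d: int):
    """Classic subspace alignment: map the source PCA subspace onto the target one.

    Returns (Zs, Zt), both expressed in the target's d-dimensional coordinates.
    No target labels are used anywhere.
    """
    Ps = pca_basis(Xs, d)  # source basis, D x d
    Pt = pca_basis(Xt, d)  # target basis, D x d
    M = Ps.T @ Pt          # alignment matrix, d x d
    Zs = (Xs - Xs.mean(axis=0)) @ Ps @ M  # source projected, then aligned to target axes
    Zt = (Xt - Xt.mean(axis=0)) @ Pt      # target projected onto its own axes
    return Zs, Zt


def nearest_centroid_predict(Zs: np.ndarray, ys: np.ndarray, Zt: np.ndarray) -> np.ndarray:
    """Hypothetical stand-in classifier: label each target point by the nearest source class mean."""
    classes = np.unique(ys)
    centroids = np.stack([Zs[ys == c].mean(axis=0) for c in classes])
    dists = ((Zt[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=-1)
    return classes[dists.argmin(axis=1)]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Placeholder features only: random stand-ins for per-clip audio and visual descriptors.
    audio_src, visual_src = rng.normal(size=(200, 20)), rng.normal(size=(200, 30))
    audio_tgt, visual_tgt = rng.normal(size=(80, 20)), rng.normal(size=(80, 30))
    Xs = np.hstack([audio_src, visual_src])             # early fusion (assumption)
    Xt = np.hstack([2.0 * audio_tgt, visual_tgt])       # hypothetical domain shift: rescaled audio
    ys = rng.integers(0, 2, size=200)                   # placeholder low-stakes labels, 1 = deceptive

    Zs, Zt = subspace_align(Xs, Xt, d=10)
    preds = nearest_centroid_predict(Zs, ys, Zt)        # target labels are never seen
    print(preds[:10])
```

Because `Ps @ Ps.T` is unchanged by the per-axis sign flips that an SVD may produce, and any sign flip in `Pt` affects `Zs` and `Zt` identically, the aligned source and target features stay on consistent axes. A fuller version would likely standardize features per domain and select d on held-out source data; this sketch omits both.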