Seeing, Hearing, and Knowing Together: Multimodal Strategies in Deepfake Videos Detection

arXiv:2602.01284v11.2

Originality: Synthesis-oriented

AI Analysis

This research aims to improve media literacy for the general public by studying the strategies people use to detect deepfakes. Its contribution is incremental, as it builds on existing work in deepfake detection.

The study investigated human strategies for detecting deepfake videos. Participants were more accurate with real videos than with deepfakes, and association rule mining identified visual, audio, and intuition cues as key to successful detection.

As deepfake videos become increasingly difficult for people to recognise, understanding the strategies humans use is key to designing effective media literacy interventions. We conducted a study with 195 participants between the ages of 21 and 40, who judged real and deepfake videos, rated their confidence, and reported the cues they relied on across visual, audio, and knowledge strategies. Participants were more accurate with real videos than with deepfakes and showed lower expected calibration error for real content. Through association rule mining, we identified cue combinations that shaped performance. Visual appearance, vocal, and intuition cues often co-occurred in successful identifications, which highlights the importance of multimodal approaches in human detection. Our findings show which cues help or hinder detection and suggest directions for designing media literacy tools that guide effective cue use. Building on these insights can help people improve their identification skills and become more resilient to deceptive digital media.
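The two quantitative tools named in the abstract, expected calibration error and association rule mining, can be illustrated with a minimal Python sketch. This is not the authors' code: the abstract does not state the binning scheme or the mining algorithm, so the equal-width bins, the function names `expected_calibration_error` and `rule_metrics`, and the cue labels in the toy data are all assumptions made for illustration.

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """Weighted mean of |accuracy - mean confidence| over equal-width bins.

    Assumption: confidences are scaled to [0, 1]; the paper's actual
    confidence scale and bin count are not given in the abstract.
    """
    bins = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        idx = min(int(conf * n_bins), n_bins - 1)  # conf == 1.0 goes in the last bin
        bins[idx].append((conf, ok))

    total = len(confidences)
    ece = 0.0
    for members in bins:
        if not members:
            continue
        mean_conf = sum(c for c, _ in members) / len(members)
        accuracy = sum(1 for _, ok in members if ok) / len(members)
        ece += (len(members) / total) * abs(accuracy - mean_conf)
    return ece


def rule_metrics(transactions, antecedent, consequent):
    """Support, confidence, and lift for the rule antecedent -> consequent.

    Each transaction is the set of cues one participant reported for one
    video, plus an outcome label (hypothetical encoding).
    """
    antecedent, consequent = set(antecedent), set(consequent)
    n = len(transactions)
    n_a = sum(1 for t in transactions if antecedent <= set(t))
    n_c = sum(1 for t in transactions if consequent <= set(t))
    n_ac = sum(1 for t in transactions if (antecedent | consequent) <= set(t))

    support = n_ac / n
    confidence = n_ac / n_a if n_a else 0.0
    lift = confidence / (n_c / n) if n_c else 0.0
    return support, confidence, lift


# Toy, made-up data for illustration only (not from the study).
toy_confidences = [0.95, 0.95, 0.65, 0.65]
toy_correct = [True, True, True, False]
toy_ece = expected_calibration_error(toy_confidences, toy_correct)

toy_transactions = [
    {"visual_appearance", "vocal", "intuition", "correct"},
    {"visual_appearance", "vocal", "correct"},
    {"visual_appearance"},
    {"vocal", "intuition"},
]
toy_rule = rule_metrics(toy_transactions, {"visual_appearance", "vocal"}, {"correct"})
```

A lower ECE means stated confidence tracks actual accuracy more closely, which is the sense in which participants were better calibrated on real content. A lift above 1 for a rule such as {visual appearance, vocal} -> correct would indicate that the cue combination co-occurs with correct judgments more often than chance would predict.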

