CV · Apr 18

Modeling Biomechanical Constraint Violations for Language-Agnostic Lip-Sync Deepfake Detection

arXiv:2604.16808 · 28.6 · h-index: 7
AI Analysis

This paper addresses cross-language generalization in lip-sync deepfake detection, a problem relevant to security and media forensics.

The paper observes that generative models violate the biomechanical constraints of orofacial articulation, which produces measurable temporal lip jitter. It proposes BioLip, a lightweight detector that uses 64 perioral landmarks to detect lip-sync deepfakes in a language-agnostic way.

Current lip-sync deepfake detectors rely on pixel-level artifacts or audio-visual correspondence, failing to generalize across languages because these cues encode data-dependent patterns rather than universal physical laws. We identify a more fundamental principle: generative models do not enforce the biomechanical constraints of authentic orofacial articulation, producing measurably elevated temporal lip variance -- a signal we term temporal lip jitter -- that is empirically consistent across the speaker's language, ethnicity, and recording conditions. We instantiate this principle through BioLip, a lightweight framework operating on 64 perioral landmark coordinates extracted by MediaPipe.
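The abstract does not give the exact formula for temporal lip jitter, so the following is only a minimal sketch of one plausible reading: the variance over time of frame-to-frame landmark displacement, after removing head translation and scale. The function name `temporal_lip_jitter`, the normalisation steps, and the `(T, K, 2)` input layout are assumptions made for illustration and are not taken from the paper. Landmark extraction (for example with MediaPipe) is assumed to have happened upstream.

```python
import numpy as np


def temporal_lip_jitter(landmarks: np.ndarray) -> float:
    """Illustrative jitter score (hypothetical; not the paper's definition).

    landmarks: array of shape (T, K, 2) holding the (x, y) positions of
    K perioral landmarks (K = 64 in the abstract) over T video frames.
    """
    pts = np.asarray(landmarks, dtype=float)
    if pts.ndim != 3 or pts.shape[2] != 2 or pts.shape[0] < 3:
        raise ValueError("expected shape (T, K, 2) with T >= 3")

    # Assumption: remove global head translation by centring every frame
    # on the centroid of its landmarks.
    centred = pts - pts.mean(axis=1, keepdims=True)

    # Assumption: divide by the mean landmark-to-centroid distance so the
    # score does not depend on face size or camera distance.
    scale = np.linalg.norm(centred, axis=2).mean()
    if scale == 0.0:
        raise ValueError("degenerate landmarks: all points coincide")
    centred = centred / scale

    # Frame-to-frame displacement of every landmark coordinate.
    displacement = np.diff(centred, axis=0)  # shape (T - 1, K, 2)

    # Variance of the displacement over time, averaged over landmarks and
    # axes. Smooth articulation gives a low value; erratic motion raises it.
    return float(displacement.var(axis=0).mean())
```

Under this reading, a detector would compare the score (or a per-landmark version of it) against values observed on authentic video. How BioLip actually turns the signal into a decision is not described in the text above.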

