CR · ET · LG · Apr 30

Selfie-Capture Dynamics as an Auxiliary Signal Against Deepfakes and Injection Attacks for Mobile Identity Verification

arXiv:2605.00218
AI Analysis

For mobile identity verification systems, this work provides a low-friction auxiliary channel to complement camera-based liveness detection, though results are preliminary and require cross-device/session validation.

This paper investigates whether passive motion traces from mobile sensors during selfie capture can serve as an auxiliary signal against deepfakes and injection attacks in identity verification. Using a new dataset (CanSelfie), the authors find that accelerometer-only methods reject all stationary attack proxies at a false rejection rate (FRR) of 0.00% to 2.37%, and that same-device, same-session user verification reaches a 1.07% equal error rate (EER). Across all attack-proxy scenarios, however, the false acceptance rate (FAR) in spoof screening remains high (32.0% to 43.8%).

Mobile remote identity verification (RIdV) systems are exposed to attacks that manipulate or replace the facial video stream, including presentation attacks, real-time deepfakes, and video injection. Recent European requirements, including ETSI TS 119 461 and CEN/TS 18099, motivate complementary evidence channels beyond camera-based presentation-attack detection. This paper investigates whether passive motion traces recorded during selfie capture provide auxiliary evidence for spoof screening and user verification. We introduce CanSelfie, a dataset of 375 bona fide multi-sensor sequences collected at 50 Hz from 30 participants using a commercial mobile RIdV application, together with stationary, handheld, and temporally shifted attack-proxy scenarios. We benchmark 7 multivariate time-series classifiers and 8 whole-series anomaly detectors across sensor configurations and temporal windows. For spoof screening, accelerometer-only ROCKAD obtains 0.00% false rejection rate (FRR) and 43.8% false acceptance rate (FAR), while QUANT+3-NN obtains the lowest overall FAR of 32.0% at 2.37% FRR; both reject all stationary attack proxies. For same-device and same-session user verification, WEASEL+MUSE reaches 1.07% equal error rate (EER) using 9 sensor channels. The analysis shows that raw accelerometer data, preserving gravity and orientation cues, is the most informative modality, and that closed-set classification accuracy alone does not imply good verification performance because threshold calibration depends on score distributions. The findings suggest that short selfie-capture motion traces contain measurable spoof-related and identity-related information, supporting their use as a low-friction auxiliary signal while also identifying the need for cross-device, cross-session, and real injection-attack evaluation.
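
The abstract reports FAR, FRR, and EER and notes that threshold calibration depends on score distributions. The sketch below illustrates how these three quantities relate for a generic score-based verifier. It is not the paper's evaluation code: the function names, the toy scores, and the convention that a higher score means "more likely bona fide" are all assumptions made for illustration.

```python
# Illustrative sketch only; names, scores, and the score convention are
# assumptions, not taken from the paper.

def far_frr(genuine, impostor, threshold):
    """Return (FAR, FRR) when a sample is accepted if score >= threshold."""
    # FRR: fraction of bona fide samples wrongly rejected.
    frr = sum(s < threshold for s in genuine) / len(genuine)
    # FAR: fraction of attack/impostor samples wrongly accepted.
    far = sum(s >= threshold for s in impostor) / len(impostor)
    return far, frr


def equal_error_rate(genuine, impostor):
    """Sweep every observed score as a threshold and return (EER, threshold)
    at the point where FAR and FRR are closest to each other."""
    best = None
    for t in sorted(set(genuine) | set(impostor)):
        far, frr = far_frr(genuine, impostor, t)
        gap = abs(far - frr)
        if best is None or gap < best[0]:
            # Approximate the EER as the mean of FAR and FRR at this threshold.
            best = (gap, (far + frr) / 2, t)
    return best[1], best[2]


# Hypothetical toy scores (higher = more likely bona fide).
genuine = [0.9, 0.8, 0.75, 0.6, 0.4]
impostor = [0.7, 0.5, 0.3, 0.2, 0.1]

print(far_frr(genuine, impostor, 0.4))      # lenient threshold: no false rejections
print(equal_error_rate(genuine, impostor))  # operating point where FAR ~ FRR
```

With these toy scores, lowering the threshold to 0.4 removes all false rejections but lets 40% of impostor samples through. The reported spoof-screening results show the same kind of trade-off between FRR and FAR. Because both error rates are functions of the full score distributions, two models with similar classification accuracy can sit at very different operating points.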
