CVJul 27, 2025

Detecting Visual Information Manipulation Attacks in Augmented Reality: A Multimodal Semantic Reasoning Approach

arXiv:2507.20356v49 citationsh-index: 16IEEE Trans Vis Comput Graph
Originality Incremental advance
AI Analysis

This addresses security risks for AR users by detecting subtle attacks that could cause semantic misunderstandings or errors, representing a domain-specific advancement.

The paper tackles the problem of detecting visual information manipulation attacks in augmented reality, which can mislead users by altering the meaning of real-world scenes, and achieves an attack detection accuracy of 88.94% on their dataset with an average latency of about 7 seconds.

The virtual content in augmented reality (AR) can introduce misleading or harmful information, leading to semantic misunderstandings or user errors. In this work, we focus on visual information manipulation (VIM) attacks in AR, where virtual content changes the meaning of real-world scenes in subtle but impactful ways. We introduce a taxonomy that categorizes these attacks into three formats: character, phrase, and pattern manipulation, and three purposes: information replacement, information obfuscation, and extra wrong information. Based on the taxonomy, we construct a dataset, AR-VIM, which consists of 452 raw-AR video pairs spanning 202 different scenes, each simulating a real-world AR scenario. To detect the attacks in the dataset, we propose a multimodal semantic reasoning framework, VIM-Sense. It combines the language and visual understanding capabilities of vision-language models (VLMs) with optical character recognition (OCR)-based textual analysis. VIM-Sense achieves an attack detection accuracy of 88.94% on AR-VIM, consistently outperforming vision-only and text-only baselines. The system achieves an average attack detection latency of 7.07 seconds in a simulated video processing framework and 7.17 seconds in a real-world evaluation conducted on a mobile Android AR application.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes