CV AI RO Dec 5, 2025

Explainable Adversarial-Robust Vision-Language-Action Model for Robotic Manipulation

arXiv:2512.11865v1
1 citation
Originality: Incremental advance
AI Analysis

This paper addresses robustness and explainability issues in smart farming robotics, though it appears incremental because it builds on the OpenVLA-OFT framework.

The paper tackles the vulnerability of smart farming robotic systems to photometric adversarial attacks by proposing an explainable adversarial-robust Vision-Language-Action model, which reduces Current Action L1 loss by 21.7% and Next Actions L1 loss by 18.4% compared to a baseline.

Smart farming has emerged as a key technology for advancing modern agriculture through automation and intelligent control. However, systems relying on RGB cameras for perception and robotic manipulators for control, common in smart farming, are vulnerable to photometric perturbations such as hue, illumination, and noise changes, which can cause malfunction under adversarial attacks. To address this issue, we propose an explainable adversarial-robust Vision-Language-Action model based on the OpenVLA-OFT framework. The model integrates an Evidence-3 module that detects photometric perturbations and generates natural language explanations of their causes and effects. Experiments show that the proposed model reduces Current Action L1 loss by 21.7% and Next Actions L1 loss by 18.4% compared to the baseline, demonstrating improved action prediction accuracy and explainability under adversarial conditions.
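To make the terms in the abstract concrete, here is a minimal Python sketch of the three photometric perturbations it names (hue, illumination, noise) and of an L1 loss with a relative-reduction calculation. It is not the paper's implementation: the function names, the per-pixel list representation, the parameter ranges, and the reading of the reported percentages as relative reductions against the baseline are all assumptions made for illustration.

```python
import colorsys
import random

# Assumption: an image is a flat list of RGB pixels with channels in [0, 1].
Pixel = tuple[float, float, float]


def clamp(x: float) -> float:
    """Keep a channel value inside the valid [0, 1] range."""
    return max(0.0, min(1.0, x))


def shift_hue(pixel: Pixel, delta: float) -> Pixel:
    """Rotate the hue by `delta` (fraction of a full turn), keeping S and V."""
    h, s, v = colorsys.rgb_to_hsv(*pixel)
    return colorsys.hsv_to_rgb((h + delta) % 1.0, s, v)


def scale_illumination(pixel: Pixel, gain: float) -> Pixel:
    """Brighten (gain > 1) or darken (gain < 1) a pixel, clipping at 1.0."""
    return tuple(clamp(c * gain) for c in pixel)


def add_noise(pixel: Pixel, sigma: float, rng: random.Random) -> Pixel:
    """Add zero-mean Gaussian noise with standard deviation `sigma`."""
    return tuple(clamp(c + rng.gauss(0.0, sigma)) for c in pixel)


def perturb(image, hue_delta=0.0, gain=1.0, sigma=0.0, seed=0):
    """Apply hue, illumination, and noise perturbations in that order."""
    rng = random.Random(seed)
    out = []
    for p in image:
        p = shift_hue(p, hue_delta)
        p = scale_illumination(p, gain)
        p = add_noise(p, sigma, rng)
        out.append(p)
    return out


def l1_loss(pred, target):
    """Mean absolute error between predicted and target action vectors."""
    return sum(abs(a - b) for a, b in zip(pred, target)) / len(pred)


def relative_reduction(baseline_loss: float, model_loss: float) -> float:
    """Percentage by which `model_loss` is lower than `baseline_loss`."""
    return 100.0 * (baseline_loss - model_loss) / baseline_loss
```

Under this reading, a 21.7% reduction means the proposed model's loss is about 0.783 times the baseline's; the abstract does not state the absolute loss values, so any concrete numbers passed to `relative_reduction` are hypothetical.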

