CV AI RO Dec 5, 2025

Explainable Adversarial-Robust Vision-Language-Action Model for Robotic Manipulation

Ju-Young Kim, Ji-Hong Park, Myeongjun Kim, Gun-Woo Kim

arXiv:2512.11865v1 1 citation

Originality: Incremental advance

AI Analysis

This work addresses robustness and explainability issues in smart farming robotics, though it appears incremental because it builds on the OpenVLA-OFT framework.

The paper tackles the vulnerability of smart farming robotic systems to photometric adversarial attacks by proposing an explainable adversarial-robust Vision-Language-Action model, which reduces Current Action L1 loss by 21.7% and Next Actions L1 loss by 18.4% compared to a baseline.

Smart farming has emerged as a key technology for advancing modern agriculture through automation and intelligent control. However, systems relying on RGB cameras for perception and robotic manipulators for control, common in smart farming, are vulnerable to photometric perturbations such as hue, illumination, and noise changes, which can cause malfunction under adversarial attacks. To address this issue, we propose an explainable adversarial-robust Vision-Language-Action model based on the OpenVLA-OFT framework. The model integrates an Evidence-3 module that detects photometric perturbations and generates natural language explanations of their causes and effects. Experiments show that the proposed model reduces Current Action L1 loss by 21.7% and Next Actions L1 loss by 18.4% compared to the baseline, demonstrating improved action prediction accuracy and explainability under adversarial conditions.
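To make the terms in the abstract concrete, the following is a minimal sketch of the three photometric perturbations it names (hue, illumination, noise), applied to a single RGB pixel, together with an L1 action loss and the relative-reduction arithmetic behind figures such as 21.7%. The abstract does not specify the attack parameters, the action representation, or the exact loss definition, so every function name, parameter, and number below is a hypothetical illustration and not the authors' implementation.

```python
import colorsys
import random


def shift_hue(pixel, delta):
    """Rotate the hue of one RGB pixel (floats in [0, 1]) by `delta` of a full turn."""
    r, g, b = pixel
    h, s, v = colorsys.rgb_to_hsv(r, g, b)
    return colorsys.hsv_to_rgb((h + delta) % 1.0, s, v)


def scale_illumination(pixel, gain):
    """Multiply brightness by `gain`, clamping each channel to [0, 1]."""
    return tuple(min(1.0, max(0.0, c * gain)) for c in pixel)


def add_noise(pixel, sigma, rng):
    """Add zero-mean Gaussian noise with standard deviation `sigma`, then clamp."""
    return tuple(min(1.0, max(0.0, c + rng.gauss(0.0, sigma))) for c in pixel)


def l1_loss(predicted, target):
    """Mean absolute error between a predicted and a target action vector."""
    return sum(abs(p - t) for p, t in zip(predicted, target)) / len(predicted)


def relative_reduction(baseline_loss, proposed_loss):
    """Percentage by which `proposed_loss` is lower than `baseline_loss`."""
    return 100.0 * (baseline_loss - proposed_loss) / baseline_loss


if __name__ == "__main__":
    rng = random.Random(0)          # fixed seed so the noise is reproducible
    pixel = (0.8, 0.3, 0.2)         # hypothetical reddish pixel
    attacked = add_noise(scale_illumination(shift_hue(pixel, 0.1), 1.4), 0.05, rng)
    print("clean:", pixel, "perturbed:", attacked)

    # Hypothetical loss values chosen only to illustrate the arithmetic.
    print("reduction (%):", relative_reduction(1.0, 0.783))
```

In this sketch a baseline loss of 1.0 and a proposed-model loss of 0.783 correspond to a reduction of about 21.7%; these inputs are made up to show how such a percentage is computed and are not values reported in the paper.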
