RO · CV · Nov 7, 2025

EveryDayVLA: A Vision-Language-Action Model for Affordable Robotic Manipulation

arXiv:2511.05397v1 · 3 citations · h-index: 13
Originality: Incremental advance
AI Analysis

This work addresses the affordability and reliability of robotic manipulation for homes and research labs. Its contribution is incremental: it combines existing methods with cost-effective hardware.

The paper tackles two weaknesses of Vision-Language-Action models: reliance on high-cost hardware and poor performance in novel or cluttered scenes. It introduces EverydayVLA, a low-cost 6-DOF manipulator combined with a VLA model. The system matches state-of-the-art success rates on LIBERO and, in real-world tests, outperforms prior methods by 49% in-distribution and 34.9% out-of-distribution.

While Vision-Language-Action (VLA) models map visual inputs and language instructions directly to robot actions, they often rely on costly hardware and struggle in novel or cluttered scenes. We introduce EverydayVLA, a 6-DOF manipulator that can be assembled for under $300, capable of modest payloads and workspace. A single unified model jointly outputs discrete and continuous actions, and our adaptive-horizon ensemble monitors motion uncertainty to trigger on-the-fly re-planning for safe, reliable operation. On LIBERO, EverydayVLA matches state-of-the-art success rates, and in real-world tests it outperforms prior methods by 49% in-distribution and 34.9% out-of-distribution. By combining a state-of-the-art VLA with cost-effective hardware, EverydayVLA democratizes access to a robotic foundation model and paves the way for economical use in homes and research labs alike. Experiment videos and details: https://everydayvla.github.io/
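The abstract does not spell out how the adaptive-horizon ensemble measures motion uncertainty. As a rough illustration only, the sketch below assumes that uncertainty is the per-step disagreement between the model's discrete and continuous action predictions, and that re-planning is triggered once that disagreement passes a threshold. The function names (`adaptive_horizon`, `blend_actions`), the mean-absolute-difference metric, and the `threshold` and `min_steps` parameters are hypothetical and are not taken from the paper.

```python
from typing import List, Sequence


def adaptive_horizon(
    continuous_chunk: Sequence[Sequence[float]],
    discrete_chunk: Sequence[Sequence[float]],
    threshold: float = 0.1,
    min_steps: int = 1,
) -> int:
    """Return how many steps of a predicted action chunk to execute
    before asking the model to re-plan.

    Assumption (not from the paper): uncertainty at a step is the mean
    absolute difference between the two heads' predicted actions.
    """
    if len(continuous_chunk) != len(discrete_chunk):
        raise ValueError("both heads must predict the same number of steps")

    horizon = 0
    for cont, disc in zip(continuous_chunk, discrete_chunk):
        gap = sum(abs(c - d) for c, d in zip(cont, disc)) / len(cont)
        # Stop at the first high-disagreement step, but always execute
        # at least `min_steps` actions so the robot keeps moving.
        if gap > threshold and horizon >= min_steps:
            break
        horizon += 1
    return horizon


def blend_actions(
    continuous_chunk: Sequence[Sequence[float]],
    discrete_chunk: Sequence[Sequence[float]],
    horizon: int,
) -> List[List[float]]:
    """Average the two heads over the first `horizon` steps (a simple ensemble)."""
    return [
        [(c + d) / 2.0 for c, d in zip(cont, disc)]
        for cont, disc in zip(continuous_chunk[:horizon], discrete_chunk[:horizon])
    ]


# Toy 2-D actions: the heads agree for two steps, then diverge.
cont = [[0.0, 0.0], [0.1, 0.1], [0.5, 0.5]]
disc = [[0.0, 0.0], [0.12, 0.1], [0.9, 0.9]]
steps = adaptive_horizon(cont, disc, threshold=0.1)
actions = blend_actions(cont, disc, steps)
```

In this toy case the first two steps are executed and the third triggers a re-plan. A real controller would call the model again at that point and repeat the loop with the new chunk.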
