Octopus Protocol: One-Shot Hardware Discovery and Control for AI Agents via Infrastructure-as-Prompts

arXiv:2605.09055

AI Analysis

For AI agent developers and roboticists, Octopus Protocol eliminates the dominant bottleneck of writing hardware drivers and SDKs, enabling one-shot hardware integration.

Octopus Protocol reduces the engineering cost of bringing up new hardware for AI agent control to a single shell command by automatically discovering devices, inferring capabilities, generating an MCP server with typed tools, and deploying it as a live HTTP endpoint. On three heterogeneous platforms and a robotic arm, it onboards hardware in ~10-15 minutes and exposes up to 30 MCP tools for closed-loop control.

Recent agentic-robotics systems, from Code-as-Policies to modern vision-language-action (VLA) foundation models, presuppose that drivers, SDKs, or ROS-style primitives for the target hardware already exist. Writing those primitives is the dominant engineering cost of bringing up new hardware for agent control. We present Octopus Protocol, a system that collapses that cost to a single shell command. Given only raw OS access and a language-model API key, a coding agent executes a five-stage pipeline (PROBE, IDENTIFY, INTERFACE, SERVE, DEPLOY) to discover connected devices, infer their capabilities, generate a Model Context Protocol (MCP) server with typed tools, and deploy it as a live HTTP endpoint. A persistent daemon then monitors the system, heals broken code, and perceives physical state through the camera tools it generated for itself. Two architectural principles make this work: protocols are prompts, not code, and the coding agent is the runtime. We validate the system on three heterogeneous platforms (PC/WSL, Apple Silicon macOS, Raspberry Pi 4) and on a commercial 6-DOF robotic arm with USB camera feedback. One command onboards the hardware in ~10-15 minutes and exposes up to 30 MCP tools; an MCP-compliant client then performs closed-loop visual-motor control through tools no human wrote.
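To make the five-stage flow concrete, the following is a minimal sketch of how such a pipeline could be ordered. It is not the authors' implementation: the abstract says the stages are driven by prompts executed by a coding agent, whereas here each stage is a hard-coded stub. The device list, the capability table, the `PipelineState` fields, and the endpoint URL are all hypothetical placeholders. Only the stage names and the idea of tool entries carrying a `name`, `description`, and `inputSchema` come from the abstract and from MCP.

```python
from __future__ import annotations

from dataclasses import dataclass, field

STAGES = ("PROBE", "IDENTIFY", "INTERFACE", "SERVE", "DEPLOY")


@dataclass
class PipelineState:
    """Hypothetical container for what each stage hands to the next."""
    devices: list = field(default_factory=list)
    capabilities: dict = field(default_factory=dict)
    tools: list = field(default_factory=list)
    server_source: str = ""
    endpoint: str = ""
    log: list = field(default_factory=list)


def probe(state: PipelineState) -> None:
    # Assumption: stubbed discovery. A real run would inspect the OS instead.
    state.devices = [
        {"id": "usb_camera_0", "kind": "camera"},
        {"id": "serial_arm_0", "kind": "arm"},
    ]


def identify(state: PipelineState) -> None:
    # Assumption: a fixed lookup table stands in for capability inference.
    known = {"camera": ["capture_frame"], "arm": ["move_joint", "read_joints"]}
    for device in state.devices:
        state.capabilities[device["id"]] = known.get(device["kind"], [])


def interface(state: PipelineState) -> None:
    # Build one MCP-style tool description per (device, capability) pair.
    for device_id, caps in state.capabilities.items():
        for cap in caps:
            state.tools.append({
                "name": f"{device_id}_{cap}",
                "description": f"{cap} on {device_id}",
                "inputSchema": {"type": "object", "properties": {}},
            })


def serve(state: PipelineState) -> None:
    # Assumption: the "generated server" is only a placeholder string here.
    names = ", ".join(tool["name"] for tool in state.tools)
    state.server_source = f"# generated MCP server exposing: {names}"


def deploy(state: PipelineState) -> None:
    # Assumption: no network activity; the URL is a hypothetical local endpoint.
    state.endpoint = "http://localhost:8000/mcp"


def run_pipeline() -> PipelineState:
    """Run the five stages in order, recording each stage name as it finishes."""
    state = PipelineState()
    for name, stage in zip(STAGES, (probe, identify, interface, serve, deploy)):
        stage(state)
        state.log.append(name)
    return state


if __name__ == "__main__":
    result = run_pipeline()
    print(result.log)
    print([tool["name"] for tool in result.tools])
```

Keeping the stages as separate functions over a shared state object mirrors the abstract's description of discrete, ordered steps; in the described system each stub would presumably be replaced by an agent-executed prompt.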
