OSAIApr 30

Crab: A Semantics-Aware Checkpoint/Restore Runtime for Agent Sandboxes

arXiv:2604.2813870.6
Predicted impact top 13% in OS · last 90 daysOriginality Incremental advance
AI Analysis

For developers deploying autonomous agents, Crab provides efficient and correct checkpoint/restore for fault tolerance and rollback, addressing a critical bottleneck in agent infrastructure.

Crab introduces a semantics-aware checkpoint/restore runtime for agent sandboxes that bridges the agent-OS semantic gap, achieving 100% recovery correctness (vs. 8% for chat-only), reducing checkpoint traffic by up to 87%, and staying within 1.9% of fault-free execution time.

Autonomous agents act through sandboxed containers and microVMs whose state spans filesystems, processes, and runtime artifacts. Checkpoint and restore (C/R) of this state is needed for fault tolerance, spot execution, RL rollout branching, and safe rollback-yet existing approaches fall into two extremes: application-level recovery preserves chat history but misses OS-side effects, while full per-turn checkpointing is correct but too expensive under dense co-location. The root cause is an agent-OS semantic gap: agent frameworks see tool calls but not their OS effects; the OS sees state changes but lacks turn-level context to judge recovery relevance. This gap hides massive sparsity: over 75% of agent turns produce no recovery-relevant state, so most checkpoints are unnecessary. Crab (Checkpoint-and-Restore for Agent SandBoxes) is a transparent host-side runtime that bridges this gap without modifying agents or C/R backends. An eBPF-based inspector classifies each turn's OS-visible effects to decide checkpoint granularity; a coordinator aligns checkpoints with turn boundaries and overlaps C/R with LLM wait time; and a host-scoped engine schedules checkpoint traffic across co-located sandboxes. On shell-intensive and code-repair workloads, Crab raises recovery correctness from 8% (chat-only) to 100%, cuts checkpoint traffic by up to 87%, and stays within 1.9% of fault-free execution time.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes