Formal Skill: Programmable Runtime Skills for Efficient and Accurate LLM Agents
This work addresses the inefficiency and unreliability of current informal skill representations for LLM agents operating in real workspaces, offering a more token-efficient and enforceable control surface.
Formal Skill introduces a runtime-native abstraction for LLM agents that replaces informal natural-language skill descriptions with executable state machines and hook policies, achieving competitive performance on Harness-Bench while using substantially fewer tokens.
Large Language Model (LLM) agents increasingly act inside real workspaces, where tools and skills determine whether model reasoning becomes reliable action. Existing skills remain largely informal: Markdown skills and instruction packs encode procedures as long natural-language documents, while function calling, Model Context Protocol (MCP) servers, and framework tools structure individual actions but usually leave workflow state, policy enforcement, and completion discipline outside the skill itself. We introduce Formal Skill, a runtime-native abstraction that represents a reusable capability with JSON metadata and action schemas, reliable Python executors, hook-governed control logic, Formal Skill routing, and skill-local runtime state. By moving reusable procedures out of repeated prompt text and into executable state machines and hook policies, Formal Skill gives agents a token-efficient and enforceable control surface. We implement the abstraction in FairyClaw, an open-source event-driven runtime for executable, observable, and composable Formal Skills. On Harness-Bench, FairyClaw achieves competitive average scores while using substantially fewer tokens, with especially strong results on tasks that expose the role of Formal Skill.
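To make the abstraction concrete, the following is a minimal Python sketch of how a skill might combine JSON-style metadata and action schemas, Python executors, a pre-action hook, a state machine, and skill-local state. It is an illustration only and not FairyClaw's actual API: the names `SKILL_SPEC`, `FormalSkill`, `PolicyError`, `workspace_only`, and the `from`/`to` schema fields are all assumptions introduced here, and the executors write to an in-memory dictionary instead of a real workspace.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List

# Hypothetical JSON-style metadata and action schemas (not FairyClaw's format).
# Each action names its parameters and the state transition it performs.
SKILL_SPEC: Dict[str, Any] = {
    "name": "edit_and_test",
    "initial_state": "editing",
    "actions": {
        "write_file": {"params": ["path", "content"], "from": "editing", "to": "testing"},
        "run_tests": {"params": [], "from": "testing", "to": "done"},
    },
}


class PolicyError(Exception):
    """Raised when the state machine or a hook rejects an action."""


@dataclass
class FormalSkill:
    spec: Dict[str, Any]
    executors: Dict[str, Callable[..., Any]]
    pre_hooks: List[Callable[..., None]] = field(default_factory=list)
    state: str = ""
    memory: Dict[str, Any] = field(default_factory=dict)  # skill-local runtime state

    def __post_init__(self) -> None:
        self.state = self.spec["initial_state"]

    def allowed_actions(self) -> List[str]:
        # Only actions valid in the current state would need to be shown to the model.
        return [n for n, a in self.spec["actions"].items() if a["from"] == self.state]

    def call(self, action: str, **kwargs: Any) -> Any:
        schema = self.spec["actions"].get(action)
        if schema is None or schema["from"] != self.state:
            raise PolicyError(f"{action!r} is not allowed in state {self.state!r}")
        if set(kwargs) != set(schema["params"]):
            raise PolicyError(f"{action!r} expects parameters {schema['params']}")
        for hook in self.pre_hooks:  # hooks may veto by raising PolicyError
            hook(self, action, kwargs)
        result = self.executors[action](self, **kwargs)
        self.state = schema["to"]  # transition only after the executor succeeds
        return result


def write_file(skill: FormalSkill, path: str, content: str) -> str:
    skill.memory.setdefault("files", {})[path] = content  # in-memory, no real I/O
    return f"wrote {path}"


def run_tests(skill: FormalSkill) -> str:
    return f"tested {len(skill.memory.get('files', {}))} file(s)"


def workspace_only(skill: FormalSkill, action: str, args: Dict[str, Any]) -> None:
    # Hypothetical policy hook: reject paths that leave the workspace.
    path = args.get("path")
    if path is not None and (path.startswith("/") or ".." in path):
        raise PolicyError("path escapes the workspace")


if __name__ == "__main__":
    skill = FormalSkill(
        SKILL_SPEC,
        {"write_file": write_file, "run_tests": run_tests},
        [workspace_only],
    )
    print(skill.allowed_actions())
    print(skill.call("write_file", path="src/app.py", content="print('hi')"))
    print(skill.call("run_tests"))
    print(skill.state)
```

In this sketch the ordering rule (write before test) and the path policy live in the runtime, so they are enforced on every call without being restated in the prompt. That is one plausible reading of the "token-efficient and enforceable control surface" described above, not a claim about how FairyClaw implements it.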