CR · AI · Mar 28

SkillTester: Benchmarking Utility and Security of Agent Skills

arXiv:2603.28815 · 63.24 citations · h-index: 3 · Has Code

AI Analysis

For developers and users of AI agents, SkillTester offers a systematic way to assess the utility and security of agent skills, addressing a growing need for quality assurance in agent-based systems.

SkillTester is a tool that evaluates agent skills by comparing execution with and without the skill, producing utility and security scores. It provides a standardized benchmark for assessing skill quality.

This technical report presents SkillTester, a tool for evaluating the utility and security of agent skills. Its evaluation framework combines paired baseline and with-skill execution conditions with a separate security probe suite. Grounded in a comparative utility principle and a user-facing simplicity principle, the framework normalizes raw execution artifacts into a utility score, a security score, and a three-level security status label. More broadly, it can be understood as a comparative quality-assurance harness for agent skills in an agent-first world. The public service is deployed at https://skilltester.ai, and the broader project is maintained at https://github.com/skilltester-ai/skilltester.
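The abstract does not give the scoring formulas, so the following is only a minimal sketch of how paired baseline and with-skill runs plus a separate probe suite could be normalized into a utility score, a security score, and a three-level status label. The `PairedRun` type, the 0 to 100 scales, the thresholds, and the label names `pass`, `warn`, and `fail` are all assumptions made for illustration; they are not taken from SkillTester itself.

```python
from dataclasses import dataclass


@dataclass
class PairedRun:
    """One task executed twice: without and with the skill (hypothetical shape)."""
    baseline_passed: bool
    with_skill_passed: bool


def utility_score(runs: list[PairedRun]) -> float:
    """Assumed formula: with-skill pass rate minus baseline pass rate, mapped to 0..100."""
    if not runs:
        raise ValueError("need at least one paired run")
    baseline_rate = sum(r.baseline_passed for r in runs) / len(runs)
    skill_rate = sum(r.with_skill_passed for r in runs) / len(runs)
    # The difference lies in [-1, 1]; shift and scale so 50 means "no effect".
    return 50.0 * (skill_rate - baseline_rate + 1.0)


def security_score(probe_results: list[bool]) -> float:
    """Assumed formula: percentage of security probes the skill passed."""
    if not probe_results:
        raise ValueError("need at least one probe result")
    return 100.0 * sum(probe_results) / len(probe_results)


def security_status(score: float, pass_at: float = 90.0, warn_at: float = 60.0) -> str:
    """Map a security score to one of three labels (thresholds are illustrative)."""
    if score >= pass_at:
        return "pass"
    if score >= warn_at:
        return "warn"
    return "fail"


if __name__ == "__main__":
    runs = [
        PairedRun(baseline_passed=True, with_skill_passed=True),
        PairedRun(baseline_passed=False, with_skill_passed=True),
        PairedRun(baseline_passed=False, with_skill_passed=True),
        PairedRun(baseline_passed=False, with_skill_passed=False),
    ]
    probes = [True, True, True, False]
    sec = security_score(probes)
    print(utility_score(runs), sec, security_status(sec))
```

Under these assumptions a skill is credited only for improvement over the baseline, which is one way to read the comparative utility principle, and the single status label keeps the user-facing result simple.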
