Agentic Performance at the Edge: Insights from Benchmarking
This work provides initial empirical guidance for practitioners selecting models for agentic edge AI under resource constraints.
The paper investigates how agentic AI quality degrades when model size is constrained for edge deployment. It finds that quality is not simply a function of parameter count but depends on the joint design of model choice and tool workflow, and that Pareto fronts in accuracy-latency space can guide strategy selection.
Agentic artificial intelligence (AI) is a natural fit for Internet of Things (IoT) and edge systems, but edge deployments are often limited to models of roughly 8 billion parameters or fewer. A key question is how much agentic-task quality is lost when model size is constrained by memory, power, and latency budgets. To address it, we present an initial empirical study that examines edge-focused model scaling, the effect of general-purpose versus coder-oriented models, and tool-enabled execution under a fixed protocol. We introduce a domain-conditioned evaluation methodology, an implementation-grounded analysis of model-tool interactions, practical guidance for model selection under constraints, and an analysis of failure modes that reveals distinct semantic versus execution failure patterns across model families. Our core finding is that edge-agent quality is not a simple function of parameter count: robust deployment depends on the joint design of model choice and tool workflow. Domain-conditioned analysis reveals Pareto fronts in the accuracy-latency space that can guide strategy selection based on operational priorities.
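To make the Pareto-front idea concrete, the sketch below shows one common way to extract the non-dominated strategies from (accuracy, latency) measurements and then pick one according to an operational priority. It is a minimal illustration under stated assumptions, not the paper's method: the `Strategy` type, the helper functions, the strategy labels, and all numbers are hypothetical and are not results from the study. A strategy is kept on the front only if no other strategy is at least as accurate and at least as fast while being strictly better on one of the two axes.

```python
from typing import NamedTuple, Optional


class Strategy(NamedTuple):
    # Hypothetical record for one model + tool-workflow configuration.
    name: str
    accuracy: float   # fraction of tasks solved (higher is better)
    latency_s: float  # mean seconds per task (lower is better)


def dominates(a: Strategy, b: Strategy) -> bool:
    """True if a is no worse than b on both axes and strictly better on at least one."""
    no_worse = a.accuracy >= b.accuracy and a.latency_s <= b.latency_s
    strictly_better = a.accuracy > b.accuracy or a.latency_s < b.latency_s
    return no_worse and strictly_better


def pareto_front(strategies: list[Strategy]) -> list[Strategy]:
    """Return the non-dominated strategies, sorted by increasing latency."""
    front = [
        s for s in strategies
        if not any(dominates(other, s) for other in strategies if other is not s)
    ]
    return sorted(front, key=lambda s: s.latency_s)


def best_within_latency(front: list[Strategy], max_latency_s: float) -> Optional[Strategy]:
    """Priority 'latency budget': most accurate strategy that fits the budget."""
    feasible = [s for s in front if s.latency_s <= max_latency_s]
    return max(feasible, key=lambda s: s.accuracy) if feasible else None


def fastest_above_accuracy(front: list[Strategy], min_accuracy: float) -> Optional[Strategy]:
    """Priority 'quality floor': fastest strategy meeting the accuracy target."""
    feasible = [s for s in front if s.accuracy >= min_accuracy]
    return min(feasible, key=lambda s: s.latency_s) if feasible else None


# Hypothetical, illustrative measurements only (not data from the paper).
CANDIDATES = [
    Strategy("small-general", 0.52, 1.1),
    Strategy("small-coder+tools", 0.61, 2.4),
    Strategy("8b-general", 0.58, 3.0),        # dominated by small-coder+tools
    Strategy("8b-coder+tools", 0.70, 4.8),
    Strategy("8b-general+tools", 0.66, 5.5),  # dominated by 8b-coder+tools
]

if __name__ == "__main__":
    front = pareto_front(CANDIDATES)
    print([s.name for s in front])
    print(best_within_latency(front, max_latency_s=3.0))
    print(fastest_above_accuracy(front, min_accuracy=0.65))
```

With these made-up numbers the front is `small-general`, `small-coder+tools`, and `8b-coder+tools`; a 3-second latency budget selects `small-coder+tools`, while a 0.65 accuracy floor selects `8b-coder+tools`. The point of the design is that the front is computed once, and different operational priorities then amount to different selection rules applied to it.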