CL · May 20, 2025

Incorporating Token Usage into Prompting Strategy Evaluation

arXiv:2505.14880v1 · h-index: 6
Originality: Incremental advance
AI Analysis

The paper addresses the practical need for efficiency-aware evaluation in real-world applications, though its contribution is incremental: it adds token usage as an evaluation metric.

The paper evaluates prompting strategies for large language models using efficiency metrics that balance performance against token usage, and finds that increased token usage leads to drastically diminishing returns.

In recent years, large language models have demonstrated remarkable performance across diverse tasks. However, their task effectiveness is heavily dependent on the prompting strategy used to elicit output, which can vary widely in both performance and token usage. While task performance is often used to determine prompting strategy success, we argue that efficiency--balancing performance and token usage--can be a more practical metric for real-world utility. To enable this, we propose Big-$O_{tok}$, a theoretical framework for describing the token usage growth of prompting strategies, and analyze Token Cost, an empirical measure of tokens per performance. We apply these to several common prompting strategies and find that increased token usage leads to drastically diminishing performance returns. Our results validate the Big-$O_{tok}$ analyses and reinforce the need for efficiency-aware evaluations.
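The abstract describes Token Cost only as "an empirical measure of tokens per performance", so the following is a minimal sketch under that reading rather than the paper's exact formula. It assumes Token Cost is average tokens per query divided by a performance score, and adds a marginal variant (extra tokens per extra point of performance) to make the diminishing-returns claim concrete. The names `StrategyRun`, `token_cost`, and `marginal_token_cost`, and all numbers, are hypothetical placeholders, not results from the paper.

```python
from dataclasses import dataclass


@dataclass
class StrategyRun:
    """One prompting strategy evaluated on one task (all values hypothetical)."""
    name: str
    avg_tokens: float  # mean tokens consumed per query
    accuracy: float    # task performance, in percentage points


def token_cost(run: StrategyRun) -> float:
    """Assumed definition: tokens spent per point of performance."""
    if run.accuracy <= 0:
        raise ValueError("accuracy must be positive to compute a token cost")
    return run.avg_tokens / run.accuracy


def marginal_token_cost(cheaper: StrategyRun, costlier: StrategyRun) -> float:
    """Extra tokens paid per extra point of performance when switching strategies."""
    gain = costlier.accuracy - cheaper.accuracy
    extra = costlier.avg_tokens - cheaper.avg_tokens
    if gain <= 0:
        # No performance gain: any additional tokens are pure overhead.
        return float("inf")
    return extra / gain


# Placeholder numbers chosen only to illustrate the shape of the comparison.
runs = [
    StrategyRun("zero_shot", avg_tokens=100.0, accuracy=50.0),
    StrategyRun("chain_of_thought", avg_tokens=400.0, accuracy=80.0),
    StrategyRun("self_consistency_x5", avg_tokens=2000.0, accuracy=84.0),
]

for run in runs:
    print(f"{run.name}: token cost = {token_cost(run):.2f}")

for prev, curr in zip(runs, runs[1:]):
    step = marginal_token_cost(prev, curr)
    print(f"{prev.name} -> {curr.name}: marginal token cost = {step:.2f}")
```

With these placeholder inputs, each step up in token usage costs more tokens per point of performance gained than the previous one, which is the pattern the abstract calls diminishing returns. Whether real strategies follow it is an empirical question that this sketch does not answer.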

