An Empirical Audit of k-NAF Budget Accounting for Anchored Decoding
For researchers and practitioners using Anchored Decoding, this work provides an empirical audit of budget accounting, but the findings are incremental and confirm expected behavior.
The paper audits the k-NAF budget-accounting mechanism in Anchored Decoding. It finds that mean cumulative KL spend remains far below the sequence-level budgets and that adaptive search does not produce clear budget exhaustion; the high proxy spend ratios that do appear are attributed to proxy artifacts rather than per-trajectory budget failures.
We empirically audit the k-NAF budget-accounting mechanism in Anchored Decoding using (i) a fixed, class-stratified workload (approximately 8,500 randomized executions across six prompt classes) and (ii) an adaptive prompt-search procedure targeting high proxy spend ratios. On the fixed workload, mean cumulative KL spend remains far below the sequence-level budgets K in {600, 1000}, and an empirical Bernstein-style proxy stays below K for every class; surface-overlap diagnostics (ROUGE-L and 5-gram Jaccard) are correspondingly small. Adaptive search increases the proxy spend ratio but does not produce clear budget exhaustion. On a held-out copyright-domain workload at k = 3, several prompts exhibit proxy ratios above 1 under early-stopped evaluations with small realized sample sizes. Re-evaluating the same prompts with a larger sample allocation reduces the proxy ratio to the range [0.26, 0.40] under comparable mean spend, which is consistent with proxy artifacts rather than per-trajectory budget failures.
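As a rough illustration of why a small realized sample size can push a Bernstein-style proxy above the budget even when mean spend is unchanged, the sketch below computes an empirical Bernstein upper confidence bound on mean cumulative KL spend and divides it by K. The exact proxy used in the audit is not specified here, so the Maurer-Pontil form of the bound, the function name `bernstein_proxy_ratio`, the `value_range` parameter (an assumed upper bound on per-run spend), and the confidence level `delta` are all assumptions made for illustration. The spend values are synthetic, not data from the audit.

```python
import math
from statistics import mean, variance


def bernstein_proxy_ratio(spends, budget, value_range, delta=0.05):
    """Hypothetical proxy spend ratio: (empirical Bernstein upper bound) / budget.

    Assumes each per-run cumulative KL spend lies in [0, value_range].
    Uses the Maurer-Pontil empirical Bernstein bound, which is an assumed
    stand-in for the audit's "Bernstein-style proxy".
    """
    n = len(spends)
    if n < 2:
        raise ValueError("need at least two samples to estimate variance")
    log_term = math.log(2.0 / delta)
    sample_mean = mean(spends)
    sample_var = variance(spends)  # unbiased sample variance (divides by n - 1)
    # Variance-dependent term, shrinking as 1 / sqrt(n).
    variance_term = math.sqrt(2.0 * sample_var * log_term / n)
    # Range-dependent term, shrinking as 1 / (n - 1); it dominates for small n.
    range_term = 7.0 * value_range * log_term / (3.0 * (n - 1))
    upper_bound = sample_mean + variance_term + range_term
    return upper_bound / budget


if __name__ == "__main__":
    K = 600.0  # sequence-level budget (one of the values named in the text)
    # Synthetic spends with the same mean (150) at two sample sizes.
    early_stopped = [100.0, 200.0] * 2    # n = 4
    re_evaluated = [100.0, 200.0] * 200   # n = 400
    for label, spends in [("early-stopped", early_stopped),
                          ("re-evaluated", re_evaluated)]:
        ratio = bernstein_proxy_ratio(spends, budget=K, value_range=K)
        print(f"{label}: n={len(spends)}, mean={mean(spends):.1f}, ratio={ratio:.2f}")
```

Under these assumptions, the two synthetic samples share the same mean spend, yet the small sample yields a ratio above 1 and the large sample a ratio below 1. The gap comes entirely from the range term, which scales as 1/(n - 1), so a ratio above 1 at small n reflects the width of the bound rather than any trajectory exceeding K.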