Token Constraint Decoding Improves Robustness on Question Answering for Large Language Models
This work addresses robustness issues that arise when deploying LLMs in safety-critical or user-facing applications, though the improvement is incremental because it builds on existing inference-time methods.
The paper tackles the vulnerability of large language models to minor input perturbations in multiple-choice question answering by introducing Token Constraint Decoding, which restores degraded performance, with absolute gains of up to +39% for weaker models such as Gemma3 1B.
Large Language Models (LLMs) have demonstrated impressive performance on multiple-choice question answering (MCQA) benchmarks, yet they remain highly vulnerable to minor input perturbations. In this paper, we introduce and evaluate Token Constraint Decoding (TCD). This simple yet effective inference-time algorithm enforces alignment between token-level predictions to enhance robustness in noisy settings. Through extensive experiments on CommonsenseQA, MMLU, and MMLU-Pro, we show that TCD, especially when paired with prompt engineering (PE) fixes, significantly restores performance degraded by input noise, yielding up to +39% absolute gains for weaker models like Gemma3 1B. Penalty sweep analyses further reveal that TCD implicitly regularizes overconfident outputs, with different models requiring distinct penalty schedules to maximize resilience. Our findings establish TCD as a practical, model-agnostic approach for improving reasoning stability under real-world imperfections and pave the way for more reliable deployment of LLMs in safety-critical or user-facing applications.
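The abstract does not spell out the decoding rule, so the following is only a minimal sketch of one plausible reading: at the answer position, a fixed penalty is subtracted from the logit of every token outside the set of valid option labels, and the highest-scoring token is then selected. The function names (`penalize_disallowed`, `pick_answer`), the toy logits, and the specific penalty values are all hypothetical and are not taken from the paper.

```python
from typing import Dict, Iterable


def penalize_disallowed(logits: Dict[str, float],
                        allowed: Iterable[str],
                        penalty: float) -> Dict[str, float]:
    """Subtract `penalty` from the logit of every token not in `allowed`.

    Assumption: this additive penalty is an illustrative stand-in for the
    constraint described in the abstract, not the paper's exact rule.
    """
    allowed_set = set(allowed)
    return {tok: (score if tok in allowed_set else score - penalty)
            for tok, score in logits.items()}


def pick_answer(logits: Dict[str, float],
                allowed: Iterable[str],
                penalty: float) -> str:
    """Greedy choice of the next token after applying the penalty."""
    adjusted = penalize_disallowed(logits, allowed, penalty)
    return max(adjusted, key=adjusted.get)


# Toy next-token logits (made-up numbers, not results from the paper).
toy_logits = {"A": 1.2, "B": 2.0, "C": 0.3, "D": -0.5, "The": 2.6, "\n": 2.4}
options = ["A", "B", "C", "D"]

# A small sweep over penalty strengths, in the spirit of a penalty sweep.
sweep = {p: pick_answer(toy_logits, options, p) for p in (0.0, 0.5, 5.0)}
print(sweep)
```

In this toy setting, a zero or small penalty leaves a non-option token as the top choice, while a sufficiently large penalty forces the output onto one of the option labels. This is consistent with the abstract's remark that different models may need different penalty strengths, though the actual schedules used in the paper are not given here.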