LG AI Mar 27

Distributionally Robust Token Optimization in RLHF

arXiv:2604.08577 22.2 h-index: 2
AI Analysis

For LLM practitioners, DRTO offers a method to mitigate performance drops from small prompt variations in reasoning tasks.

DRTO combines token-level RLHF with distributionally robust optimization to improve LLM robustness under distribution shifts, achieving 9.17% improvement on GSM8K and 2.49% on MathQA.

Large Language Models (LLMs) tend to respond correctly to prompts that align with the data they were trained and fine-tuned on. Yet small shifts in wording, format, or language can trigger surprisingly large failures, especially on multi-step reasoning problems. To address this problem, we propose Distributionally Robust Token Optimization (DRTO), an approach that combines token-level Reinforcement Learning from Human Feedback (RLHF) with Distributionally Robust Optimization (DRO). DRTO bounds worst-case token-wise rewards by constructing an f-divergence ambiguity set over a loss minibatch, which yields a theoretical robustness guarantee. Empirically, DRTO enhances consistency under distribution shifts on mathematical reasoning benchmarks, achieving a 9.17% improvement on GSM8K and a 2.49% improvement on MathQA.
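To make the idea of an f-divergence ambiguity set over a loss minibatch more concrete, here is a minimal sketch of one well-known special case: the KL-penalized worst case, whose closed form is a temperature-scaled log-mean-exp of the per-token losses. This is an illustration only and is not taken from the paper; the abstract does not say which f-divergence DRTO uses or how the bound is computed. The function names `kl_robust_loss` and `worst_case_weights` and the `temperature` parameter are hypothetical.

```python
import math


def kl_robust_loss(losses, temperature=1.0):
    """KL-penalized worst-case mean of per-token losses (illustrative only).

    Computes temperature * log(mean(exp(loss / temperature))), the closed form of
    max over Q of E_Q[loss] - temperature * KL(Q || uniform minibatch).
    Assumption: KL is just one member of the f-divergence family.
    """
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    peak = max(losses)  # subtract the max before exponentiating for stability
    mean_exp = sum(math.exp((l - peak) / temperature) for l in losses) / len(losses)
    return peak + temperature * math.log(mean_exp)


def worst_case_weights(losses, temperature=1.0):
    """Adversarial reweighting of the minibatch that attains the value above.

    The weights are a softmax of loss / temperature, so high-loss tokens
    receive more weight than they would under a plain average.
    """
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    peak = max(losses)
    exps = [math.exp((l - peak) / temperature) for l in losses]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical per-token losses for one minibatch; the last token is an outlier.
token_losses = [0.1, 0.2, 3.0]
robust = kl_robust_loss(token_losses, temperature=1.0)
weights = worst_case_weights(token_losses, temperature=1.0)
```

In this sketch the robust value always lies between the plain mean and the maximum loss. A small `temperature` pushes it toward the worst token, while a large one recovers the ordinary average, so the parameter plays the role that the ambiguity-set radius plays in a constrained formulation.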
