29.5LGMar 27
Distributionally Robust Token Optimization in RLHFYeping Jin, Jiaming Hu, Ioannis Ch. Paschalidis
Large Language Models (LLMs) tend to respond correctly to prompts that align to the data they were trained and fine-tuned on. Yet, small shifts in wording, format, or language can trigger surprisingly large failures, especially on multi-step reasoning problems. To address this problem, we propose a Distributionally Robust Token Optimization (DRTO) approach, which combines token-level Reinforcement Learning from Human Feedback (RLHF) with Distributionally Robust Optimization (DRO). DRTO bounds worst case token-wise rewards by constructing an f-divergence ambiguity set over a loss minibatch, leading to a theoretical robustness. Empirically, DRTO enhances consistency under distribution shifts in mathematical reasoning benchmarks, achieving 9.17\% improvement on GSM8K and 2.49% improvement on MathQA.
LGJun 2, 2025
Distributionally Robust Learning in Survival AnalysisYeping Jin, Lauren Wise, Ioannis Ch. Paschalidis
We introduce an innovative approach that incorporates a Distributionally Robust Learning (DRL) approach into Cox regression to enhance the robustness and accuracy of survival predictions. By formulating a DRL framework with a Wasserstein distance-based ambiguity set, we develop a variant Cox model that is less sensitive to assumptions about the underlying data distribution and more resilient to model misspecification and data perturbations. By leveraging Wasserstein duality, we reformulate the original min-max DRL problem into a tractable regularized empirical risk minimization problem, which can be computed by exponential conic programming. We provide guarantees on the finite sample behavior of our DRL-Cox model. Moreover, through extensive simulations and real world case studies, we demonstrate that our regression model achieves superior performance in terms of prediction accuracy and robustness compared with traditional methods.