CL | AI | May 29, 2025

Bounded Rationality for LLMs: Satisficing Alignment at Inference-Time

Mohamad Chehade, Soumya Suvra Ghosal, Souradip Chakraborty, Avinash Reddy, Dinesh Manocha, Hao Zhu, Amrit Singh Bedi

arXiv:2505.23729v2 | 13.99 citations | h-index: 25 | ICML

Originality: Incremental advance

AI Analysis

This paper addresses multifaceted alignment for LLMs by incorporating bounded rationality principles, offering a novel but incremental approach to inference-time optimization.

The paper tackles the challenge of aligning large language models with multifaceted human preferences. It proposes SITAlign, an inference-time framework that maximizes a primary objective while satisfying threshold-based constraints on secondary criteria. On the PKU-SafeRLHF dataset, with helpfulness as the primary objective and a threshold on harmlessness, SITAlign outperforms the state-of-the-art multi-objective decoding strategy by a margin of 22.3% in GPT-4 win-tie rate for helpfulness.

Aligning large language models with humans is challenging due to the inherently multifaceted nature of preference feedback. While existing approaches typically frame this as a multi-objective optimization problem, they often overlook how humans actually make decisions. Research on bounded rationality suggests that human decision-making follows satisficing strategies: optimizing primary objectives while ensuring others meet acceptable thresholds. To bridge this gap and operationalize the notion of satisficing alignment, we propose SITAlign: an inference-time framework that addresses the multifaceted nature of alignment by maximizing a primary objective while satisfying threshold-based constraints on secondary criteria. We provide theoretical insights by deriving sub-optimality bounds of our satisficing-based inference alignment approach. We empirically validate SITAlign's performance through extensive experimentation on multiple benchmarks. For instance, on the PKU-SafeRLHF dataset with the primary objective of maximizing helpfulness while ensuring a threshold on harmlessness, SITAlign outperforms the state-of-the-art multi-objective decoding strategy by a margin of 22.3% in terms of GPT-4 win-tie rate for helpfulness reward while adhering to the threshold on harmlessness.
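To make the satisficing idea concrete, here is a minimal sketch that applies it to a fixed pool of candidate responses: keep only the candidates whose secondary scores meet their thresholds, then pick the one with the highest primary score. This is not SITAlign's decoding procedure, which the abstract does not spell out; it is a simple best-of-N stand-in. The function name `satisficing_select`, the fallback rule for the case where no candidate is feasible, and the toy scorers are all assumptions made for illustration.

```python
from typing import Callable, Sequence


def satisficing_select(
    candidates: Sequence[str],
    primary: Callable[[str], float],
    secondary: Sequence[Callable[[str], float]],
    thresholds: Sequence[float],
) -> str:
    """Pick the candidate with the best primary score among those that
    meet every secondary threshold (hypothetical helper, not the paper's API)."""
    if not candidates:
        raise ValueError("need at least one candidate")
    if len(secondary) != len(thresholds):
        raise ValueError("one threshold is required per secondary scorer")

    def violation(text: str) -> float:
        # Total shortfall below the thresholds; 0.0 means all constraints hold.
        return sum(max(0.0, t - score(text)) for score, t in zip(secondary, thresholds))

    feasible = [c for c in candidates if violation(c) == 0.0]
    if feasible:
        # Satisficing step: maximize the primary objective over the feasible set.
        return max(feasible, key=primary)
    # Assumed fallback: no candidate is feasible, so return the least-violating one.
    return min(candidates, key=violation)


# Toy scorers (assumptions): length stands in for helpfulness,
# and a keyword check stands in for a harmlessness reward model.
def toy_helpfulness(text: str) -> float:
    return float(len(text))


def toy_harmlessness(text: str) -> float:
    return 0.0 if "unsafe" in text else 1.0


pool = [
    "short safe answer",
    "a much longer but unsafe answer",
    "a longer safe answer",
]
chosen = satisficing_select(pool, toy_helpfulness, [toy_harmlessness], [0.5])
print(chosen)
```

In this toy pool the longest response fails the harmlessness threshold, so the selection falls to the most helpful of the remaining candidates. In practice the scorers would be learned reward models, and the threshold would be chosen to reflect what counts as acceptable on the secondary criterion.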

