A Survey on Progress in LLM Alignment from the Perspective of Reward Design
This survey provides a structured overview for researchers and practitioners in AI alignment; its contribution is incremental, since it synthesizes existing work rather than introducing new methods.
This survey tackles the problem of aligning large language models with human values by organizing and analyzing reward design strategies, highlighting recent paradigm shifts such as the move from reinforcement learning (RL)-based to RL-free optimization and from single-task to multi-objective settings.
Reward design plays a pivotal role in aligning large language models (LLMs) with human values, serving as the bridge between feedback signals and model optimization. This survey provides a structured organization of reward modeling and addresses three key aspects: mathematical formulation, construction practices, and interaction with optimization paradigms. Building on this, it develops a macro-level taxonomy that characterizes reward mechanisms along complementary dimensions, thereby offering both conceptual clarity and practical guidance for alignment research. The progression of LLM alignment can be understood as a continuous refinement of reward design strategies, with recent developments highlighting paradigm shifts from reinforcement learning (RL)-based to RL-free optimization and from single-task to multi-objective and complex settings.
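To make the shift from RL-based to RL-free optimization concrete, the following is a minimal sketch, not taken from the survey itself. It assumes two widely used formulations as stand-ins: a Bradley-Terry pairwise loss for training an explicit reward model (the usual first stage of an RL-based pipeline) and a DPO-style loss in which the reward is implicit in the policy's log-probability ratio against a reference model. The function names, the scalar inputs, and the default `beta` are illustrative assumptions.

```python
import math


def log_sigmoid(x: float) -> float:
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def bradley_terry_loss(r_chosen: float, r_rejected: float) -> float:
    """Pairwise loss for an explicit reward model (assumed formulation).

    The loss is small when the reward of the preferred response exceeds
    the reward of the rejected one.
    """
    return -log_sigmoid(r_chosen - r_rejected)


def dpo_style_loss(
    policy_logp_chosen: float,
    policy_logp_rejected: float,
    ref_logp_chosen: float,
    ref_logp_rejected: float,
    beta: float = 0.1,  # hypothetical default; controls deviation from the reference
) -> float:
    """RL-free preference loss with an implicit reward (assumed formulation).

    The implicit reward of a response is beta times the log-probability
    ratio between the policy and a frozen reference model, so no separate
    reward model or RL loop is needed.
    """
    implicit_chosen = beta * (policy_logp_chosen - ref_logp_chosen)
    implicit_rejected = beta * (policy_logp_rejected - ref_logp_rejected)
    return bradley_terry_loss(implicit_chosen, implicit_rejected)


if __name__ == "__main__":
    # Explicit reward model: the preferred response already scores higher.
    print(bradley_terry_loss(r_chosen=2.0, r_rejected=0.0))
    # Implicit reward: the policy equals the reference, so neither response is favored.
    print(dpo_style_loss(-5.0, -7.0, -5.0, -7.0))
```

In this sketch both losses share the same pairwise form; what changes is where the reward comes from. In the first case it is the output of a separately trained model that a later RL stage would maximize, while in the second it is read directly off the policy being optimized.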