LG · AI · Apr 19

A Survey of Reinforcement Learning for Large Language Models under Data Scarcity: Challenges and Solutions

Zhiyin Yu, Yuchen Mou, Juncheng Yan, Junyu Luo, Chunchun Chen, Xing Wei, Yunhui Liu, Hongru Sun, Yuxing Zhang, Jun Xu, Yatao Bian, Ming Zhang

Peking U

arXiv:2604.17312 · 99.2 · h-index: 14

Predicted impact: top 1% in LG · last 90 days · Originality: Synthesis-oriented

AI Analysis

It offers the first systematic review of data-efficient RL for LLMs, providing a conceptual foundation for researchers in this emerging area.

This survey systematically reviews reinforcement learning for large language models under data scarcity, proposing a hierarchical framework with data-centric, training-centric, and framework-centric perspectives. It provides a taxonomy of methods and analyzes their strengths and limitations to guide future research.

Reinforcement learning (RL) has emerged as a powerful post-training paradigm for enhancing the reasoning capabilities of large language models (LLMs). However, reinforcement learning for LLMs faces substantial data scarcity challenges, including the limited availability of high-quality external supervision and the constrained volume of model-generated experience. These limitations make data-efficient reinforcement learning a critical research direction. In this survey, we present the first systematic review of reinforcement learning for LLMs under data scarcity. We propose a bottom-up hierarchical framework built around three complementary perspectives: the data-centric perspective, the training-centric perspective, and the framework-centric perspective. We develop a taxonomy of existing methods, summarize representative approaches in each category, and analyze their strengths and limitations. Our taxonomy aims to provide a clear conceptual foundation for understanding the design space of data-efficient RL for LLMs and to guide researchers working in this emerging area. We hope this survey offers a comprehensive roadmap for future research and inspires new directions toward more efficient and scalable reinforcement learning post-training for LLMs.
