CL AIJun 14, 2025

Med-U1: Incentivizing Unified Medical Reasoning in LLMs via Large-scale Reinforcement Learning

Xiaotian Zhang, Yuan Wang, Zhaopeng Feng, Ruizhe Chen, Zhijie Zhou, Yan Zhang, Hongxia Xu, Jian Wu, Zuozhu Liu

arXiv:2506.12307v210 citationsh-index: 13Has Code

Originality Incremental advance

AI Analysis

This addresses the problem of fragmented medical QA for healthcare and AI researchers, though it appears incremental as it builds on existing reasoning-augmented LLMs with a novel training approach.

The paper tackles the lack of a unified framework for medical question-answering tasks by introducing Med-U1, which uses large-scale reinforcement learning to improve performance across multiple benchmarks, surpassing larger specialized models and showing robust generalization to out-of-distribution tasks.

Medical Question-Answering (QA) encompasses a broad spectrum of tasks, including multiple choice questions (MCQ), open-ended text generation, and complex computational reasoning. Despite this variety, a unified framework for delivering high-quality medical QA has yet to emerge. Although recent progress in reasoning-augmented large language models (LLMs) has shown promise, their ability to achieve comprehensive medical understanding is still largely unexplored. In this paper, we present Med-U1, a unified framework for robust reasoning across medical QA tasks with diverse output formats, ranging from MCQs to complex generation and computation tasks. Med-U1 employs pure large-scale reinforcement learning with mixed rule-based binary reward functions, incorporating a length penalty to manage output verbosity. With multi-objective reward optimization, Med-U1 directs LLMs to produce concise and verifiable reasoning chains. Empirical results reveal that Med-U1 significantly improves performance across multiple challenging Med-QA benchmarks, surpassing even larger specialized and proprietary models. Furthermore, Med-U1 demonstrates robust generalization to out-of-distribution (OOD) tasks. Extensive analysis presents insights into training strategies, reasoning chain length control, and reward design for medical LLMs. Our code is available here.

View on arXiv PDF

Similar