Xiaoyue Ma

74.3 AI May 2

Resource-Efficient Reinforcement for Reasoning Large Language Models via Dynamic One-Shot Policy Refinement

Yunjian Zhang, Sudong Wang, Yang Li et al.

Large language models (LLMs) have exhibited remarkable performance on complex reasoning tasks, with reinforcement learning under verifiable rewards (RLVR) emerging as a principled framework for aligning model behavior with reasoning chains. Despite its promise, RLVR remains prohibitively resource-intensive, requiring extensive reward signals and incurring substantial rollout costs during training. In this work, we revisit the fundamental question of data and compute efficiency in RLVR. We first establish a theoretical lower bound on the sample complexity required to unlock reasoning capabilities, and empirically validate that strong performance can be achieved with a surprisingly small number of training instances. To tackle the computational burden, we propose Dynamic One-Shot Policy Refinement (DoPR), an uncertainty-aware RL strategy that dynamically selects a single informative training sample per batch for policy updates, guided by reward volatility and exploration-driven acquisition. DoPR reduces rollout overhead by nearly an order of magnitude while preserving competitive reasoning accuracy, offering a scalable and resource-efficient solution for LLM post-training. This approach offers a practical path toward more efficient and accessible RL-based training for reasoning-intensive LLM applications.

HC Oct 8, 2020

VirusBoxing: A HIIT-based VR boxing game

Wenge Xu, Hai-Ning Liang, Xiaoyue Ma et al.

Physical activity or exercise can improve people's health and reduce their risk of developing several diseases; most importantly, regular activity can improve quality of life. However, lack of time is one of the major barriers that keep people from exercising. High-intensity interval training (HIIT) can reduce the time required for a healthy exercise regime while still bringing benefits similar to those of regular exercise. We present a boxing-based VR exergame called VirusBoxing to promote physical activity among players. VirusBoxing provides players with a platform for HIIT and gives them the additional ability to jab a distant object without needing to aim at it precisely. In this paper, we discuss how we adapted the HIIT protocol and gameplay features to empower players and give them an efficient, effective, and enjoyable exercise experience in a VR exergame.

2 Papers