Qiwen Chen

28.5 | CL | May 13

GAGPO: Generalized Advantage Grouped Policy Optimization

Siyuan Zhu, Chao Yu, Rongxin Yang et al.

Reinforcement learning has become a powerful paradigm for post-training large language model agents, yet credit assignment in multi-turn environments remains a challenge. Agents often receive sparse, trajectory-level rewards only at the end of an episode, making it difficult to determine which intermediate actions contributed to success or failure. As a result, propagating delayed outcomes back to individual decision steps without relying on costly auxiliary value models remains an open problem. We propose Generalized Advantage Grouped Policy Optimization (GAGPO), a critic-free reinforcement learning method for precise, step-aligned temporal credit assignment. GAGPO constructs a non-parametric grouped value proxy from sampled rollouts and uses it to compute TD/GAE-style temporal advantages, recursively propagating outcome supervision backward through time. Combined with group-wise advantage normalization and an action-level importance ratio, GAGPO extracts stable, localized optimization signals directly from multi-turn trajectories. Experiments on ALFWorld and WebShop show that GAGPO outperforms strong reinforcement learning baselines. Further analyses demonstrate faster early-stage learning, improved interaction efficiency, and smoother optimization dynamics, suggesting that GAGPO offers a simple yet effective framework for multi-turn agentic reinforcement learning.
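The abstract does not spell out how the grouped value proxy is built, so the following is only a minimal sketch of one possible reading, not the authors' implementation. It assumes the proxy at step t is the mean discounted return-to-go over all rollouts in the group that reach step t, then applies the standard GAE recursion and normalizes advantages across the group. The function names and the defaults for `gamma`, `lam`, and `eps` are hypothetical, and the action-level importance ratio is left out.

```python
from statistics import mean, pstdev


def returns_to_go(rewards, gamma):
    """Discounted return from each step to the end of one rollout."""
    out, g = [], 0.0
    for r in reversed(rewards):
        g = r + gamma * g
        out.append(g)
    return out[::-1]


def grouped_value_proxy(group_rewards, gamma):
    """ASSUMPTION: V(t) = mean return-to-go at step t over rollouts that reach t."""
    rtg = [returns_to_go(r, gamma) for r in group_rewards]
    horizon = max(len(r) for r in group_rewards)
    return [mean(x[t] for x in rtg if len(x) > t) for t in range(horizon)]


def gae_advantages(rewards, values, gamma, lam):
    """Standard GAE recursion, with the learned critic replaced by `values`."""
    adv, running = [0.0] * len(rewards), 0.0
    for t in reversed(range(len(rewards))):
        # The value after the final step of a rollout is taken to be zero.
        next_v = values[t + 1] if t + 1 < len(rewards) else 0.0
        delta = rewards[t] + gamma * next_v - values[t]
        running = delta + gamma * lam * running
        adv[t] = running
    return adv


def grouped_gae_advantages(group_rewards, gamma=1.0, lam=1.0, eps=1e-8):
    """Per-step advantages for every rollout, normalized over the whole group."""
    values = grouped_value_proxy(group_rewards, gamma)
    raw = [gae_advantages(r, values, gamma, lam) for r in group_rewards]
    flat = [a for rollout in raw for a in rollout]
    mu, sigma = mean(flat), pstdev(flat)
    return [[(a - mu) / (sigma + eps) for a in rollout] for rollout in raw]


if __name__ == "__main__":
    # Two rollouts of the same task: one succeeds at the last step, one fails.
    group = [[0.0, 0.0, 1.0], [0.0, 0.0, 0.0]]
    for rollout_adv in grouped_gae_advantages(group):
        print([round(a, 3) for a in rollout_adv])
```

In this toy group, `lam=1.0` spreads the final outcome over every step of each rollout, while `lam=0.0` keeps only the one-step TD error, so the unnormalized credit for the successful rollout lands on its last step alone.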

HC | Oct 13, 2020

Real-Time Detection of Simulator Sickness in Virtual Reality Games Based on Players' Psychophysiological Data during Gameplay

Jialin Wang, Hai-Ning Liang, Diego Monteiro et al.

Virtual Reality (VR) technology has been proliferating over the last decade, especially in the last few years. However, Simulator Sickness (SS) remains a significant obstacle to its wider adoption. Currently, the most common way to detect SS is the Simulator Sickness Questionnaire (SSQ). The SSQ is a subjective measurement and is inadequate for real-time applications such as VR games. This research investigates how machine learning techniques can detect SS from in-game character data and users' physiological data collected during gameplay in VR games. To this end, we designed an experiment to collect such data with three types of games. We trained a Long Short-Term Memory neural network on the eye-tracking and character movement data in this dataset to detect SS in real time. Our results indicate that, in VR games, our model is an accurate and efficient way to detect SS in real time.

2 Papers