LG | AI | Jan 29

HER: Human-like Reasoning and Reinforcement Learning for LLM Role-playing

arXiv:2601.21459v3 | 4 citations | h-index: 26
Originality: Highly original
AI Analysis

The paper addresses cognitive simulation in LLM role-playing for applications such as companionship and gaming, and proposes a novel method for a known bottleneck.

The paper tackles the challenge of simulating characters' inner thoughts in LLM role-playing. It proposes HER, a framework that combines dual-layer thinking with reinforcement learning, and reports a 30.26 improvement on the CoSER benchmark and a 14.97 gain on the Minimax Role-Play Bench over the Qwen3-32B baseline.

LLM role-playing, i.e., using LLMs to simulate specific personas, has emerged as a key capability in various applications, such as companionship, content creation, and digital games. While current models effectively capture character tones and knowledge, simulating the inner thoughts behind their behaviors remains a challenge. Towards cognitive simulation in LLM role-play, previous efforts mainly suffer from two deficiencies: a lack of data with high-quality reasoning traces, and a lack of reliable reward signals aligned with human preferences. In this paper, we propose HER, a unified framework for cognitive-level persona simulation. HER introduces dual-layer thinking, which distinguishes characters' first-person thinking from LLMs' third-person thinking. To bridge these gaps, we curate reasoning-augmented role-playing data via reverse engineering and construct human-aligned principles and reward models. Leveraging these resources, we train HER models based on Qwen3-32B via supervised and reinforcement learning. Extensive experiments validate the effectiveness of our approach. Notably, our models significantly outperform the Qwen3-32B baseline, achieving a 30.26 improvement on the CoSER benchmark and a 14.97 gain on the Minimax Role-Play Bench. Our datasets, principles, and models will be released to facilitate future research.

