AI HC LG | Dec 4, 2023

Training Reinforcement Learning Agents and Humans With Difficulty-Conditioned Generators

Sidney Tio, Jimmy Ho, Pradeep Varakantham

arXiv:2312.02309v1 | 3.91 citations | h-index: 33

Originality: Incremental advance

AI Analysis

The paper addresses the challenge of personalized training in parameterized environments for RL agents and human learners. The advance appears incremental, since it adapts existing Item Response Theory (IRT) concepts to this domain.

The paper tackles the problem of training both reinforcement learning agents and human learners by developing PERM, a method that aligns environment difficulty with individual ability using Item Response Theory, resulting in effective training without requiring real-time RL updates.

We adapt Parameterized Environment Response Model (PERM), a method for training both Reinforcement Learning (RL) Agents and human learners in parameterized environments by directly modeling difficulty and ability. Inspired by Item Response Theory (IRT), PERM aligns environment difficulty with individual ability, creating a Zone of Proximal Development-based curriculum. Remarkably, PERM operates without real-time RL updates and allows for offline training, ensuring its adaptability across diverse students. We present a two-stage training process that capitalizes on PERM's adaptability, and demonstrate its effectiveness in training RL agents and humans in an empirical study.
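
To make the idea of aligning difficulty with ability concrete, here is a minimal sketch of the classic one-parameter (Rasch) IRT model that the abstract cites as inspiration. It is not PERM itself: the function names, the candidate difficulty grid, and the 0.5 target success rate are all assumptions chosen for illustration, and the abstract does not specify how PERM models or selects difficulty.

```python
import math


def success_probability(ability: float, difficulty: float) -> float:
    """Classic 1PL (Rasch) IRT: chance that a learner succeeds on an item.

    Illustrative only; PERM's actual model is not described in the abstract.
    """
    return 1.0 / (1.0 + math.exp(-(ability - difficulty)))


def pick_difficulty(ability: float, candidates: list[float], target: float = 0.5) -> float:
    """Pick the candidate difficulty whose predicted success rate is closest
    to `target` (a hypothetical stand-in for a Zone of Proximal Development rule)."""
    return min(candidates, key=lambda d: abs(success_probability(ability, d) - target))


# Hypothetical difficulty grid and learner ability estimate.
candidates = [-2.0, -1.0, 0.0, 1.0, 2.0]
chosen = pick_difficulty(ability=0.8, candidates=candidates)
print(chosen)
```

Under this model a learner whose ability equals the item difficulty succeeds half the time, so targeting a success rate near 0.5 selects environments that are neither trivial nor out of reach.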

