Scalable Environments Drive Generalizable Agents
For AI researchers building generalizable agents, this paper identifies a key bottleneck, world-level distribution shift, and offers a framework for addressing it. As a position paper, it presents no empirical results.
This position paper argues that generalizable agents require environment scaling, that is, expanding the distribution of executable rule-sets, rather than only scaling trajectories or tasks within fixed benchmarks. It proposes a taxonomy that separates trajectory, task, and environment scaling, and discusses construction paradigms for scalable environments as a way to drive progress toward robust general agents.
Generalizable agents should adapt to diverse tasks and unseen environments beyond their training distribution. This position paper argues that such generalization requires environment scaling: expanding the distribution of executable rule-sets that agents interact with, rather than only increasing trajectories or tasks within fixed benchmarks. Current scaling practices largely focus on collecting more experience or broader task sets under fixed interaction rules, leaving agents brittle when underlying interfaces, dynamics, observations, or feedback signals change. The core challenge is therefore a world-level distribution shift: agents need systematic exposure to environments with meaningfully different executable rule-sets. To clarify this challenge, we propose a unified taxonomy that separates trajectory scaling, task scaling, and environment scaling by their primary deliverables and by what changes in the executable rule-set. Building on this taxonomy, we synthesize construction paradigms for scalable environments, contrasting programmatic generators that prioritize controllability and verifiability with generative world models that offer broader coverage and open-endedness. We further outline how environment scaling can be coupled with stateful learning mechanisms, emphasizing learned update rules for cross-environment adaptation. We conclude by discussing alternative perspectives and argue that scalable environments provide the essential substrate for measurable and controllable progress toward robust general agents.
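The distinction between the three scaling axes can be made concrete with a toy programmatic generator. The sketch below is illustrative only and is not taken from the paper, which gives no implementation. The names `RuleSet`, `LineWorld`, `sample_rules`, and `rollout`, and the choice of rule parameters (step size, observation offset, reward sparsity), are all hypothetical. It assumes that an "executable rule-set" can be modeled as the dynamics, observation, and feedback functions of an environment.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class RuleSet:
    """Hypothetical executable rule-set: dynamics, observation, and feedback."""
    step_size: int       # dynamics: how far one action moves the agent
    obs_offset: int      # observation: constant shift added to the reported position
    sparse_reward: bool  # feedback: reward only at the goal, or negative distance each step


@dataclass
class LineWorld:
    """Toy 1-D world; the task is the goal, the environment is the rule-set."""
    rules: RuleSet
    goal: int
    size: int = 10
    pos: int = 0

    def reset(self) -> int:
        self.pos = 0
        return self.pos + self.rules.obs_offset

    def step(self, action: int):
        # action is -1 or +1; the rule-set decides what that means
        moved = self.pos + action * self.rules.step_size
        self.pos = max(0, min(self.size, moved))
        done = self.pos == self.goal
        if self.rules.sparse_reward:
            reward = 1.0 if done else 0.0
        else:
            reward = -float(abs(self.goal - self.pos))
        return self.pos + self.rules.obs_offset, reward, done


def sample_rules(rng: random.Random) -> RuleSet:
    """Programmatic generator: draws one rule-set from a small, controllable family."""
    return RuleSet(
        step_size=rng.choice([1, 2]),
        obs_offset=rng.choice([0, 100]),
        sparse_reward=rng.random() < 0.5,
    )


def rollout(env: LineWorld, policy, max_steps: int = 20):
    """Collects one trajectory of (action, observation, reward) tuples."""
    obs = env.reset()
    trajectory = []
    for _ in range(max_steps):
        action = policy(obs)
        obs, reward, done = env.step(action)
        trajectory.append((action, obs, reward))
        if done:
            break
    return trajectory


def always_right(obs: int) -> int:
    return 1


rng = random.Random(0)
base_rules = RuleSet(step_size=1, obs_offset=0, sparse_reward=True)

# Trajectory scaling: more experience, same task, same rule-set.
trajs = [rollout(LineWorld(base_rules, goal=5), always_right) for _ in range(3)]

# Task scaling: different goals, same rule-set.
task_envs = [LineWorld(base_rules, goal=g) for g in (3, 5, 8)]

# Environment scaling: the rule-set itself is drawn from a distribution.
env_family = [LineWorld(sample_rules(rng), goal=4) for _ in range(3)]
```

In this toy setting, only the last line changes what an action does, what the agent observes, or how it is rewarded, which is the kind of shift the abstract calls world-level. A generator like `sample_rules` trades coverage for controllability: every sampled world is easy to verify, but the family is only as broad as its hand-written parameters.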