CV AI LG · May 17

Nano World Models: A Minimalist Implementation of Future Video Prediction

arXiv:2605.23993 · 70.3
AI Analysis

This work offers a compact, extensible experimental substrate for the research community to systematically study world model design choices, addressing the lack of unified and reproducible implementations.

Nano World Models provides a minimalist, reproducible codebase for future video prediction using diffusion forcing, enabling controlled studies of world-modeling components across diverse environments. Experiments show how design choices like prediction parameterization and architecture scale affect prediction quality and rollout behavior.

World models have become a central paradigm for learning predictive simulators that support generation, planning, and decision-making. Yet, despite rapid progress in industry-scale interactive video generation, the broader research community still lacks compact, reproducible, and easily extensible implementations for studying the design choices underlying modern world models. We introduce Nano World Models, a minimalist codebase for future video prediction centered around diffusion forcing. Nano World Models provides a unified interface for generative objectives, model scales, action-conditioning mechanisms, latent observation spaces, datasets, evaluation protocols, and long-horizon rollout procedures. This design enables controlled studies of world-modeling components that are often entangled across separate implementations. Through experiments across simple control environments, game simulation, and real-robot data, we examine how prediction parameterization, architecture scale, action injection, sampling budget, and domain complexity affect video prediction quality and autoregressive rollout behavior. By releasing code, configurations, evaluation scripts, and pretrained checkpoints, Nano World Models aims to provide a compact yet extensible experimental substrate for open, reproducible, and scientific world-model research.
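As a rough illustration of two ideas the abstract names, diffusion forcing and prediction parameterization, the sketch below builds training inputs and regression targets for a short clip of per-frame latents. It is not taken from the Nano World Models codebase: the function name `diffusion_forcing_targets`, the cosine noise schedule, and the flat `(T, D)` latent layout are all assumptions made for the example. The only property it relies on is the one that defines diffusion forcing, namely that every frame in the sequence receives its own independently sampled noise level.

```python
import numpy as np


def diffusion_forcing_targets(frames, rng, parameterization="v"):
    """Noise a clip frame by frame and return the regression target.

    frames: (T, D) array of clean per-frame latents (hypothetical layout).
    Returns (noisy, t, target), where t has shape (T, 1).
    """
    num_frames = frames.shape[0]

    # Diffusion forcing: one independent noise level per frame,
    # rather than a single level shared by the whole clip.
    t = rng.uniform(0.0, 1.0, size=(num_frames, 1))

    # Variance-preserving cosine schedule (an assumption for this sketch),
    # chosen so that alpha**2 + sigma**2 == 1 at every noise level.
    alpha = np.cos(0.5 * np.pi * t)  # signal scale
    sigma = np.sin(0.5 * np.pi * t)  # noise scale

    eps = rng.standard_normal(frames.shape)
    noisy = alpha * frames + sigma * eps

    # The prediction parameterization decides what the network regresses.
    if parameterization == "eps":
        target = eps
    elif parameterization == "x0":
        target = frames
    elif parameterization == "v":
        target = alpha * eps - sigma * frames
    else:
        raise ValueError(f"unknown parameterization: {parameterization!r}")

    return noisy, t, target


rng = np.random.default_rng(0)
clip = rng.standard_normal((8, 4))  # 8 frames, 4-dimensional latents
noisy, t, v_target = diffusion_forcing_targets(clip, rng, "v")
```

Under this schedule the three targets carry the same information. For instance, the clean frames can be recovered from a v-prediction as `alpha * noisy - sigma * v_target`, so the choice between them mainly affects how the loss is weighted across noise levels.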
