GR · May 11

Fewer, Better Frames: A Compute-Normalized Proof of Concept for Coherence-First World-Model Rendering with Model-Guided FSR4 Frame Generation

arXiv:2606.025866.7

Predicted impact: top 92% in GR · last 90 days
Originality: Incremental advance

AI Analysis

For developers of world models and real-time rendering systems, this proof-of-concept suggests that allocating inference budget to higher-quality anchor frames can improve long-horizon coherence under limited compute.

This paper reports a proof of concept in which a coherence-first world-model rendering approach, which generates fewer anchor frames (15 FPS) and reconstructs presentation to 30 FPS, preserves scene stability and visual quality longer than a cadence-first baseline running at about 30 FPS natively. The comparison covers four scenes (forest, sword, desert, and snow), and LPIPS favors the coherence-first method in every tested scene.

World models are often evaluated by native frame cadence, but higher nominal frame rate can trade away long-horizon scene stability. This article reports an independent proof of concept implemented using Overworld's Waypoint-1.5 family and WorldEngine runtime on a Windows fallback stack with ONNX Runtime + DirectML and an FSR4 DX12 bridge. The tested coherence-first branch generates higher-context anchor frames at a 15 FPS presentation-timeline cadence and reconstructs presentation to 30 FPS using latent-delta motion guidance and synthesized depth. It is compared against a lower-context cadence-first baseline that generates about 30 FPS natively under the same seed, route, control script, target presentation duration, and local time-scaling regime. Across forest, sword, desert, and snow scenes, the coherence-first branch preserves path geometry, object identity, large silhouettes, and depth layering longer, while the baseline degrades earlier into brightness drift and geometric distortion. Lightweight temporal metrics and paired videos support the visual comparison, with LPIPS favoring the coherence-first branch across all tested scenes. Here compute-normalized means approximately matched same-GPU, same-timescale operating points, not exact FLOP parity or measured realtime throughput. A separate heavier sword-scene probe suggests local non-monotonicity: more context and denoising did not automatically improve quality. These results support coherence-first allocation as a practical proof-of-concept strategy under limited inference budget, not as a finished realtime renderer.
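The cadence arithmetic in the abstract can be made concrete with a minimal sketch. It only illustrates the frame timeline, not the paper's pipeline: the function names `presentation_schedule` and `per_anchor_budget_ratio` are hypothetical, and the budget ratio is an idealized assumption that ignores the cost of the reconstruction step (the abstract states that the operating points are only approximately matched, not FLOP-equal).

```python
from fractions import Fraction


def presentation_schedule(duration_s, anchor_fps=15, present_fps=30):
    """Label each presentation frame as a model anchor or a generated in-between.

    Hypothetical helper: assumes anchors land on an exact integer stride of the
    presentation timeline (every 2nd frame for 15 -> 30 FPS).
    """
    if present_fps % anchor_fps != 0:
        raise ValueError("present_fps must be an integer multiple of anchor_fps")
    stride = present_fps // anchor_fps
    n_frames = int(duration_s * present_fps)
    schedule = []
    for i in range(n_frames):
        timestamp = Fraction(i, present_fps)  # exact time in seconds
        kind = "anchor" if i % stride == 0 else "generated"
        schedule.append((i, timestamp, kind))
    return schedule


def per_anchor_budget_ratio(anchor_fps=15, baseline_fps=30):
    """Idealized time budget per anchor relative to one native baseline frame.

    Assumption: the same wall-clock inference budget is spread over fewer
    frames, and frame generation itself is treated as free.
    """
    return Fraction(baseline_fps, anchor_fps)


if __name__ == "__main__":
    frames = presentation_schedule(duration_s=1)
    anchors = [f for f in frames if f[2] == "anchor"]
    print(len(frames), "presentation frames,", len(anchors), "anchors")
    print("per-anchor budget ratio:", per_anchor_budget_ratio())
```

Under these assumptions, one second of presentation contains 30 frames, 15 of them model-generated anchors, and each anchor can spend up to twice the time of a native 30 FPS frame. That extra budget is what the coherence-first branch allocates to higher context.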
