GR CV · Jan 7

In-SRAM Radiant Foam Rendering on a Graph Processor

Zulkhuu Tuya, Ignacio Alzugaray, Nicholas Fry, Andrew J. Davison

arXiv:2601.04382v11.2

Originality: Incremental advance

AI Analysis

This work addresses efficient rendering on emerging distributed-memory hardware and is aimed at researchers and developers in computer graphics and accelerator design. The advance is incremental: it adapts an existing method to new hardware.

The paper tackles volumetric rendering on many-core accelerators with distributed SRAM. It develops a fully in-SRAM, distributed renderer for the Radiant Foam representation on the Graphcore Mk2 IPU and reaches near-interactive throughput of about 1 fps at 640x480 resolution, with quality close to the original GPU implementation.

Many emerging many-core accelerators replace a single large device memory with hundreds to thousands of lightweight cores, each owning only a small local SRAM and exchanging data via explicit on-chip communication. This organization offers high aggregate bandwidth, but it breaks a key assumption behind many volumetric rendering techniques: that rays can randomly access a large, unified scene representation. Rendering efficiently on such hardware therefore requires distributing both data and computation, keeping ray traversal mostly local, and structuring communication into predictable routes. We present a fully in-SRAM, distributed renderer for the Radiant Foam Voronoi-cell volumetric representation on the Graphcore Mk2 IPU, a many-core accelerator with tile-local SRAM and explicit inter-tile communication. Our system shards the scene across tiles and forwards rays between shards through a hierarchical routing overlay, enabling ray marching entirely from on-chip SRAM with predictable communication. On Mip-NeRF 360 scenes, the system attains near-interactive throughput (≈1 fps at 640×480) with image and depth quality close to the original GPU-based Radiant Foam implementation, while keeping all scene data and ray state in on-chip SRAM. Beyond demonstrating feasibility, we analyze routing, memory, and scheduling bottlenecks that inform how future distributed-memory accelerators can better support irregular, data-movement-heavy rendering workloads.
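The idea of forwarding rays between shards over a hierarchical overlay with predictable routes can be illustrated with a minimal sketch. It is not the paper's routing scheme: it assumes, purely for illustration, that shards are numbered 0..N-1 and arranged as a complete tree with a fixed fanout, and the names `ancestors`, `next_hop`, and `route` are hypothetical. A ray leaving one shard climbs toward the root until it reaches an ancestor of the destination shard, then descends, so every route is deterministic and bounded by twice the tree depth.

```python
from typing import List


def ancestors(node: int, fanout: int) -> List[int]:
    """Return [node, parent, ..., root] for heap-style numbering (root is 0).

    Assumption for this sketch: node i has parent (i - 1) // fanout.
    """
    path = [node]
    while node != 0:
        node = (node - 1) // fanout
        path.append(node)
    return path


def next_hop(current: int, target: int, fanout: int) -> int:
    """Pick the neighbouring shard a ray should be forwarded to next."""
    if current == target:
        return current
    path = ancestors(target, fanout)  # target, ..., root
    if current in path:
        # current is an ancestor of target: step down toward the target.
        return path[path.index(current) - 1]
    # Otherwise step up toward the root.
    return (current - 1) // fanout


def route(source: int, target: int, fanout: int) -> List[int]:
    """Full list of shards a ray visits from source to target, inclusive."""
    hops = [source]
    while hops[-1] != target:
        hops.append(next_hop(hops[-1], target, fanout))
    return hops


if __name__ == "__main__":
    # Binary overlay over 7 shards: 0 -> (1, 2), 1 -> (3, 4), 2 -> (5, 6).
    print(route(3, 6, fanout=2))  # [3, 1, 0, 2, 6]
    print(route(3, 4, fanout=2))  # [3, 1, 4]
```

In this toy layout a ray moving between sibling shards needs two hops, while a ray crossing the whole scene passes through the root, which is one reason locality of ray traversal matters on such hardware.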
