ChannelFlow-Tools: A Standardized Dataset Creation Pipeline for 3D Obstructed Channel Flows
This provides a reproducible dataset creation pipeline for CFD surrogate modeling researchers, though it's incremental as it standardizes existing methods rather than introducing new ones.
The authors tackled the problem of inconsistent dataset creation for 3D obstructed channel flow simulations by developing ChannelFlow-Tools, a standardized pipeline that generated 10,000+ scenes with Reynolds numbers from 100 to 15,000 and demonstrated its utility through reproducible ML training examples.
We present ChannelFlow-Tools, a configuration-driven framework that standardizes the end-to-end path from programmatic CAD solid generation to ML-ready inputs and targets for 3D obstructed channel flows. The toolchain integrates geometry synthesis with feasibility checks, signed distance field (SDF) voxelization, automated solver orchestration on HPC (waLBerla LBM), and Cartesian resampling to co-registered multi-resolution tensors. A single Hydra/OmegaConf configuration governs all stages, enabling deterministic reproduction and controlled ablations. As a case study, we generate 10k+ scenes spanning Re=100-15000 with diverse shapes and poses. An end-to-end evaluation of storage trade-offs directly from the emitted artifacts, a minimal 3D U-Net at 128x32x32, and example surrogate models with dataset size illustrate that the standardized representations support reproducible ML training. ChannelFlow-Tools turns one-off dataset creation into a reproducible, configurable pipeline for CFD surrogate modeling.