Exactness Matters for Physical Rule Enforcement
For practitioners in scientific ML, this work clarifies when to enforce physical constraints in autoregressive models, showing that exact projections are beneficial but approximate repairs can be harmful, which is a nuanced but incremental contribution.
The paper investigates when enforcing physical constraints via repair maps improves or harms autoregressive forecasting, finding that exact projections (e.g., Fourier projection for periodic flows) drastically reduce rollout error (e.g., from ~9.4e-5 to ~5.4e-7 MSE on NS-128), while approximate repairs can worsen error due to distribution shift. The key insight is that operator exactness—whether the repair map is identity on the target manifold—determines reliability.
Autoregressive scientific forecasters often enforce physical or structural constraints by repairing each predicted state before feeding it back into the model. However, it remains unclear when stronger physical rule enforcement becomes reliable and when it becomes a source of distribution shift. We study this question through operator exactness, meaning whether the repair map is the identity on the target manifold and is aligned with the target geometry. We compare raw forecasting, post hoc repair, and in-loop repair across periodic incompressible Navier--Stokes, non-periodic CFDBench flows, and a hierarchical-forecasting support task. In the exact periodic regime, Fourier projection substantially improves rollout accuracy. On the NS-128 benchmark, a strong Raw-FNO has a final-step rollout MSE at horizon 100 of $(9.390 \pm 6.290)\times 10^{-5}$, and post hoc and in-loop projection reduce it to $(1.130 \pm 0.165)\times 10^{-6}$ and $(5.370 \pm 0.113)\times 10^{-7}$. However, once an exact projection is unavailable and only approximate boundary-preserving cleanup is available, the ordering changes. Across cavity, tube, dam, and cylinder flow, stronger Poisson-based cleanup can reduce divergence while worsening rollout error; target-distortion MSE predicts this harm far better than a linear-system residual. Controlled mismatch, screened cleanup, adaptive gating, and external-backbone checks show that the best approximate-regime operating point can be raw or near-identity. Hierarchical forecasting gives the same broader pattern. Exact forecast reconciliation is a stable baseline, whereas blended top-down repair, a validation-tuned interpolation toward historical-proportion top-down reconciliation, is dataset-dependent. Thus, constraint enforcement should be benchmarked by operator--data alignment before enforcement strength.