LG · CV · Mar 12

Generalization and Memorization in Rectified Flow

arXiv:2603.13421 · 52.3 · h-index: 2
Predicted impact: top 48% in LG · last 90 days
Originality: Incremental advance
AI Analysis

This paper addresses privacy risks in generative models, a concern for AI practitioners, by revealing and reducing memorization in Rectified Flow. It is an incremental advance because it builds on existing Membership Inference Attack (MIA) methods.

The paper investigates how Rectified Flow models memorize training data by developing a complexity-calibrated test statistic for Membership Inference Attacks (MIA), which boosts attack AUC by up to 15% and TPR@1%FPR by up to 45%. It finds that, when timesteps are sampled uniformly during training, susceptibility to MIA peaks at the integration midpoint. Replacing uniform sampling with a Symmetric Exponential (U-shaped) timestep distribution suppresses this memorization while preserving generative fidelity.
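For readers unfamiliar with the two attack metrics quoted above, the following is a minimal sketch of how AUC and TPR@1%FPR could be computed from per-sample membership scores. The function name `auc_and_tpr_at_fpr`, the convention that a higher score means "more likely a training member", and the thresholding rule are assumptions made for illustration; they are not taken from the paper.

```python
import numpy as np

def auc_and_tpr_at_fpr(member_scores, nonmember_scores, target_fpr=0.01):
    """Return (AUC, TPR at the given FPR) for a score-based membership test.

    Assumed convention: a higher score means the sample is more likely
    to have been part of the training set.
    """
    members = np.asarray(member_scores, dtype=float)
    nonmembers = np.asarray(nonmember_scores, dtype=float)

    # AUC as the probability that a random member outscores a random
    # non-member, with ties counted as one half.
    diff = members[:, None] - nonmembers[None, :]
    auc = float(np.mean((diff > 0) + 0.5 * (diff == 0)))

    # Allow at most floor(target_fpr * n) non-members above the threshold.
    k = int(np.floor(target_fpr * nonmembers.size))
    threshold = np.sort(nonmembers)[::-1][k]

    # TPR is the fraction of true members strictly above that threshold.
    tpr = float(np.mean(members > threshold))
    return auc, tpr
```

TPR@1%FPR is the stricter of the two numbers: it asks how many training members an attacker can flag while falsely accusing at most 1% of non-members, which is why it is often treated as the privacy-critical figure.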

Generative models based on the Flow Matching objective, particularly Rectified Flow, have emerged as a dominant paradigm for efficient, high-fidelity image synthesis. However, while existing research heavily prioritizes generation quality and architectural scaling, the underlying dynamics of how RF models memorize training data remain largely underexplored. In this paper, we systematically investigate the memorization behaviors of RF through the test statistics of Membership Inference Attacks (MIA). We progressively formulate three test statistics, culminating in a complexity-calibrated metric ($T_\text{mc\_cal}$) that successfully decouples intrinsic image spatial complexity from genuine memorization signals. This calibration yields a significant performance surge -- boosting attack AUC by up to 15\% and the privacy-critical TPR@1\%FPR metric by up to 45\% -- establishing the first non-trivial MIA specifically tailored for RF. Leveraging these refined metrics, we uncover a distinct temporal pattern: under standard uniform temporal training, a model's susceptibility to MIA strictly peaks at the integration midpoint, a phenomenon we justify via the network's forced deviation from linear approximations. Finally, we demonstrate that substituting uniform timestep sampling with a Symmetric Exponential (U-shaped) distribution effectively minimizes exposure to vulnerable intermediate timesteps. Extensive evaluations across three datasets confirm that this temporal regularization suppresses memorization while preserving generative fidelity.
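The abstract does not give the exact form of the Symmetric Exponential distribution, so the following is only a sketch under stated assumptions. It assumes a density on [0, 1] proportional to exp(-rate·t) + exp(-rate·(1 − t)), which is symmetric about t = 0.5 and lowest there. The function name `sample_u_shaped_timesteps` and the parameter `rate` are hypothetical.

```python
import numpy as np

def sample_u_shaped_timesteps(n, rate=4.0, rng=None):
    """Draw n timesteps in [0, 1] from an assumed U-shaped density.

    Assumption (not from the paper): p(t) is proportional to
    exp(-rate * t) + exp(-rate * (1 - t)), so mass concentrates near
    t = 0 and t = 1 and is lowest at the midpoint t = 0.5.
    """
    rng = np.random.default_rng() if rng is None else rng

    # Inverse-CDF sample from an exponential truncated to [0, 1].
    u = rng.random(n)
    t = -np.log1p(-u * (1.0 - np.exp(-rate))) / rate

    # Mirror half of the draws around 0.5 to make the density symmetric.
    flip = rng.random(n) < 0.5
    return np.where(flip, 1.0 - t, t)
```

In a Rectified Flow training loop, a sampler like this would stand in for the uniform draw of t, so that fewer gradient updates land on the intermediate timesteps the abstract identifies as most vulnerable. Larger values of `rate` push more mass toward the endpoints.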

