LG CV · Nov 14, 2025

Low-Bit, High-Fidelity: Optimal Transport Quantization for Flow Matching

Dara Varam, Diaa A. Abuhani, Imran Zualkernan, Raghad AlDamani, Lujain Khalil

arXiv:2511.11418v1 · 4.1 · h-index: 6

Originality: Highly original

AI Analysis

The work enables compression of FM generative models for edge and embedded AI applications; it is a domain-specific, incremental improvement.

The paper tackles the high-precision parameter requirements of Flow Matching (FM) generative models by applying optimal transport-based post-training quantization, preserving generation quality and latent space stability down to 2-3 bits per parameter across five benchmark datasets.

Flow Matching (FM) generative models offer efficient simulation-free training and deterministic sampling, but their practical deployment is challenged by high-precision parameter requirements. We adapt optimal transport (OT)-based post-training quantization to FM models, minimizing the 2-Wasserstein distance between quantized and original weights, and systematically compare its effectiveness against uniform, piecewise, and logarithmic quantization schemes. Our theoretical analysis provides upper bounds on generative degradation under quantization, and empirical results across five benchmark datasets of varying complexity show that OT-based quantization preserves both visual generation quality and latent space stability down to 2-3 bits per parameter, where alternative methods fail. This establishes OT-based quantization as a principled, effective approach to compress FM generative models for edge and embedded AI applications.
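To make the comparison in the abstract concrete, the following is a minimal sketch, not the authors' implementation. It assumes a simple one-dimensional OT-style quantizer that splits the sorted weights into equal-mass bins and replaces each weight with its bin mean, and it compares this against plain uniform quantization on synthetic Gaussian weights. The function names (`uniform_quantize`, `equal_mass_quantize`, `w2_squared_1d`) and the synthetic weight tensor are hypothetical. The sketch relies on the fact that, for two equal-size one-dimensional samples, the squared 2-Wasserstein distance is the mean squared difference of the sorted values.

```python
import numpy as np

def uniform_quantize(w, bits):
    """Round weights to 2**bits evenly spaced levels spanning [min, max]."""
    levels = 2 ** bits
    lo, hi = w.min(), w.max()
    step = (hi - lo) / (levels - 1)
    return lo + np.round((w - lo) / step) * step

def equal_mass_quantize(w, bits):
    """Assumed OT-style scheme: split the sorted weights into 2**bits
    equal-mass bins and map every weight to the mean of its bin."""
    levels = 2 ** bits
    order = np.argsort(w)
    q = np.empty_like(w)
    for idx in np.array_split(order, levels):
        q[idx] = w[idx].mean()
    return q

def w2_squared_1d(a, b):
    """Squared 2-Wasserstein distance between two equal-size 1-D samples:
    the optimal coupling pairs the sorted values."""
    return np.mean((np.sort(a) - np.sort(b)) ** 2)

# Synthetic stand-in for one layer's weights (an assumption, not real FM weights).
rng = np.random.default_rng(0)
weights = rng.normal(0.0, 0.02, size=10_000)

for bits in (2, 3):
    u = uniform_quantize(weights, bits)
    o = equal_mass_quantize(weights, bits)
    print(f"{bits} bits: uniform W2^2 = {w2_squared_1d(weights, u):.3e}, "
          f"equal-mass W2^2 = {w2_squared_1d(weights, o):.3e}")
```

On bell-shaped weights such as these, uniform levels are spread across the sparse tails, while equal-mass bins place more levels where the weights concentrate. This is one intuition for why a transport-aware codebook could hold up better at very low bit widths.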
