CV · AI · Mar 14

Discriminative Flow Matching Via Local Generative Predictors

arXiv:2603.13928 · 18.5 · h-index: 5
Predicted impact: top 91% in CV · last 90 days · Originality: Incremental advance
AI Analysis

The work targets more robust vision models by bridging generative and discriminative learning, although, as an adaptation of flow matching to discriminative tasks, it appears incremental.

The paper addresses the lack of iterative refinement in static, single-step discriminative vision models. It proposes Discriminative Flow Matching, which reformulates classification and object detection as a conditional transport process, and achieves competitive performance on standard benchmarks.
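The summary does not spell out the transport itself. As a hedged reference point only, a standard conditional flow matching formulation would read as follows, assuming a Gaussian source, a straight-line interpolation path, and a velocity field v_θ conditioned on backbone features h (none of these choices is confirmed by the text). Here x_1 stands for the task target, for example a class embedding e_y in classification or box coordinates in detection.

```latex
\begin{aligned}
x_0 &\sim \mathcal{N}(0, I), \qquad t \sim \mathcal{U}[0, 1], \qquad x_1 = \text{task target (e.g. class embedding } e_y\text{)} \\
x_t &= (1 - t)\,x_0 + t\,x_1, \qquad \frac{dx_t}{dt} = x_1 - x_0 \\
\mathcal{L}(\theta) &= \mathbb{E}_{x_0, x_1, t}\,\big\lVert v_\theta(x_t, t, h) - (x_1 - x_0) \big\rVert^2 \\
\hat{x}_1 &= x(1), \quad \text{where } \frac{dx}{dt} = v_\theta(x, t, h), \; x(0) = x_0
\end{aligned}
```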

Traditional discriminative computer vision relies predominantly on static projections, mapping input features to outputs in a single computational step. Although efficient, this paradigm lacks the iterative refinement and robustness inherent in biological vision and modern generative modelling. In this paper, we propose Discriminative Flow Matching, a framework that reformulates classification and object detection as a conditional transport process. By learning a vector field that continuously transports samples from a simple noise distribution toward a task-aligned target manifold (such as class embeddings or bounding box coordinates), our approach sits at the interface between generative and discriminative learning. Our method attaches multiple independent flow predictors to a shared backbone. These predictors are trained using local flow matching objectives, where gradients are computed independently for each block. We formulate this approach for standard image classification and extend it to the more complex task of object detection, where targets are high-dimensional and spatially distributed. This architecture provides the flexibility to update blocks either sequentially, to minimise activation memory, or in parallel, to suit different hardware constraints. By aggregating the predictions from these independent flow predictors, our framework enables robust, generative-inspired inference across diverse architectures, including CNNs and vision transformers.
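The abstract gives no equations or pseudocode, so the following is only a minimal pure-Python sketch of the mechanics it describes, built on assumptions not taken from the paper: a straight-line path with target velocity x1 - x0, forward Euler integration from Gaussian noise, aggregation by averaging the heads' endpoints, and decoding by nearest class embedding. Each head is a callable `(x, t) -> velocity` whose conditioning on its block's features is assumed to live inside the callable. All names (`flow_matching_loss`, `local_losses`, `integrate`, `classify`, `toward`) are hypothetical, and `toward` is a toy oracle standing in for a trained predictor, not a learned model.

```python
import math
import random
from typing import Callable, List, Sequence

Vector = List[float]
# Hypothetical interface: a flow predictor maps (x_t, t) to a velocity.
# Conditioning on its block's backbone features is assumed to be inside it.
FlowPredictor = Callable[[Vector, float], Vector]


def sample_noise(dim: int, seed: int = 0) -> Vector:
    """Draw x0 from a standard Gaussian (the 'simple noise distribution')."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(dim)]


def interpolate(x0: Vector, x1: Vector, t: float) -> Vector:
    """Assumed straight-line path: x_t = (1 - t) * x0 + t * x1."""
    return [(1.0 - t) * a + t * b for a, b in zip(x0, x1)]


def flow_matching_loss(v: FlowPredictor, x0: Vector, x1: Vector, t: float) -> float:
    """Squared error between the predicted velocity and the path velocity x1 - x0."""
    xt = interpolate(x0, x1, t)
    target = [b - a for a, b in zip(x0, x1)]
    return sum((p - u) ** 2 for p, u in zip(v(xt, t), target))


def local_losses(heads: Sequence[FlowPredictor], x0: Vector, x1: Vector,
                 t: float) -> List[float]:
    """One independent loss per head. With autograd, each loss would update only
    its own head and block (incoming features detached)."""
    return [flow_matching_loss(h, x0, x1, t) for h in heads]


def integrate(v: FlowPredictor, x0: Vector, steps: int = 20) -> Vector:
    """Forward Euler integration of dx/dt = v(x, t) from t = 0 to t = 1."""
    x, dt = list(x0), 1.0 / steps
    for k in range(steps):
        vel = v(x, k * dt)
        x = [xi + dt * vi for xi, vi in zip(x, vel)]
    return x


def classify(heads: Sequence[FlowPredictor], x0: Vector,
             class_embeddings: Sequence[Vector], steps: int = 20) -> int:
    """Average the heads' endpoints, then return the index of the nearest embedding."""
    ends = [integrate(h, x0, steps) for h in heads]
    mean = [sum(col) / len(ends) for col in zip(*ends)]
    dists = [math.dist(mean, e) for e in class_embeddings]
    return min(range(len(dists)), key=dists.__getitem__)


def toward(target: Vector) -> FlowPredictor:
    """Toy oracle standing in for a trained head: the exact straight-path
    velocity toward `target`, (target - x) / (1 - t). Undefined at t = 1,
    which the Euler loop above never evaluates."""
    return lambda x, t: [(b - a) / (1.0 - t) for a, b in zip(x, target)]
```

For example, with one-hot embeddings for three classes and two of three toy heads steering toward class 2, `classify(heads, sample_noise(3), emb)` returns 2. In a real implementation, each entry of `local_losses` would be backpropagated only into its own predictor and block, which is the property the abstract relies on when it describes updating blocks sequentially or in parallel.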
