Chika Maduabuchi

CV
h-index12
6papers
4citations
Novelty58%
AI Score49

6 Papers

LGFeb 25
Entropy-Controlled Flow Matching

Chika Maduabuchi

Modern vision generators transport a base distribution to data through time-indexed measures, implemented as deterministic flows (ODEs) or stochastic diffusions (SDEs). Despite strong empirical performance, standard flow-matching objectives do not directly control the information geometry of the trajectory, allowing low-entropy bottlenecks that can transiently deplete semantic modes. We propose Entropy-Controlled Flow Matching (ECFM): a constrained variational principle over continuity-equation paths enforcing a global entropy-rate budget d/dt H(mu_t) >= -lambda. ECFM is a convex optimization in Wasserstein space with a KKT/Pontryagin system, and admits a stochastic-control representation equivalent to a Schrodinger bridge with an explicit entropy multiplier. In the pure transport regime, ECFM recovers entropic OT geodesics and Gamma-converges to classical OT as lambda -> 0. We further obtain certificate-style mode-coverage and density-floor guarantees with Lipschitz stability, and construct near-optimal collapse counterexamples for unconstrained flow matching.

CVMar 12
Event-Driven Video Generation

Chika Maduabuchi

State-of-the-art text-to-video models often look realistic frame-by-frame yet fail on simple interactions: motion starts before contact, actions are not realized, objects drift after placement, and support relations break. We argue this stems from frame-first denoising, which updates latent state everywhere at every step without an explicit notion of when and where an interaction is active. We introduce Event-Driven Video Generation (EVD), a minimal DiT-compatible framework that makes sampling event-grounded: a lightweight event head predicts token-aligned event activity, event-grounded losses couple activity to state change during training, and event-gated sampling (with hysteresis and early-step scheduling) suppresses spurious updates while concentrating updates during interactions. On EVD-Bench, EVD consistently improves human preference and VBench dynamics, substantially reducing failure modes in state persistence, spatial accuracy, support relations, and contact stability without sacrificing appearance. These results indicate that explicit event grounding is a practical abstraction for reducing interaction hallucinations in video generation.

CVOct 22, 2024Code
VideoSAM: A Large Vision Foundation Model for High-Speed Video Segmentation

Chika Maduabuchi, Ericmoore Jossou, Matteo Bucci

High-speed video (HSV) segmentation is essential for analyzing dynamic physical processes in scientific and industrial applications, such as boiling heat transfer. Existing models like U-Net struggle with generalization and accurately segmenting complex bubble formations. We present VideoSAM, a specialized adaptation of the Segment Anything Model (SAM), fine-tuned on a diverse HSV dataset for phase detection. Through diverse experiments, VideoSAM demonstrates superior performance across four fluid environments -- Water, FC-72, Nitrogen, and Argon -- significantly outperforming U-Net in complex segmentation tasks. In addition to introducing VideoSAM, we contribute an open-source HSV segmentation dataset designed for phase detection, enabling future research in this domain. Our findings underscore VideoSAM's potential to set new standards in robust and accurate HSV segmentation. The code and dataset used in this study are available online at https://github.com/chikap421/videosam.

CVMay 24, 2025Code
Corruption-Aware Training of Latent Video Diffusion Models for Robust Text-to-Video Generation

Chika Maduabuchi, Hao Chen, Yujin Han et al.

Latent Video Diffusion Models (LVDMs) achieve high-quality generation but are sensitive to imperfect conditioning, which causes semantic drift and temporal incoherence on noisy, web-scale video-text datasets. We introduce CAT-LVDM, the first corruption-aware training framework for LVDMs that improves robustness through structured, data-aligned noise injection. Our method includes Batch-Centered Noise Injection (BCNI), which perturbs embeddings along intra-batch semantic directions to preserve temporal consistency. BCNI is especially effective on caption-rich datasets like WebVid-2M, MSR-VTT, and MSVD. We also propose Spectrum-Aware Contextual Noise (SACN), which injects noise along dominant spectral directions to improve low-frequency smoothness, showing strong results on UCF-101. On average, BCNI reduces FVD by 31.9% across WebVid-2M, MSR-VTT, and MSVD, while SACN yields a 12.3% improvement on UCF-101. Ablation studies confirm the benefit of low-rank, data-aligned noise. Our theoretical analysis further explains how such perturbations tighten entropy, Wasserstein, score-drift, mixing-time, and generalization bounds. CAT-LVDM establishes a principled, scalable training approach for robust video diffusion under multimodal noise. Code and models: https://github.com/chikap421/catlvdm

CVNov 12, 2024Code
MSEG-VCUQ: Multimodal SEGmentation with Enhanced Vision Foundation Models, Convolutional Neural Networks, and Uncertainty Quantification for High-Speed Video Phase Detection Data

Chika Maduabuchi, Ericmoore Jossou, Matteo Bucci

High-speed video (HSV) phase detection (PD) segmentation is crucial for monitoring vapor, liquid, and microlayer phases in industrial processes. While CNN-based models like U-Net have shown success in simplified shadowgraphy-based two-phase flow (TPF) analysis, their application to complex HSV PD tasks remains unexplored, and vision foundation models (VFMs) have yet to address the complexities of either shadowgraphy-based or PD TPF video segmentation. Existing uncertainty quantification (UQ) methods lack pixel-level reliability for critical metrics like contact line density and dry area fraction, and the absence of large-scale, multimodal experimental datasets tailored to PD segmentation further impedes progress. To address these gaps, we propose MSEG-VCUQ. This hybrid framework integrates U-Net CNNs with the transformer-based Segment Anything Model (SAM) to achieve enhanced segmentation accuracy and cross-modality generalization. Our approach incorporates systematic UQ for robust error assessment and introduces the first open-source multimodal HSV PD datasets. Empirical results demonstrate that MSEG-VCUQ outperforms baseline CNNs and VFMs, enabling scalable and reliable PD segmentation for real-world boiling dynamics.

LGFeb 4
Temporal Pair Consistency for Variance-Reduced Flow Matching

Chika Maduabuchi, Jindong Wang

Continuous-time generative models, such as diffusion models, flow matching, and rectified flow, learn time-dependent vector fields but are typically trained with objectives that treat timesteps independently, leading to high estimator variance and inefficient sampling. Prior approaches mitigate this via explicit smoothness penalties, trajectory regularization, or modified probability paths and solvers. We introduce Temporal Pair Consistency (TPC), a lightweight variance-reduction principle that couples velocity predictions at paired timesteps along the same probability path, operating entirely at the estimator level without modifying the model architecture, probability path, or solver. We provide a theoretical analysis showing that TPC induces a quadratic, trajectory-coupled regularization that provably reduces gradient variance while preserving the underlying flow-matching objective. Instantiated within flow matching, TPC improves sample quality and efficiency across CIFAR-10 and ImageNet at multiple resolutions, achieving lower FID at identical or lower computational cost than prior methods, and extends seamlessly to modern SOTA-style pipelines with noise-augmented training, score-based denoising, and rectified flow.