IV Mar 5, 2022
$\ell_1$DecNet+: A new architecture framework by $\ell_1$ decomposition and iteration unfolding for sparse feature segmentation
Yumeng Ren, Yiming Gao, Chunlin Wu et al.
$\ell_1$-based sparse regularization plays a central role in compressive sensing and image processing. In this paper, we propose $\ell_1$DecNet, an unfolded network derived from a variational decomposition model that incorporates $\ell_1$-related sparse regularization and is solved by the scaled alternating direction method of multipliers (ADMM). $\ell_1$DecNet effectively decomposes an input image into a sparse feature and a learned dense feature, and thus helps the subsequent sparse-feature-related operations. Based on this, we develop $\ell_1$DecNet+, a learnable architecture framework consisting of our $\ell_1$DecNet and a segmentation module that operates on the extracted sparse features instead of the original images. This architecture combines the benefits of mathematical modeling and data-driven approaches. To the best of our knowledge, this is the first study to incorporate a mathematical image prior into feature extraction in segmentation network structures. Moreover, our $\ell_1$DecNet+ framework can be easily extended to the 3D case. We evaluate the effectiveness of $\ell_1$DecNet+ on two commonly encountered sparse segmentation tasks: retinal vessel segmentation in medical image processing and pavement crack detection in industrial abnormality identification. Experimental results on different datasets demonstrate that our $\ell_1$DecNet+ architecture with various lightweight segmentation modules can achieve performance equal to or better than that of their enlarged versions. This is especially advantageous in practice on resource-limited devices.
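To make the idea of unfolding scaled ADMM for an $\ell_1$-regularized problem concrete, here is a minimal sketch on a toy model. It is not the paper's decomposition model: the objective $\tfrac12\|u-f\|^2 + \lambda\|z\|_1$ subject to $u=z$, the function names, and the parameters `lam`, `rho`, and `num_layers` are all assumptions chosen for illustration. Each loop iteration plays the role of one "layer" of an unfolded network; in a learned variant, quantities such as `lam` and `rho` could become trainable per layer.

```python
def soft_threshold(x, tau):
    """Proximal operator of tau * |x| (scalar shrinkage)."""
    if x > tau:
        return x - tau
    if x < -tau:
        return x + tau
    return 0.0


def unfolded_admm_l1(f, lam=1.0, rho=1.0, num_layers=60):
    """Run a fixed number of scaled-ADMM iterations for the toy problem
    min_{u,z} 0.5*||u - f||^2 + lam*||z||_1  subject to  u = z.
    (Hypothetical illustration, not the model from the paper.)"""
    n = len(f)
    z = [0.0] * n  # sparse variable
    w = [0.0] * n  # scaled dual variable
    for _ in range(num_layers):
        # u-update: closed-form minimizer of the quadratic subproblem
        u = [(f[i] + rho * (z[i] - w[i])) / (1.0 + rho) for i in range(n)]
        # z-update: shrinkage, the step that produces sparsity
        z = [soft_threshold(u[i] + w[i], lam / rho) for i in range(n)]
        # dual update on the constraint residual u - z
        w = [w[i] + u[i] - z[i] for i in range(n)]
    return z


if __name__ == "__main__":
    signal = [3.0, 0.5, -3.0, 0.0]
    print(unfolded_admm_l1(signal))
```

For this toy objective the iterates converge to the soft-thresholded input, so small entries are set exactly to zero while large ones are shrunk by `lam`.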
CV Jul 22, 2025
PUSA V1.0: Surpassing Wan-I2V with $500 Training Cost by Vectorized Timestep Adaptation
Yaofang Liu, Yumeng Ren, Aitor Artola et al.
The rapid advancement of video diffusion models has been hindered by fundamental limitations in temporal modeling, particularly the rigid synchronization of frame evolution imposed by conventional scalar timestep variables. While task-specific adaptations and autoregressive models have sought to address these challenges, they remain constrained by computational inefficiency, catastrophic forgetting, or narrow applicability. In this work, we present Pusa, a groundbreaking paradigm that leverages vectorized timestep adaptation (VTA) to enable fine-grained temporal control within a unified video diffusion framework. Moreover, VTA is a non-destructive adaptation: it fully preserves the capabilities of the base model. By finetuning the SOTA Wan2.1-T2V-14B model with VTA, we achieve unprecedented efficiency -- surpassing the performance of Wan-I2V-14B with $\leq$ 1/200 of the training cost (\$500 vs. $\geq$ \$100,000) and $\leq$ 1/2500 of the dataset size (4K vs. $\geq$ 10M samples). Pusa not only sets a new standard for image-to-video (I2V) generation, achieving a VBench-I2V total score of 87.32\% (vs. 86.86\% for Wan-I2V-14B), but also unlocks many zero-shot multi-task capabilities such as start-end frames and video extension -- all without task-specific training. Meanwhile, Pusa can still perform text-to-video generation. Mechanistic analyses reveal that our approach preserves the foundation model's generative priors while surgically injecting temporal dynamics, avoiding the combinatorial explosion inherent to vectorized timesteps. This work establishes a scalable, efficient, and versatile paradigm for next-generation video synthesis, democratizing high-fidelity video generation for research and industry alike. Code is open-sourced at https://github.com/Yaofang-Liu/Pusa-VidGen
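The contrast between a scalar timestep and a vectorized one can be sketched in a few lines. This is one plausible reading of the idea rather than Pusa's implementation: the linear interpolation between data and Gaussian noise, the function name `noise_frames`, and the representation of frames as flat lists are assumptions made for illustration.

```python
import random


def noise_frames(frames, timesteps, rng):
    """Noise each frame with its own timestep t in [0, 1].

    Assumed forward process (hypothetical): x_t = (1 - t) * x + t * eps,
    with eps drawn from a standard Gaussian. A scalar-timestep model
    corresponds to every entry of `timesteps` being equal.
    """
    if len(frames) != len(timesteps):
        raise ValueError("need exactly one timestep per frame")
    noisy = []
    for frame, t in zip(frames, timesteps):
        noisy.append([(1.0 - t) * x + t * rng.gauss(0.0, 1.0) for x in frame])
    return noisy


if __name__ == "__main__":
    rng = random.Random(0)
    video = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]  # three tiny "frames"
    # Scalar timestep: all frames share the same noise level.
    synced = noise_frames(video, [0.7, 0.7, 0.7], rng)
    # Vectorized timestep: keep frame 0 clean as an image condition.
    conditioned = noise_frames(video, [0.0, 0.7, 0.7], rng)
    print(conditioned[0])
```

Under this reading, tasks such as image-to-video or start-end frames differ only in which entries of the timestep vector are held at zero.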
LG Jan 19
Adaptively trained Physics-informed Radial Basis Function Neural Networks for Solving Multi-asset Option Pricing Problems
Yan Ma, Yumeng Ren
The present study investigates the numerical solution of the Black-Scholes partial differential equation (PDE) for option valuation with multiple underlying assets. We develop a physics-informed (PI) machine learning algorithm based on a radial basis function neural network (RBFNN) that concurrently optimizes the network architecture and predicts the target option price. The physics-informed radial basis function neural network (PIRBFNN) combines the strengths of the traditional radial basis function collocation method and the physics-informed neural network approach to solve PDE problems in the financial context effectively. By employing a PDE residual-based technique to adaptively refine the distribution of hidden neurons during training, the PIRBFNN handles multidimensional option pricing models with non-smooth payoff conditions accurately and efficiently. The validity of the proposed method is demonstrated through a set of experiments covering a single-asset European put option, a double-asset exchange option, and a four-asset basket call option.
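As a small illustration of what a PDE residual means in the single-asset case, the sketch below evaluates the classical closed-form Black-Scholes price of a European put and checks, by finite differences, that it nearly satisfies the Black-Scholes PDE. It does not reproduce the PIRBFNN; the function names, step sizes, and market parameters are assumptions chosen for the example. A residual of this kind, evaluated on a candidate solution, is the sort of quantity a residual-based refinement scheme could monitor.

```python
import math


def normal_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def european_put(spot, strike, rate, sigma, tau):
    """Closed-form Black-Scholes put price; tau is time to maturity."""
    d1 = (math.log(spot / strike) + (rate + 0.5 * sigma ** 2) * tau) / (
        sigma * math.sqrt(tau)
    )
    d2 = d1 - sigma * math.sqrt(tau)
    return strike * math.exp(-rate * tau) * normal_cdf(-d2) - spot * normal_cdf(-d1)


def pde_residual(price, spot, rate, sigma, tau, h=1e-2, dtau=1e-4):
    """Finite-difference residual of the Black-Scholes PDE written in
    time to maturity: -V_tau + 0.5*sigma^2*S^2*V_SS + r*S*V_S - r*V."""
    v = price(spot, tau)
    v_s = (price(spot + h, tau) - price(spot - h, tau)) / (2.0 * h)
    v_ss = (price(spot + h, tau) - 2.0 * v + price(spot - h, tau)) / h ** 2
    v_tau = (price(spot, tau + dtau) - price(spot, tau - dtau)) / (2.0 * dtau)
    return -v_tau + 0.5 * sigma ** 2 * spot ** 2 * v_ss + rate * spot * v_s - rate * v


if __name__ == "__main__":
    # Illustrative parameters only (not taken from the paper).
    strike, rate, sigma = 100.0, 0.05, 0.2
    put = lambda s, tau: european_put(s, strike, rate, sigma, tau)
    print(put(100.0, 1.0))
    print(pde_residual(put, 100.0, rate, sigma, 1.0))
```

The exact solution gives a residual close to zero up to discretization error, whereas an inaccurate approximation would show larger residuals where it fits the PDE poorly.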
CV Mar 22, 2025
Efficient Diffusion Training through Parallelization with Truncated Karhunen-Loève Expansion
Yumeng Ren, Yaofang Liu, Aitor Artola et al.
Diffusion denoising models have become a popular approach for image generation, but they often suffer from slow convergence during training. In this paper, we identify that this slow convergence is partly due to the complexity of the Brownian motion driving the forward-time process. To address this, we represent the Brownian motion using the Karhunen-Loève expansion, truncating it to a limited number of eigenfunctions. We propose a novel ordinary differential equation with augmented random initials, termed KL diffusion, as a new forward-time process for training and sampling. By developing an appropriate denoising loss function, we facilitate the integration of our KL diffusion into existing denoising-based models. Using the widely adopted DDIM framework as our baseline ensures a fair comparison, as our modifications focus solely on the forward process and loss function, leaving the network architecture and sampling methods unchanged. Our method significantly outperforms baseline diffusion models, reaching the baseline's best FID score twice as fast and ultimately yielding much lower FID scores. Notably, our approach allows for highly parallelized computation, requires no additional learnable parameters, and can be flexibly integrated into existing diffusion methods. The code will be made publicly available.
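The truncated Karhunen-Loève expansion mentioned above can be illustrated with the classical expansion of standard Brownian motion on $[0, 1]$, $W_t \approx \sum_{k=1}^{K} Z_k \sqrt{2}\,\sin((k-\tfrac12)\pi t)/((k-\tfrac12)\pi)$ with independent standard normal $Z_k$. The sketch below shows only this textbook expansion, not the paper's KL diffusion ODE or its loss; the function names and the choice of `num_terms` are assumptions for illustration.

```python
import math
import random


def kl_basis(k, t):
    """k-th eigenfunction of Brownian motion on [0, 1], scaled by the
    square root of its eigenvalue (k = 1, 2, ...)."""
    freq = (k - 0.5) * math.pi
    return math.sqrt(2.0) * math.sin(freq * t) / freq


def truncated_variance(t, num_terms):
    """Variance of the truncated expansion at time t; tends to t as
    num_terms grows, matching Var(W_t) = t."""
    return sum(kl_basis(k, t) ** 2 for k in range(1, num_terms + 1))


def sample_path(times, num_terms, rng):
    """Draw the random coefficients once, then evaluate the path at
    each requested time independently of the others."""
    coeffs = [rng.gauss(0.0, 1.0) for _ in range(num_terms)]
    return [
        sum(z * kl_basis(k, t) for k, z in enumerate(coeffs, start=1))
        for t in times
    ]


if __name__ == "__main__":
    print(truncated_variance(1.0, 1000))
    print(sample_path([0.0, 0.25, 0.5, 0.75, 1.0], 8, random.Random(0)))
```

Once the finitely many coefficients are drawn, the value at any time is a deterministic function of them, so different time points can be evaluated independently. This is plausibly what makes a truncated expansion amenable to parallel computation.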