Junmei Yang

LG
h-index: 4
10 papers
48 citations
Novelty: 56%
AI Score: 54

10 Papers

84.1 · LG · May 9 · Code
A Minimum Variance Path Principle for Accurate and Stable Score-Based Density Ratio Estimation

Wei Chen, Jiacheng Li, Shigui Li et al.

Score-based methods are powerful across machine learning, but they face a paradox: they are theoretically path-independent, yet practically path-dependent. We resolve this by proving that practical training objectives differ from the ideal, ground-truth objective by a crucial, overlooked term: the path variance of the score function. We propose the MVP (Minimum Variance Path) Principle to minimize this path variance. Our key contribution is deriving a closed-form expression for the variance, which makes its optimization tractable. By parameterizing the path with a flexible Kumaraswamy Mixture Model, our method learns data-adaptive, low-variance paths without heuristic manual selection. This principled optimization of the complete objective yields more accurate and stable estimators, establishes new state-of-the-art results on challenging benchmarks, and provides a general framework for optimizing score-based interpolation. Our code is available at https://github.com/Hoemr/OpenDRE.git.
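The abstract parameterizes the interpolation path with a Kumaraswamy Mixture Model. The sketch below shows one plausible way such a parameterization can work, assuming the path is a monotone time warp gamma: [0, 1] -> [0, 1] built from a mixture of Kumaraswamy CDFs, F(t; a, b) = 1 - (1 - t^a)^b. The function names and parameter values are hypothetical; in the paper the mixture parameters are learned, and the exact way the warp enters the path may differ.

```python
import numpy as np

def kumaraswamy_cdf(t, a, b):
    """CDF of Kumaraswamy(a, b) on [0, 1]: F(t) = 1 - (1 - t**a)**b."""
    t = np.clip(t, 0.0, 1.0)
    return 1.0 - (1.0 - t**a) ** b

def kmm_schedule(t, weights, a, b):
    """Hypothetical time warp gamma(t) = sum_k pi_k * F(t; a_k, b_k).

    A convex combination of CDFs is itself a CDF, so gamma is monotone
    non-decreasing with gamma(0) = 0 and gamma(1) = 1.
    """
    weights = np.asarray(weights, dtype=float)
    weights = weights / weights.sum()          # normalize to mixture weights
    t = np.asarray(t, dtype=float)[..., None]  # add a component axis for broadcasting
    comps = kumaraswamy_cdf(t, np.asarray(a, dtype=float), np.asarray(b, dtype=float))
    return np.sum(weights * comps, axis=-1)

# Illustrative (not learned) mixture parameters.
t = np.linspace(0.0, 1.0, 11)
gamma = kmm_schedule(t, weights=[0.3, 0.7], a=[0.5, 2.0], b=[1.0, 3.0])
```

Because every component is a valid CDF, any choice of positive shapes and weights yields an admissible schedule, which is what lets such a family be optimized freely instead of picked by hand.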

LG · Sep 17, 2023
Bayesian Gaussian Process ODEs via Double Normalizing Flows

Jian Xu, Shian Du, Junmei Yang et al.

Recently, Gaussian processes have been used to model the vector field of continuous dynamical systems; these models, referred to as GPODEs, are characterized by a probabilistic ODE. Bayesian inference for these models has been extensively studied and applied in tasks such as time series prediction. However, GPODE research has commonly relied on standard GPs with basic kernels such as the squared exponential kernel, which limits the model's ability to represent complex scenarios. To address this limitation, we introduce normalizing flows to reparameterize the ODE vector field, resulting in a data-driven prior distribution that increases flexibility and expressive power. We develop a data-driven variational learning algorithm that utilizes the analytically tractable probability density functions of normalizing flows, enabling simultaneous learning and inference of unknown continuous dynamics. We also apply normalizing flows to posterior inference in GPODEs to relax the strong mean-field assumption. By applying normalizing flows in both of these ways, our model improves accuracy and uncertainty estimates for Bayesian Gaussian Process ODEs. We validate the effectiveness of our approach on simulated dynamical systems and real-world human motion data, including time series prediction and missing data recovery tasks. Experimental results show that our proposed method effectively captures model uncertainty while improving accuracy.
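The abstract relies on normalizing flows having analytically tractable densities. As a minimal illustration of why, the sketch applies the change-of-variables formula, log p_x(x) = log p_z(z) - log|det(df/dz)|, to a single planar flow f(z) = z + u * tanh(w.z + b). The planar flow is a standard textbook layer chosen here for brevity; it is an assumption, not necessarily the flow architecture used in the paper, and the parameter values are arbitrary.

```python
import numpy as np

def planar_flow(z, u, w, b):
    """Planar flow f(z) = z + u * tanh(w.z + b) and its log|det Jacobian|."""
    a = np.dot(w, z) + b
    x = z + u * np.tanh(a)
    psi = (1.0 - np.tanh(a) ** 2) * w               # gradient of tanh(w.z + b) w.r.t. z
    log_det = np.log(np.abs(1.0 + np.dot(u, psi)))  # det(I + u psi^T) = 1 + psi^T u
    return x, log_det

def std_normal_logpdf(z):
    """Log-density of a standard normal base distribution."""
    return -0.5 * np.dot(z, z) - 0.5 * z.size * np.log(2.0 * np.pi)

# Illustrative parameters; w.u >= -1 keeps this planar map invertible.
u = np.array([0.5, -0.3])
w = np.array([1.0, 2.0])
b = 0.1
z = np.array([0.2, -0.4])
x, log_det = planar_flow(z, u, w, b)
log_px = std_normal_logpdf(z) - log_det  # exact log-density of x = f(z)
```

The determinant costs O(d) instead of O(d^3) thanks to the matrix determinant lemma, which is what keeps the density cheap to evaluate inside a variational objective.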

95.6 · LG · Apr 20 · Code
Towards Disentangled Preference Optimization Dynamics Beyond Likelihood Displacement

Wei Chen, Yubing Wu, Junmei Yang et al.

Preference optimization is widely used to align large language models (LLMs) with human preferences. However, many margin-based objectives suppress the chosen response along with the rejected one, a phenomenon known as likelihood displacement, and no general mechanism currently prevents this across objectives. We bridge this gap by presenting a unified incentive-score decomposition of preference optimization, revealing that diverse objectives share identical local update directions and differ only in their scalar weighting coefficients. Building on this decomposition, by analyzing the dynamics of the chosen/rejected likelihoods, we identify the disentanglement band (DB), a simple, testable condition that characterizes when training can avoid likelihood displacement by realizing the preferred pathway: suppressing the loser while maintaining the winner, possibly after an initial transient. Leveraging the DB, we propose a plug-and-play reward calibration (RC) that adaptively rebalances chosen versus rejected updates to satisfy the DB and mitigate likelihood displacement, without redesigning the base objective. Empirical results show that RC steers training toward more disentangled dynamics and often improves downstream performance across a range of objectives. Our code is available at https://github.com/IceyWuu/DisentangledPreferenceOptimization.
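To make "identical update directions, different scalar coefficients" concrete, here is a generic chain-rule sketch, not the paper's exact decomposition. For any loss l(m) of the implicit-reward margin m = beta * [(log pi(y_w) - log pi_ref(y_w)) - (log pi(y_l) - log pi_ref(y_l))], the descent direction is -l'(m) * beta * (grad log pi(y_w) - grad log pi(y_l)). The code computes the scalar weights on the two score terms for DPO and for a hinge loss on the same margin; the function names and the log-probability values are illustrative assumptions.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def margin(logp_w, logp_l, ref_w, ref_l, beta):
    """Implicit-reward margin m = beta * [(logp_w - ref_w) - (logp_l - ref_l)]."""
    return beta * ((logp_w - ref_w) - (logp_l - ref_l))

def dpo_coeffs(m, beta):
    """DPO loss -log sigmoid(m): weights on grad log pi(y_w) and grad log pi(y_l)
    in the descent direction (the negative loss gradient)."""
    c = beta * sigmoid(-m)
    return c, -c

def hinge_coeffs(m, beta):
    """Hinge loss max(0, 1 - m): same two directions, different scalar weight."""
    c = beta if m < 1.0 else 0.0
    return c, -c

# Hypothetical sequence log-probabilities under the policy and reference model.
m = margin(logp_w=-10.0, logp_l=-12.0, ref_w=-11.0, ref_l=-11.0, beta=0.1)
dpo_w, dpo_l = dpo_coeffs(m, beta=0.1)
hinge_w, hinge_l = hinge_coeffs(m, beta=0.1)
```

In both objectives the chosen and rejected score terms receive weights of equal magnitude and opposite sign; only the scalar differs. A calibration scheme such as the RC described above acts on these scalars rather than on the base objective.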

70.3 · ML · Apr 12
One-Step Score-Based Density Ratio Estimation

Wei Chen, Qibin Zhao, John Paisley et al.

Density ratio estimation (DRE) is a useful tool for quantifying discrepancies between probability distributions, but existing approaches often involve a trade-off between estimation quality and computational efficiency. Classical direct DRE methods are usually efficient at inference time, yet their performance can seriously deteriorate when the discrepancy between distributions is large. In contrast, score-based DRE methods often yield more accurate estimates in such settings, but they typically require many repeated function evaluations and numerical integration. We propose One-step Score-based Density Ratio Estimation (OS-DRE), a partly analytic and solver-free framework designed to combine these complementary advantages. OS-DRE decomposes the time score into spatial and temporal components, representing the latter with an analytic radial basis function (RBF) frame. This formulation converts the otherwise intractable temporal integral into a closed-form weighted sum, thereby removing the need for numerical solvers and enabling DRE with only one function evaluation. We further analyze approximation conditions for the analytic frame and establish approximation error bounds for both finitely and infinitely smooth temporal kernels, grounding the framework in existing approximation theory. Experiments across density estimation, continual Kullback-Leibler and mutual information estimation, and near out-of-distribution detection demonstrate that OS-DRE offers a favorable balance between estimation quality and inference efficiency.
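The key step in the abstract, turning the temporal integral into a closed-form weighted sum, can be illustrated under simple assumptions. Suppose the time score factorizes as d/dt log p_t(x) ≈ sum_k w_k(x) phi_k(t) with Gaussian RBFs phi_k(t) = exp(-(t - c_k)^2 / (2 s^2)). Then log r(x) = integral over [0, 1] of the time score = sum_k w_k(x) Phi_k, where Phi_k = s * sqrt(pi/2) * [erf((1 - c_k) / (sqrt(2) s)) + erf(c_k / (sqrt(2) s))]. The Gaussian basis, the sign convention of the ratio, and all names below are assumptions; random numbers stand in for the spatial coefficients w_k(x) that a network would output in its single forward pass.

```python
import math
import numpy as np

def rbf_integrals(centers, width):
    """Closed-form integral over t in [0, 1] of each RBF exp(-(t - c)^2 / (2 s^2))."""
    c = np.asarray(centers, dtype=float)
    z = math.sqrt(2.0) * width
    erf = np.vectorize(math.erf)
    return width * math.sqrt(math.pi / 2.0) * (erf((1.0 - c) / z) + erf(c / z))

def log_ratio_from_weights(weights, centers, width):
    """Hypothetical one-step estimate: log r(x) = sum_k w_k(x) * int_0^1 phi_k(t) dt.

    `weights` has shape (n_points, n_basis); no ODE solver or time grid is needed.
    """
    return np.asarray(weights) @ rbf_integrals(centers, width)

rng = np.random.default_rng(0)
centers = np.linspace(0.0, 1.0, 8)    # illustrative RBF centers in time
width = 0.15                          # illustrative shared RBF width
weights = rng.normal(size=(4, 8))     # stand-in for network outputs w_k(x)
log_r = log_ratio_from_weights(weights, centers, width)  # shape (4,)
```

The integrals Phi_k depend only on the basis, so they can be precomputed once; at inference time each estimate is a dot product with the network output.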

LG · Sep 22, 2023
Neural Operator Variational Inference based on Regularized Stein Discrepancy for Deep Gaussian Processes

Jian Xu, Shian Du, Junmei Yang et al.

Deep Gaussian Process (DGP) models offer a powerful nonparametric approach for Bayesian inference, but exact inference is typically intractable, motivating the use of various approximations. However, existing approaches, such as mean-field Gaussian assumptions, limit the expressiveness and efficacy of DGP models, while stochastic approximation can be computationally expensive. To tackle these challenges, we introduce Neural Operator Variational Inference (NOVI) for Deep Gaussian Processes. NOVI uses a neural generator to obtain a sampler and minimizes the Regularized Stein Discrepancy in L2 space between the generated distribution and the true posterior. We solve the resulting minimax problem using Monte Carlo estimation and subsampling stochastic optimization techniques. We demonstrate that the bias introduced by our method can be controlled by multiplying the Fisher divergence by a constant, which leads to robust error control and ensures the stability and precision of the algorithm. Our experiments on datasets ranging from hundreds to tens of thousands of data points demonstrate the effectiveness and faster convergence rate of the proposed method. We achieve a classification accuracy of 93.56% on the CIFAR-10 dataset, outperforming state-of-the-art Gaussian process methods. Furthermore, our method guarantees theoretically controlled prediction error for DGP models and demonstrates strong performance on various datasets. We are optimistic that NOVI has the potential to enhance the performance of deep Bayesian nonparametric models and could have significant implications for various practical applications.
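The link between a regularized Stein discrepancy and the Fisher divergence can be seen in a short formal derivation. This is a standard argument offered as a sketch under stated assumptions: q is smooth, boundary terms vanish so the Stein identity holds, the regularizer is lambda * E_q ||f||^2, and f ranges over sufficiently regular functions in L2(q). The paper's exact regularizer and constants may differ.

```latex
\begin{aligned}
\mathrm{RSD}_\lambda(q, p)
  &= \sup_{f} \; \mathbb{E}_{q}\!\left[ s_p(x)^\top f(x) + \nabla\!\cdot f(x) - \lambda \|f(x)\|^2 \right],
  \qquad s_p = \nabla \log p,\; s_q = \nabla \log q, \\
0 &= \mathbb{E}_{q}\!\left[ s_q(x)^\top f(x) + \nabla\!\cdot f(x) \right]
  \qquad \text{(Stein identity)}, \\
\Rightarrow\; \mathrm{RSD}_\lambda(q, p)
  &= \sup_{f} \; \mathbb{E}_{q}\!\left[ (s_p - s_q)^\top f - \lambda \|f\|^2 \right],
  \qquad f^\star = \tfrac{1}{2\lambda}\,(s_p - s_q), \\
  &= \tfrac{1}{4\lambda}\, \mathbb{E}_{q}\!\left[ \|s_p - s_q\|^2 \right]
   = \tfrac{1}{4\lambda}\, \mathrm{F}(q \,\|\, p).
\end{aligned}
```

Under these assumptions the regularized objective is exactly a constant multiple of the Fisher divergence F(q || p), which is consistent in spirit with the abstract's statement that the bias is controlled through a scaled Fisher divergence.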

LG · Aug 13, 2024
Variational Learning of Gaussian Process Latent Variable Models through Stochastic Gradient Annealed Importance Sampling

Jian Xu, Shian Du, Junmei Yang et al.

Gaussian Process Latent Variable Models (GPLVMs) have become increasingly popular for unsupervised tasks such as dimensionality reduction and missing data recovery due to their flexibility and non-linear nature. An importance-weighted version of the Bayesian GPLVM has been proposed to obtain a tighter variational bound. However, this approach is largely limited to simple data structures, because constructing an effective proposal distribution becomes challenging in high-dimensional spaces or with complex datasets. In this work, we propose an Annealed Importance Sampling (AIS) approach to address these issues. By transforming the posterior into a sequence of intermediate distributions using annealing, we combine the strengths of Sequential Monte Carlo samplers and variational inference (VI) to explore a wider range of posterior distributions and gradually approach the target distribution. We further propose an efficient algorithm by reparameterizing all variables in the evidence lower bound (ELBO). Experimental results on both toy and image datasets demonstrate that our method outperforms state-of-the-art methods in terms of tighter variational bounds, higher log-likelihoods, and more robust convergence.
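The annealing idea described above can be shown on a toy problem. The sketch below runs plain annealed importance sampling along the geometric path p_beta ∝ p0^(1 - beta) * f^beta, using random-walk Metropolis transitions, to estimate the log normalizing constant of a 1D unnormalized Gaussian whose true value is known. It only illustrates AIS mechanics; it is not the paper's reparameterized GPLVM algorithm, and the target, schedule length, and step size are arbitrary assumptions.

```python
import numpy as np

def log_prior(x):
    """Normalized standard normal prior p0 (beta = 0)."""
    return -0.5 * x**2 - 0.5 * np.log(2.0 * np.pi)

def log_target(x):
    """Unnormalized target f(x) = exp(-(x - 2)^2 / 2); its true log Z is 0.5 * log(2 pi)."""
    return -0.5 * (x - 2.0) ** 2

def ais_log_z(n_particles=2000, n_temps=200, n_mh=3, step=0.5, seed=0):
    """Annealed importance sampling along the geometric path p0^(1-beta) * f^beta."""
    rng = np.random.default_rng(seed)
    betas = np.linspace(0.0, 1.0, n_temps)
    x = rng.normal(size=n_particles)   # exact samples from p0
    log_w = np.zeros(n_particles)

    def log_annealed(x, beta):
        return (1.0 - beta) * log_prior(x) + beta * log_target(x)

    for b_prev, b in zip(betas[:-1], betas[1:]):
        # Incremental weight: log p_b(x) - log p_{b_prev}(x) at the current particles.
        log_w += (b - b_prev) * (log_target(x) - log_prior(x))
        # Random-walk Metropolis steps that leave the intermediate p_b invariant.
        for _ in range(n_mh):
            prop = x + step * rng.normal(size=n_particles)
            log_accept = log_annealed(prop, b) - log_annealed(x, b)
            accept = np.log(rng.uniform(size=n_particles)) < log_accept
            x = np.where(accept, prop, x)

    # Log of the mean importance weight, computed stably (log-sum-exp).
    m = log_w.max()
    return m + np.log(np.mean(np.exp(log_w - m)))

estimate = ais_log_z()
exact = 0.5 * np.log(2.0 * np.pi)
```

Because the prior is normalized, the mean AIS weight is an unbiased estimate of the target's normalizing constant; its logarithm is what an importance-weighted variational bound builds on.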

LG · Aug 12, 2024
Fully Bayesian Differential Gaussian Processes through Stochastic Differential Equations

Jian Xu, Zhiqi Lin, Min Chen et al.

Deep Gaussian process models typically employ discrete hierarchies, but recent advancements in differential Gaussian processes (DiffGPs) have extended these models to infinite depths. However, existing DiffGP approaches often overlook the uncertainty in kernel hyperparameters by treating them as fixed and time-invariant, which degrades the model's predictive performance and neglects their posterior distribution. In this work, we introduce a fully Bayesian framework that models kernel hyperparameters as random variables and utilizes coupled stochastic differential equations (SDEs) to jointly learn their posterior distributions alongside those of the inducing points. By incorporating the estimation uncertainty of hyperparameters, our method significantly enhances model flexibility and adaptability to complex dynamic systems. Furthermore, we employ a black-box adaptive SDE solver with a neural network to achieve realistic, time-varying posterior approximations, thereby improving the expressiveness of the variational posterior. Comprehensive experimental evaluations demonstrate that our approach outperforms traditional methods in terms of flexibility, accuracy, and other key performance metrics. This work not only provides a robust Bayesian extension to DiffGP models but also validates its effectiveness in handling intricate dynamic behaviors, thereby advancing the applicability of Gaussian process models in diverse real-world scenarios.

LG · Aug 7, 2024
Flexible Bayesian Last Layer Models Using Implicit Priors and Diffusion Posterior Sampling

Jian Xu, Zhiqi Lin, Shigui Li et al.

Bayesian Last Layer (BLL) models focus solely on uncertainty in the output layer of neural networks, yet achieve performance comparable to more complex Bayesian models. However, the use of Gaussian priors for last-layer weights in BLL models limits their expressive capacity when faced with non-Gaussian, outlier-rich, or high-dimensional datasets. To address this shortfall, we introduce a novel approach that combines diffusion techniques and implicit priors for variational learning of Bayesian last-layer weights. This method leverages implicit distributions to model weight priors in BLL, coupled with diffusion samplers that approximate true posterior predictions, thereby establishing a comprehensive Bayesian prior and posterior estimation strategy. By delivering an explicit and computationally efficient variational lower bound, our method aims to augment the expressive abilities of BLL models, enhancing model accuracy, calibration, and out-of-distribution detection. Through detailed exploration and experimental validation, we showcase the method's potential for improving predictive accuracy and uncertainty quantification while ensuring computational efficiency.

LG · May 8, 2025
Dequantified Diffusion-Schrödinger Bridge for Density Ratio Estimation

Wei Chen, Shigui Li, Jiacheng Li et al.

Density ratio estimation is fundamental to tasks involving f-divergences, yet existing methods often fail under significantly different distributions or inadequately overlapping supports (the density-chasm and support-chasm problems). Additionally, prior approaches yield divergent time scores near boundaries, leading to instability. We design D³RE, a unified framework for robust, stable, and efficient density ratio estimation. We propose the dequantified diffusion bridge interpolant (DDBI), which expands support coverage and stabilizes time scores via diffusion bridges and Gaussian dequantization. Building on DDBI, the proposed dequantified Schrödinger bridge interpolant (DSBI) incorporates optimal transport to solve the Schrödinger bridge problem, enhancing accuracy and efficiency. Our method offers uniform approximation and bounded time scores in theory, and outperforms baselines empirically in mutual information and density estimation tasks.
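The two ingredients named for DDBI, a diffusion bridge and Gaussian dequantization, can be sketched with a generic construction. The example assumes the interpolant is a Brownian bridge between Gaussian-perturbed endpoints, x_t = (1 - t) * (x0 + delta * e0) + t * (x1 + delta * e1) + sigma * sqrt(t(1 - t)) * z. This functional form, the function name, and the parameters `sigma` and `delta` are illustrative assumptions, not the paper's exact interpolant.

```python
import numpy as np

def dequantized_bridge_sample(x0, x1, t, sigma=0.5, delta=0.1, rng=None):
    """Hypothetical sample from a Gaussian-dequantized Brownian-bridge interpolant.

    x0, x1: paired samples from p0 and p1 with the same shape.
    t: scalar or array broadcastable to x0, with values in [0, 1].
    """
    rng = np.random.default_rng() if rng is None else rng
    # Gaussian dequantization: perturb both endpoints so their supports overlap.
    x0_dq = x0 + delta * rng.normal(size=np.shape(x0))
    x1_dq = x1 + delta * rng.normal(size=np.shape(x1))
    # Bridge noise has variance sigma^2 * t * (1 - t), which vanishes at t = 0 and t = 1.
    bridge_std = sigma * np.sqrt(t * (1.0 - t))
    return (1.0 - t) * x0_dq + t * x1_dq + bridge_std * rng.normal(size=np.shape(x0))

# Illustrative usage with two well-separated 2D Gaussians.
rng = np.random.default_rng(0)
x0 = rng.normal(size=(1000, 2))
x1 = rng.normal(loc=3.0, size=(1000, 2))
t = rng.uniform(size=(1000, 1))
xt = dequantized_bridge_sample(x0, x1, t, rng=rng)
```

The bridge noise fills in the region between the two distributions at intermediate times, while the dequantization noise keeps the endpoint distributions from having disjoint supports.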

ML · Sep 5, 2025
Any-Step Density Ratio Estimation via Interval-Annealed Secant Alignment

Wei Chen, Shigui Li, Jiacheng Li et al.

Estimating density ratios is a fundamental problem in machine learning, but existing methods often trade off accuracy for efficiency. We propose Interval-annealed Secant Alignment Density Ratio Estimation (ISA-DRE), a framework that enables accurate, any-step estimation without numerical integration. Instead of modeling infinitesimal tangents as in prior methods, ISA-DRE learns a global secant function, defined as the expectation of all tangents over an interval, with provably lower variance, making it more suitable for neural approximation. This is made possible by the Secant Alignment Identity, a self-consistency condition that formally connects the secant with its underlying tangent representations. To mitigate instability during early training, we introduce Contraction Interval Annealing, a curriculum strategy that gradually expands the alignment interval during training. This process induces a contraction mapping, which improves convergence and training stability. Empirically, ISA-DRE achieves competitive accuracy with significantly fewer function evaluations than prior methods, resulting in much faster inference and making it well suited for real-time and interactive applications.
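The secant idea can be checked numerically on a toy tangent. Defining the secant as the interval average S(s, t) = (1 / (t - s)) * integral from s to t of u(tau), and differentiating (t - s) * S(s, t) with respect to t, gives one self-consistency relation, S(s, t) + (t - s) * dS/dt = u(t). This relation follows directly from the definition; the paper's Secant Alignment Identity may be stated differently. The tangent u(tau) = cos(tau) is a stand-in for a learned time score, chosen only because its integral is known in closed form.

```python
import math

def tangent(tau):
    """Stand-in tangent u(tau); cos is chosen so the secant has a closed form."""
    return math.cos(tau)

def secant(s, t):
    """Secant S(s, t) = (1 / (t - s)) * integral_s^t u(tau) dtau, closed form for u = cos."""
    if abs(t - s) < 1e-12:
        return tangent(t)                  # limit as the interval shrinks to a point
    return (math.sin(t) - math.sin(s)) / (t - s)

def alignment_residual(s, t, h=1e-5):
    """Residual of S(s, t) + (t - s) * dS/dt - u(t); zero by construction of the secant."""
    dS_dt = (secant(s, t + h) - secant(s, t - h)) / (2.0 * h)  # central difference
    return secant(s, t) + (t - s) * dS_dt - tangent(t)

# Any-step use: the integral over [s, t] is one secant evaluation times the interval length.
s, t = 0.1, 0.9
integral = (t - s) * secant(s, t)
```

A model that learns S directly can therefore jump across any interval with a single evaluation, and a residual of this kind is one natural training signal that ties the secant back to the tangent.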