Guanyang Wang

LG
h-index8
10papers
14citations
Novelty57%
AI Score52

10 Papers

LGJan 29Code
TabClustPFN: A Prior-Fitted Network for Tabular Data Clustering

Tianqi Zhao, Guanyang Wang, Yan Shuo Tan et al.

Clustering tabular data is a fundamental yet challenging problem due to heterogeneous feature types, diverse data-generating mechanisms, and the absence of transferable inductive biases across datasets. Prior-fitted networks (PFNs) have recently demonstrated strong generalization in supervised tabular learning by amortizing Bayesian inference under a broad synthetic prior. Extending this paradigm to clustering is nontrivial: clustering is unsupervised, admits a combinatorial and permutation-invariant output space, and requires inferring the number of clusters. We introduce TabClustPFN, a prior-fitted network for tabular data clustering that performs amortized Bayesian inference over both cluster assignments and cluster cardinality. Pretrained on synthetic datasets drawn from a flexible clustering prior, TabClustPFN clusters unseen datasets in a single forward pass, without dataset-specific retraining or hyperparameter tuning. The model naturally handles heterogeneous numerical and categorical features and adapts to a wide range of clustering structures. Experiments on synthetic data and curated real-world tabular benchmarks show that TabClustPFN outperforms classical, deep, and amortized clustering baselines, while exhibiting strong robustness in out-of-the-box exploratory settings. Code is available at https://github.com/Tianqi-Zhao/TabClustPFN.

LGMay 11
Couple to Control: Joint Initial Noise Design in Diffusion Models

Jing Jia, Liyue Shen, Guanyang Wang

Diffusion models typically generate image batches from independent Gaussian initial noises. We argue that this independence assumption is only one choice within a broader class of valid joint noise designs. Instead, one can specify a coupling of the initial noises: each noise remains marginally standard Gaussian, so the pretrained diffusion model receives the same single-sample input distribution, while the dependence across samples is chosen by design. This reframes initial-noise control from selecting or optimizing individual seeds to designing the dependence structure of a multi-sample gallery. This view gives a general framework for initial-noise design, covering several existing methods as special cases and leading naturally to new coupled-noise constructions. Coupled noise can improve generation on its own without adding sampling cost, and it is flexible enough to serve as a structured initialization for optimization-based pipelines when additional computation is available. Empirically, repulsive Gaussian coupling improves gallery diversity on SD1.5, SDXL, and SD3 while largely preserving prompt alignment and image quality. It matches or outperforms recent test-time noise-optimization baselines on several diversity metrics at the same sampling cost as independent generation. Subspace couplings also support fixed-object background generation, producing diverse, natural backgrounds compared with specialized inpainting baselines, with a tunable trade-off in foreground fidelity.

LGFeb 4
Benchmarking Uncertainty Quantification of Plug-and-Play Diffusion Priors for Inverse Problems Solving

Xiaoyu Qiu, Taewon Yang, Zhanhao Liu et al.

Plug-and-play diffusion priors (PnPDP) have become a powerful paradigm for solving inverse problems in scientific and engineering domains. Yet, current evaluations of reconstruction quality emphasize point-estimate accuracy metrics on a single sample, which do not reflect the stochastic nature of PnPDP solvers and the intrinsic uncertainty of inverse problems, critical for scientific tasks. This creates a fundamental mismatch: in inverse problems, the desired output is typically a posterior distribution and most PnPDP solvers induce a distribution over reconstructions, but existing benchmarks only evaluate a single reconstruction, ignoring distributional characterization such as uncertainty. To address this gap, we conduct a systematic study to benchmark the uncertainty quantification (UQ) of existing diffusion inverse solvers. Specifically, we design a rigorous toy model simulation to evaluate the uncertainty behavior of various PnPDP solvers, and propose a UQ-driven categorization. Through extensive experiments on toy simulations and diverse real-world scientific inverse problems, we observe uncertainty behaviors consistent with our taxonomy and theoretical justification, providing new insights for evaluating and understanding the uncertainty for PnPDPs.

LGJan 30
Weak Diffusion Priors Can Still Achieve Strong Inverse-Problem Performance

Jing Jia, Wei Yuan, Sifan Liu et al.

Can a diffusion model trained on bedrooms recover human faces? Diffusion models are widely used as priors for inverse problems, but standard approaches usually assume a high-fidelity model trained on data that closely match the unknown signal. In practice, one often must use a mismatched or low-fidelity diffusion prior. Surprisingly, these weak priors often perform nearly as well as full-strength, in-domain baselines. We study when and why inverse solvers are robust to weak diffusion priors. Through extensive experiments, we find that weak priors succeed when measurements are highly informative (e.g., many observed pixels), and we identify regimes where they fail. Our theory, based on Bayesian consistency, gives conditions under which high-dimensional measurements make the posterior concentrate near the true signal. These results provide a principled justification on when weak diffusion priors can be used reliably.

QUANT-PHFeb 7, 2025
Quantum speedup of non-linear Monte Carlo problems

Jose Blanchet, Yassine Hamoudi, Mario Szegedy et al.

The mean of a random variable can be understood as a linear functional on the space of probability distributions. Quantum computing is known to provide a quadratic speedup over classical Monte Carlo methods for mean estimation. In this paper, we investigate whether a similar quadratic speedup is achievable for estimating non-linear functionals of probability distributions. We propose a quantum-inside-quantum Monte Carlo algorithm that achieves such a speedup for a broad class of non-linear estimation problems, including nested conditional expectations and stochastic optimization. Our algorithm improves upon the direct application of the quantum multilevel Monte Carlo algorithm introduced by An et al. (2021). The existing lower bound indicates that our algorithm is optimal up polylogarithmic factors. A key innovation of our approach is a new sequence of multilevel Monte Carlo approximations specifically designed for quantum computing, which is central to the algorithm's improved performance.

LGFeb 7, 2025
CCS: Controllable and Constrained Sampling with Diffusion Models via Initial Noise Perturbation

Bowen Song, Zecheng Zhang, Zhaoxu Luo et al.

Diffusion models have emerged as powerful tools for generative tasks, producing high-quality outputs across diverse domains. However, how the generated data responds to the initial noise perturbation in diffusion models remains under-explored, which hinders understanding the controllability of the sampling process. In this work, we first observe an interesting phenomenon: the relationship between the change of generation outputs and the scale of initial noise perturbation is highly linear through the diffusion ODE sampling. Then we provide both theoretical and empirical study to justify this linearity property of this input-output (noise-generation data) relationship. Inspired by these new insights, we propose a novel Controllable and Constrained Sampling method (CCS) together with a new controller algorithm for diffusion models to sample with desired statistical properties while preserving good sample quality. We perform extensive experiments to compare our proposed sampling approach with other methods on both sampling controllability and sampled data quality. Results show that our CCS method achieves more precisely controlled sampling while maintaining superior sample quality and diversity.

CRFeb 10, 2024
Differentially Private Range Queries with Correlated Input Perturbation

Prathamesh Dharangutte, Jie Gao, Ruobin Gong et al.

This work proposes a class of differentially private mechanisms for linear queries, in particular range queries, that leverages correlated input perturbation to simultaneously achieve unbiasedness, consistency, statistical transparency, and control over utility requirements in terms of accuracy targets expressed either in certain query margins or as implied by the hierarchical database structure. The proposed Cascade Sampling algorithm instantiates the mechanism exactly and efficiently. Our theoretical and empirical analysis demonstrates that we achieve near-optimal utility, effectively compete with other methods, and retain all the favorable statistical properties discussed earlier.

COMay 19, 2020
On the Theoretical Properties of the Exchange Algorithm

Guanyang Wang

The exchange algorithm is one of the most popular extensions of the Metropolis--Hastings algorithm to sample from doubly-intractable distributions. However, the theoretical exploration of the exchange algorithm is very limited. For example, natural questions like `Does exchange algorithm converge at a geometric rate?' or `Does the exchange algorithm admit a Central Limit Theorem?' have not been answered yet. In this paper, we study the theoretical properties of the exchange algorithm, in terms of asymptotic variance and convergence speed. We compare the exchange algorithm with the original Metropolis--Hastings algorithm and provide both necessary and sufficient conditions for the geometric ergodicity of the exchange algorithm. Moreover, we prove that our results can be applied to various practical applications such as location models, Gaussian models, Poisson models, and a large class of exponential families, which includes most of the practical applications of the exchange algorithm. A central limit theorem for the exchange algorithm is also established. Our results justify the theoretical usefulness of the exchange algorithm.

MLMay 26, 2019
Modeling treatment events in disease progression

Guanyang Wang, Yumeng Zhang, Yong Deng et al.

Ability to quantify and predict progression of a disease is fundamental for selecting an appropriate treatment. Many clinical metrics cannot be acquired frequently either because of their cost (e.g. MRI, gait analysis) or because they are inconvenient or harmful to a patient (e.g. biopsy, x-ray). In such scenarios, in order to estimate individual trajectories of disease progression, it is advantageous to leverage similarities between patients, i.e. the covariance of trajectories, and find a latent representation of progression. Most of existing methods for estimating trajectories do not account for events in-between observations, what dramatically decreases their adequacy for clinical practice. In this study, we develop a machine learning framework named Coordinatewise-Soft-Impute (CSI) for analyzing disease progression from sparse observations in the presence of confounding events. CSI is guaranteed to converge to the global minimum of the corresponding optimization problem. Experimental results also demonstrates the effectiveness of CSI using both simulated and real dataset.

COMar 13, 2019
A Multi-armed Bandit MCMC, with applications in sampling from doubly intractable posterior

Guanyang Wang

Markov chain Monte Carlo (MCMC) algorithms are widely used to sample from complicated distributions, especially to sample from the posterior distribution in Bayesian inference. However, MCMC is not directly applicable when facing the doubly intractable problem. In this paper, we discussed and compared two existing solutions -- Pseudo-marginal Monte Carlo and Exchange Algorithm. This paper also proposes a novel algorithm: Multi-armed Bandit MCMC (MABMC), which chooses between two (or more) randomized acceptance ratios in each step. MABMC could be applied directly to incorporate Pseudo-marginal Monte Carlo and Exchange algorithm, with higher average acceptance probability.