Chihao Zhang

LG
h-index11
16papers
62citations
Novelty64%
AI Score53

16 Papers

LGAug 10, 2024
Dynamical causality under invisible confounders

Jinling Yan, Shao-Wu Zhang, Chihao Zhang et al.

Causality inference is prone to spurious causal interactions, due to the substantial confounders in a complex system. While many existing methods based on the statistical methods or dynamical methods attempt to address misidentification challenges, there remains a notable lack of effective methods to infer causality, in particular in the presence of invisible/unobservable confounders. As a result, accurately inferring causation with invisible confounders remains a largely unexplored and outstanding issue in data science and AI fields. In this work, we propose a method to overcome such challenges to infer dynamical causality under invisible confounders (CIC method) and further reconstruct the invisible confounders from time-series data by developing an orthogonal decomposition theorem in a delay embedding space. The core of our CIC method lies in its ability to decompose the observed variables not in their original space but in their delay embedding space into the common and private subspaces respectively, thereby quantifying causality between those variables both theoretically and computationally. This theoretical foundation ensures the causal detection for any high-dimensional system even with only two observed variables under many invisible confounders, which is actually a long-standing problem in the field. In addition to the invisible confounder problem, such a decomposition actually makes the intertwined variables separable in the embedding space, thus also solving the non-separability problem of causal inference. Extensive validation of the CIC method is carried out using various real datasets, and the experimental results demonstrates its effectiveness to reconstruct real biological networks even with unobserved confounders.

AIDec 9, 2025
Multi-Agent Intelligence for Multidisciplinary Decision-Making in Gastrointestinal Oncology

Rongzhao Zhang, Junqiao Wang, Shuyun Yang et al.

Multimodal clinical reasoning in the field of gastrointestinal (GI) oncology necessitates the integrated interpretation of endoscopic imagery, radiological data, and biochemical markers. Despite the evident potential exhibited by Multimodal Large Language Models (MLLMs), they frequently encounter challenges such as context dilution and hallucination when confronted with intricate, heterogeneous medical histories. In order to address these limitations, a hierarchical Multi-Agent Framework is proposed, which emulates the collaborative workflow of a human Multidisciplinary Team (MDT). The system attained a composite expert evaluation score of 4.60/5.00, thereby demonstrating a substantial improvement over the monolithic baseline. It is noteworthy that the agent-based architecture yielded the most substantial enhancements in reasoning logic and medical accuracy. The findings indicate that mimetic, agent-based collaboration provides a scalable, interpretable, and clinically robust paradigm for automated decision support in oncology.

LGJul 14, 2023
On Interpolating Experts and Multi-Armed Bandits

Houshuang Chen, Yuchen He, Chihao Zhang

Learning with expert advice and multi-armed bandit are two classic online decision problems which differ on how the information is observed in each round of the game. We study a family of problems interpolating the two. For a vector $\mathbf{m}=(m_1,\dots,m_K)\in \mathbb{N}^K$, an instance of $\mathbf{m}$-MAB indicates that the arms are partitioned into $K$ groups and the $i$-th group contains $m_i$ arms. Once an arm is pulled, the losses of all arms in the same group are observed. We prove tight minimax regret bounds for $\mathbf{m}$-MAB and design an optimal PAC algorithm for its pure exploration version, $\mathbf{m}$-BAI, where the goal is to identify the arm with minimum loss with as few rounds as possible. We show that the minimax regret of $\mathbf{m}$-MAB is $Θ\left(\sqrt{T\sum_{k=1}^K\log (m_k+1)}\right)$ and the minimum number of pulls for an $(ε,0.05)$-PAC algorithm of $\mathbf{m}$-BAI is $Θ\left(\frac{1}{ε^2}\cdot \sum_{k=1}^K\log (m_k+1)\right)$. Both our upper bounds and lower bounds for $\mathbf{m}$-MAB can be extended to a more general setting, namely the bandit with graph feedback, in terms of the clique cover and related graph parameters. As consequences, we obtained tight minimax regret bounds for several families of feedback graphs.

MLSep 1, 2021Code
Information-theoretic Classification Accuracy: A Criterion that Guides Data-driven Combination of Ambiguous Outcome Labels in Multi-class Classification

Chihao Zhang, Yiling Elaine Chen, Shihua Zhang et al.

Outcome labeling ambiguity and subjectivity are ubiquitous in real-world datasets. While practitioners commonly combine ambiguous outcome labels for all data points (instances) in an ad hoc way to improve the accuracy of multi-class classification, there lacks a principled approach to guide the label combination for all data points by any optimality criterion. To address this problem, we propose the information-theoretic classification accuracy (ITCA), a criterion that balances the trade-off between prediction accuracy (how well do predicted labels agree with actual labels) and classification resolution (how many labels are predictable), to guide practitioners on how to combine ambiguous outcome labels. To find the optimal label combination indicated by ITCA, we propose two search strategies: greedy search and breadth-first search. Notably, ITCA and the two search strategies are adaptive to all machine-learning classification algorithms. Coupled with a classification algorithm and a search strategy, ITCA has two uses: improving prediction accuracy and identifying ambiguous labels. We first verify that ITCA achieves high accuracy with both search strategies in finding the correct label combinations on synthetic and real data. Then we demonstrate the effectiveness of ITCA in diverse applications including medical prognosis, cancer survival prediction, user demographics prediction, and cell type classification. We also provide theoretical insights into ITCA by studying the oracle and the linear discriminant analysis classification algorithms. Python package itca (available at https://github.com/JSB-UCLA/ITCA) implements ITCA and search strategies.

DSMay 7
Discrete Optimal Transport: Rapid Convergence of Simulated Annealing Algorithms

Yuchen He, Tianhui Jiang, Sihan Wang et al.

We develop a discrete optimal transport framework for analyzing simulated annealing algorithms on finite state spaces. Building on the discrete Wasserstein metric introduced by Maas (J. Funct. Anal., 2011), we define a generalized discrete Wasserstein-2 distance and the associated notion of \emph{discrete action} for paths of probability measures on graphs. Using these tools, we establish non-asymptotic convergence guarantees for simulated annealing: the KL divergence between the algorithm's output and the target distribution is controlled by the discrete action of the annealing path. This can be viewed as the discrete counterpart of the action-based analysis of annealed Langevin dynamics in continuous spaces by Guo, Tao, and Chen (ICLR 2025). As applications, we analyze simulated annealing for two fundamental models in statistical physics. For the \emph{mean-field Ising model}, we show that annealed single-site Glauber dynamics achieves $\varepsilon$ error in KL divergence in $O(n^5β^2/\varepsilon)$ steps at \emph{any} inverse temperature $β\ge 0$. For the \emph{mean-field $q$-state Potts model}, we show that annealed $(q-1)$-block Glauber dynamics achieves $\varepsilon$ error in $\mathrm{poly}(n, β, 1/\varepsilon)$ steps for all $β\ge β_{\mathsf{s}}=q/2$, the regime where the disordered phase has completely lost stability. In both cases, the key technical contribution is a polynomial upper bound on the discrete action, obtained by exploiting the symmetry of the model to reduce the analysis to a low-dimensional projected chain.

LGMay 30, 2022
Improved Algorithms for Bandit with Graph Feedback via Regret Decomposition

Yuchen He, Chihao Zhang

The problem of bandit with graph feedback generalizes both the multi-armed bandit (MAB) problem and the learning with expert advice problem by encoding in a directed graph how the loss vector can be observed in each round of the game. The mini-max regret is closely related to the structure of the feedback graph and their connection is far from being fully understood. We propose a new algorithmic framework for the problem based on a partition of the feedback graph. Our analysis reveals the interplay between various parts of the graph by decomposing the regret to the sum of the regret caused by small parts and the regret caused by their interaction. As a result, our algorithm can be viewed as an interpolation and generalization of the optimal algorithms for MAB and learning with expert advice. Our framework unifies previous algorithms for both strongly observable graphs and weakly observable graphs, resulting in improved and optimal regret bounds on a wide range of graph families including graphs of bounded degree and strongly observable graphs with a few corrupted arms.

DSFeb 10, 2025
On the query complexity of sampling from non-log-concave distributions

Yuchen He, Chihao Zhang

We study the problem of sampling from a $d$-dimensional distribution with density $p(x)\propto e^{-f(x)}$, which does not necessarily satisfy good isoperimetric conditions. Specifically, we show that for any $L,M$ satisfying $LM\ge d\ge 5$, $ε\in \left(0,\frac{1}{32}\right)$, and any algorithm with query accesses to the value of $f(x)$ and $\nabla f(x)$, there exists an $L$-log-smooth distribution with second moment at most $M$ such that the algorithm requires $\left(\frac{LM}{dε}\right)^{Ω(d)}$ queries to compute a sample whose distribution is within $ε$ in total variation distance to the target distribution. We complement the lower bound with an algorithm requiring $\left(\frac{LM}{dε}\right)^{\mathcal{O}(d)}$ queries, thereby characterizing the tight (up to the constant in the exponent) query complexity for sampling from the family of non-log-concave distributions. Our results are in sharp contrast with the recent work of Huang et al. (COLT'24), where an algorithm with quasi-polynomial query complexity was proposed for sampling from a non-log-concave distribution when $M=\mathtt{poly}(d)$. Their algorithm works under the stronger condition that all distributions along the trajectory of the Ornstein-Uhlenbeck process, starting from the target distribution, are $\mathcal{O}(1)$-log-smooth. We investigate this condition and prove that it is strictly stronger than requiring the target distribution to be $\mathcal O(1)$-log-smooth. Additionally, we study this condition in the context of mixtures of Gaussians. Finally, we place our results within the broader theme of ``sampling versus optimization'', as studied in Ma et al. (PNAS'19). We show that for a wide range of parameters, sampling is strictly easier than optimization by a super-exponential factor in the dimension $d$.

LGMar 4, 2025
Tight Gap-Dependent Memory-Regret Trade-Off for Single-Pass Streaming Stochastic Multi-Armed Bandits

Zichun Ye, Chihao Zhang, Jiahao Zhao

We study the problem of minimizing gap-dependent regret for single-pass streaming stochastic multi-armed bandits (MAB). In this problem, the $n$ arms are present in a stream, and at most $m<n$ arms and their statistics can be stored in the memory. We establish tight non-asymptotic regret bounds regarding all relevant parameters, including the number of arms $n$, the memory size $m$, the number of rounds $T$ and $(Δ_i)_{i\in [n]}$ where $Δ_i$ is the reward mean gap between the best arm and the $i$-th arm. These gaps are not known in advance by the player. Specifically, for any constant $α\ge 1$, we present two algorithms: one applicable for $m\ge \frac{2}{3}n$ with regret at most $O_α\Big(\frac{(n-m)T^{\frac{1}{α+ 1}}}{n^{1 + {\frac{1}{α+ 1}}}}\displaystyle\sum_{i:Δ_i > 0}Δ_i^{1 - 2α}\Big)$ and another applicable for $m<\frac{2}{3}n$ with regret at most $O_α\Big(\frac{T^{\frac{1}{α+1}}}{m^{\frac{1}{α+1}}}\displaystyle\sum_{i:Δ_i > 0}Δ_i^{1 - 2α}\Big)$. We also prove matching lower bounds for both cases by showing that for any constant $α\ge 1$ and any $m\leq k < n$, there exists a set of hard instances on which the regret of any algorithm is $Ω_α\Big(\frac{(k-m+1) T^{\frac{1}{α+1}}}{k^{1 + \frac{1}{α+1}}} \sum_{i:Δ_i > 0}Δ_i^{1-2α}\Big)$. This is the first tight gap-dependent regret bound for streaming MAB. Prior to our work, an $O\Big(\sum_{i\colonΔ>0} \frac{\sqrt{T}\log T}{Δ_i}\Big)$ upper bound for the special case of $α=1$ and $m=O(1)$ was established by Agarwal, Khanna and Patil (COLT'22). In contrast, our results provide the correct order of regret as $Θ\Big(\frac{1}{\sqrt{m}}\sum_{i\colonΔ>0}\frac{\sqrt{T}}{Δ_i}\Big)$.

LGOct 2, 2025
Graph Generation with Spectral Geodesic Flow Matching

Xikun Huang, Tianyu Ruan, Chihao Zhang et al.

Graph generation is a fundamental task with wide applications in modeling complex systems. Although existing methods align the spectrum or degree profile of the target graph, they often ignore the geometry induced by eigenvectors and the global structure of the graph. In this work, we propose Spectral Geodesic Flow Matching (SFMG), a novel framework that uses spectral eigenmaps to embed both input and target graphs into continuous Riemannian manifolds. We then define geodesic flows between embeddings and match distributions along these flows to generate output graphs. Our method yields several advantages: (i) captures geometric structure beyond eigenvalues, (ii) supports flexible generation of diverse graphs, and (iii) scales efficiently. Empirically, SFMG matches the performance of state-of-the-art approaches on graphlet, degree, and spectral metrics across diverse benchmarks. In particular, it achieves up to 30$\times$ speedup over diffusion-based models, offering a substantial advantage in scalability and training efficiency. We also demonstrate its ability to generalize to unseen graph scales. Overall, SFMG provides a new approach to graph synthesis by integrating spectral geometry with flow matching.

DSJul 15, 2025
Improved sampling algorithms and Poincaré inequalities for non-log-concave distributions

Yuchen He, Zhehan Lei, Jianan Shao et al.

We study the problem of sampling from a distribution $μ$ with density $\propto e^{-V}$ for some potential function $V:\mathbb R^d\to \mathbb R$ with query access to $V$ and $\nabla V$. We start with the following standard assumptions: (1) The potential function $V$ is $L$-smooth. (2) The second moment $\mathbf{E}_{X\sim μ}[\|X\|^2]\leq M$. Recently, He and Zhang (COLT'25) showed that the query complexity of sampling from such distributions is at least $\left(\frac{LM}{dε}\right)^{Ω(d)}$ where $ε$ is the desired accuracy in total variation distance, and the Poincaré constant can be arbitrarily large. Meanwhile, another common assumption in the study of diffusion based samplers (see e.g., the work of Chen, Chewi, Li, Li, Salim and Zhang (ICLR'23)) strengthens the smoothness condition (1) to the following: (1*) The potential function of *every* distribution along the Ornstein-Uhlenbeck process starting from $μ$ is $L$-smooth. We show that under the assumptions (1*) and (2), the query complexity of sampling from $μ$ can be $\mathrm{poly}(L,d)\cdot \left(\frac{Ld+M}{ε^2}\right)^{\mathcal{O}(L+1)}$, which is polynomial in $d$ and $\frac{1}ε$ when $L=\mathcal{O}(1)$ and $M=\mathrm{poly}(d)$. This improves the algorithm with quasi-polynomial query complexity developed by Huang et al. (COLT'24). Our results imply that the seemly moderate strengthening of the smoothness condition (1) to (1*) can lead to an exponential gap in the query complexity of sampling algorithms. Moreover, we show that together with the assumption (1*) and the stronger moment assumption that $\|X\|$ is $λ$-sub-Gaussian for $X\simμ$, the Poincaré constant of $μ$ is at most $\mathcal{O}(λ)^{2(L+1)}$. As an application of our technique, we obtain improved estimate of the Poincaré constant for mixture of Gaussians with the same covariance.

LGApr 16, 2025
On the Problem of Best Arm Retention

Houshuang Chen, Yuchen He, Chihao Zhang

This paper presents a comprehensive study on the problem of Best Arm Retention (BAR), which has recently found applications in streaming algorithms for multi-armed bandits. In the BAR problem, the goal is to retain $m$ arms with the best arm included from $n$ after some trials, in stochastic multi-armed bandit settings. We first investigate pure exploration for the BAR problem under different criteria, and then minimize the regret with specific constraints, in the context of further exploration in streaming algorithms. - We begin by revisiting the lower bound for the $(\varepsilon,δ)$-PAC algorithm for Best Arm Identification (BAI) and adapt the classical KL-divergence argument to derive optimal bounds for $(\varepsilon,δ)$-PAC algorithms for BAR. - We further study another variant of the problem, called $r$-BAR, which requires the expected gap between the best arm and the optimal arm retained is less than $r$. We prove tight sample complexity for the problem. - We explore the regret minimization problem for $r$-BAR and develop algorithm beyond pure exploration. We conclude with a conjecture on the optimal regret in this setting.

GTApr 6, 2025
Tight Regret Bounds for Fixed-Price Bilateral Trade

Houshuang Chen, Yaonan Jin, Pinyan Lu et al.

We examine fixed-price mechanisms in bilateral trade through the lens of regret minimization. Our main results are twofold. (i) For independent values, a near-optimal $\widetildeΘ(T^{2/3})$ tight bound for $\textsf{Global Budget Balance}$ fixed-price mechanisms with two-bit/one-bit feedback. (ii) For correlated/adversarial values, a near-optimal $Ω(T^{3/4})$ lower bound for $\textsf{Global Budget Balance}$ fixed-price mechanisms with two-bit/one-bit feedback, which improves the best known $Ω(T^{5/7})$ lower bound obtained in the work [BCCF24] and, up to polylogarithmic factors, matches the $\widetilde{\mathcal{O}}(T^{3 / 4})$ upper bound obtained in the same work. Our work in combination with the previous works [CCCFL24mor, CCCFL24jmlr, AFF24, BCCF24] (essentially) gives a thorough understanding of regret minimization for fixed-price bilateral trade. En route, we have developed two technical ingredients that might be of independent interest: (i) A novel algorithmic paradigm, called $\textit{fractal elimination}$, to address one-bit feedback and independent values. (ii) A new $\textit{lower-bound construction}$ with novel proof techniques, to address the $\textsf{Global Budget Balance}$ constraint and correlated values.

LGMay 29, 2021
Understanding Bandits with Graph Feedback

Houshuang Chen, Zengfeng Huang, Shuai Li et al.

The bandit problem with graph feedback, proposed in [Mannor and Shamir, NeurIPS 2011], is modeled by a directed graph $G=(V,E)$ where $V$ is the collection of bandit arms, and once an arm is triggered, all its incident arms are observed. A fundamental question is how the structure of the graph affects the min-max regret. We propose the notions of the fractional weak domination number $δ^*$ and the $k$-packing independence number capturing upper bound and lower bound for the regret respectively. We show that the two notions are inherently connected via aligning them with the linear program of the weakly dominating set and its dual -- the fractional vertex packing set respectively. Based on this connection, we utilize the strong duality theorem to prove a general regret upper bound $O\left(\left( δ^*\log |V|\right)^{\frac{1}{3}}T^{\frac{2}{3}}\right)$ and a lower bound $Ω\left(\left(δ^*/α\right)^{\frac{1}{3}}T^{\frac{2}{3}}\right)$ where $α$ is the integrality gap of the dual linear program. Therefore, our bounds are tight up to a $\left(\log |V|\right)^{\frac{1}{3}}$ factor on graphs with bounded integrality gap for the vertex packing problem including trees and graphs with bounded degree. Moreover, we show that for several special families of graphs, we can get rid of the $\left(\log |V|\right)^{\frac{1}{3}}$ factor and establish optimal regret.

LGFeb 10, 2020
Distributed Bayesian Matrix Decomposition for Big Data Mining and Clustering

Chihao Zhang, Yang Yang, Wei Zhang et al.

Matrix decomposition is one of the fundamental tools to discover knowledge from big data generated by modern applications. However, it is still inefficient or infeasible to process very big data using such a method in a single machine. Moreover, big data are often distributedly collected and stored on different machines. Thus, such data generally bear strong heterogeneous noise. It is essential and useful to develop distributed matrix decomposition for big data analytics. Such a method should scale up well, model the heterogeneous noise, and address the communication issue in a distributed system. To this end, we propose a distributed Bayesian matrix decomposition model (DBMD) for big data mining and clustering. Specifically, we adopt three strategies to implement the distributed computing including 1) the accelerated gradient descent, 2) the alternating direction method of multipliers (ADMM), and 3) the statistical inference. We investigate the theoretical convergence behaviors of these algorithms. To address the heterogeneity of the noise, we propose an optimal plug-in weighted average that reduces the variance of the estimation. Synthetic experiments validate our theoretical results, and real-world experiments show that our algorithms scale up well to big data and achieves superior or competing performance compared to other distributed methods.

LGNov 25, 2019
Matrix Normal PCA for Interpretable Dimension Reduction and Graphical Noise Modeling

Chihao Zhang, Kuo Gai, Shihua Zhang

Principal component analysis (PCA) is one of the most widely used dimension reduction and multivariate statistical techniques. From a probabilistic perspective, PCA seeks a low-dimensional representation of data in the presence of independent identical Gaussian noise. Probabilistic PCA (PPCA) and its variants have been extensively studied for decades. Most of them assume the underlying noise follows a certain independent identical distribution. However, the noise in the real world is usually complicated and structured. To address this challenge, some variants of PCA for data with non-IID noise have been proposed. However, most of the existing methods only assume that the noise is correlated in the feature space while there may exist two-way structured noise. To this end, we propose a powerful and intuitive PCA method (MN-PCA) through modeling the graphical noise by the matrix normal distribution, which enables us to explore the structure of noise in both the feature space and the sample space. MN-PCA obtains a low-rank representation of data and the structure of noise simultaneously. And it can be explained as approximating data over the generalized Mahalanobis distance. We develop two algorithms to solve this model: one maximizes the regularized likelihood, the other exploits the Wasserstein distance, which is more robust. Extensive experiments on various data demonstrate their effectiveness.

CVDec 9, 2017
Bayesian Joint Matrix Decomposition for Data Integration with Heterogeneous Noise

Chihao Zhang, Shihua Zhang

Matrix decomposition is a popular and fundamental approach in machine learning and data mining. It has been successfully applied into various fields. Most matrix decomposition methods focus on decomposing a data matrix from one single source. However, it is common that data are from different sources with heterogeneous noise. A few of matrix decomposition methods have been extended for such multi-view data integration and pattern discovery. While only few methods were designed to consider the heterogeneity of noise in such multi-view data for data integration explicitly. To this end, we propose a joint matrix decomposition framework (BJMD), which models the heterogeneity of noise by Gaussian distribution in a Bayesian framework. We develop two algorithms to solve this model: one is a variational Bayesian inference algorithm, which makes full use of the posterior distribution; and another is a maximum a posterior algorithm, which is more scalable and can be easily paralleled. Extensive experiments on synthetic and real-world datasets demonstrate that BJMD considering the heterogeneity of noise is superior or competitive to the state-of-the-art methods.