Chao Gao

h-index 16

9 papers

343 citations

Novelty 52%

AI Score 27

Ranked #153,061 of 194,257 authors (top 79%) · #129 in ST (top 64%)

9 Papers

11.3 · ST · Oct 8, 2021

Uncertainty quantification in the Bradley-Terry-Luce model

Chao Gao, Yandi Shen, Anderson Y. Zhang

The Bradley-Terry-Luce (BTL) model is a benchmark model for pairwise comparisons between individuals. Despite recent progress on the first-order asymptotics of several popular procedures, the understanding of uncertainty quantification in the BTL model remains largely incomplete, especially when the underlying comparison graph is sparse. In this paper, we fill this gap by focusing on two estimators that have received much recent attention: the maximum likelihood estimator (MLE) and the spectral estimator. Using a unified proof strategy, we derive sharp and uniform non-asymptotic expansions for both estimators in the sparsest possible regime (up to some poly-logarithmic factors) of the underlying comparison graph. These expansions allow us to obtain: (i) finite-dimensional central limit theorems for both estimators; (ii) construction of confidence intervals for individual ranks; (iii) optimal constant of $\ell_2$ estimation, which is achieved by the MLE but not by the spectral estimator. Our proof is based on a self-consistent equation of the second-order remainder vector and a novel leave-two-out analysis.
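For readers unfamiliar with the model, the BTL win probability and one way of computing the MLE can be sketched as below. This is a minimal illustration, not the paper's procedure: the abstract does not say how the MLE is computed, so the sketch assumes a standard minorization-maximization (MM) iteration. The function names `btl_win_prob` and `btl_mle` and the toy win counts are hypothetical.

```python
import math


def btl_win_prob(theta_i, theta_j):
    """P(i beats j) under BTL, where theta are log-strength parameters."""
    return 1.0 / (1.0 + math.exp(theta_j - theta_i))


def btl_mle(wins, iters=500):
    """Estimate centered BTL log-strengths from a win-count matrix.

    wins[i][j] is the number of times item i beat item j.
    Assumes every item has at least one win and one comparison
    (otherwise the MLE does not exist or a log(0) occurs).
    Uses a simple MM fixed-point iteration, an assumed solver.
    """
    n = len(wins)
    w = [1.0] * n  # strengths on the exp(theta) scale
    for _ in range(iters):
        new_w = []
        for i in range(n):
            total_wins = sum(wins[i][j] for j in range(n) if j != i)
            denom = sum(
                (wins[i][j] + wins[j][i]) / (w[i] + w[j])
                for j in range(n)
                if j != i
            )
            new_w.append(total_wins / denom)
        # Rescale: BTL strengths are identifiable only up to a constant.
        scale = n / sum(new_w)
        w = [x * scale for x in new_w]
    theta = [math.log(x) for x in w]
    mean = sum(theta) / n
    return [t - mean for t in theta]


# Toy round-robin: each pair is compared 10 times.
toy_wins = [[0, 7, 8], [3, 0, 6], [2, 4, 0]]
theta_hat = btl_mle(toy_wins)
print([round(t, 3) for t in theta_hat])
```

The estimates are centered to sum to zero because the model is invariant to adding a constant to every log-strength.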

1.2 · ST · Sep 8, 2020

Convergence Rates of Empirical Bayes Posterior Distributions: A Variational Perspective

Fengshuo Zhang, Chao Gao

We study the convergence rates of empirical Bayes posterior distributions for nonparametric and high-dimensional inference. We show that as long as the hyperparameter set is discrete, the empirical Bayes posterior distribution induced by the maximum marginal likelihood estimator can be regarded as a variational approximation to a hierarchical Bayes posterior distribution. This connection between empirical Bayes and variational Bayes allows us to leverage recent results in the variational Bayes literature and directly obtain the convergence rates of empirical Bayes posterior distributions from a variational perspective. For a more general hyperparameter set that is not necessarily discrete, we introduce a new technique called "prior decomposition" to deal with prior distributions that can be written as convex combinations of probability measures whose supports are low-dimensional subspaces. This leads to generalized versions of the classical "prior mass and testing" conditions for the convergence rates of empirical Bayes. Our theory is applied to a number of statistical estimation problems including nonparametric density estimation and sparse linear regression.

4.3 · ST · May 20, 2020

Model Repair: Robust Recovery of Over-Parameterized Statistical Models

Chao Gao, John Lafferty

A new type of robust estimation problem is introduced where the goal is to recover a statistical model that has been corrupted after it has been estimated from data. Methods are proposed for "repairing" the model using only the design and not the response values used to fit the model in a supervised learning setting. Theory is developed which reveals that two important ingredients are necessary for model repair: the statistical model must be over-parameterized, and the estimator must incorporate redundancy. In particular, estimators based on stochastic gradient descent are seen to be well suited to model repair, but sparse estimators are not in general repairable. After formulating the problem and establishing a key technical lemma related to robust estimation, a series of results are presented for repair of over-parameterized linear models, random feature models, and artificial neural networks. Simulation studies are presented that corroborate and illustrate the theoretical findings.

13.6 · ML · Oct 4, 2018 · Code

Robust Estimation and Generative Adversarial Nets

Chao Gao, Jiyi Liu, Yuan Yao et al.

Robust estimation under Huber's $ε$-contamination model has become an important topic in statistics and theoretical computer science. Statistically optimal procedures such as Tukey's median and other estimators based on depth functions are impractical because of their computational intractability. In this paper, we establish an intriguing connection between $f$-GANs and various depth functions through the lens of $f$-Learning. Similar to the derivation of $f$-GANs, we show that these depth functions that lead to statistically optimal robust estimators can all be viewed as variational lower bounds of the total variation distance in the framework of $f$-Learning. This connection opens the door to computing robust estimators using tools developed for training GANs. In particular, we show in both theory and experiments that some appropriate structures of discriminator networks with hidden layers in GANs lead to statistically optimal robust location estimators for both Gaussian distributions and general elliptical distributions whose first moment may not exist.
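The contamination setting can be illustrated with a small simulation. This is only a sketch of Huber's $ε$-contamination model in one dimension, where Tukey's median coincides with the ordinary sample median; it does not implement the GAN-based estimators of the paper. The contamination distribution (a point mass at 50), the sample size, and the function name `sample_contaminated` are assumptions chosen for illustration.

```python
import random
import statistics


def sample_contaminated(n, eps, outlier=50.0, seed=0):
    """Draw n points from (1 - eps) * N(0, 1) + eps * Q.

    Q is taken to be a point mass at `outlier`, an illustrative choice;
    the model allows Q to be arbitrary.
    """
    rng = random.Random(seed)
    return [
        outlier if rng.random() < eps else rng.gauss(0.0, 1.0)
        for _ in range(n)
    ]


data = sample_contaminated(n=2000, eps=0.1)

# The true location of the clean component is 0.
mean_est = statistics.mean(data)      # pulled toward the outliers
median_est = statistics.median(data)  # 1-D Tukey median, stays near 0

print(f"mean estimate:   {mean_est:.3f}")
print(f"median estimate: {median_est:.3f}")
```

The sample mean is dragged toward the contamination by roughly $ε$ times the outlier value, while the median moves only slightly, which is the robustness property that depth-based estimators extend to higher dimensions.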

24.1 · ST · Dec 7, 2017

Convergence Rates of Variational Posterior Distributions

Fengshuo Zhang, Chao Gao

We study convergence rates of variational posterior distributions for nonparametric and high-dimensional inference. We formulate general conditions on prior, likelihood, and variational class that characterize the convergence rates. Under "prior mass and testing" conditions similar to those considered in the literature, the rate is found to be the sum of two terms. The first term stands for the convergence rate of the true posterior distribution, and the second term is contributed by the variational approximation error. For a class of priors that admit the structure of a mixture of product measures, we propose a novel prior mass condition, under which the variational approximation error of the mean-field class is dominated by the convergence rate of the true posterior. We demonstrate the applicability of our general results for various models, prior distributions and variational classes by deriving convergence rates of the corresponding variational posteriors.

7.3 · ST · Nov 30, 2017

Phase Transitions in Approximate Ranking

Chao Gao

We study the problem of approximate ranking from observations of pairwise interactions. The goal is to estimate the underlying ranks of $n$ objects from data through interactions of comparison or collaboration. Under a general framework of approximate ranking models, we characterize the exact optimal statistical error rates of estimating the underlying ranks. We discover important phase transition boundaries of the optimal error rates. Depending on the value of the signal-to-noise ratio (SNR) parameter, the optimal rate, as a function of SNR, is either trivial, polynomial, exponential or zero. The four corresponding regimes thus have completely different error behaviors. To the best of our knowledge, this phenomenon, especially the phase transition between the polynomial and the exponential rates, has not been discovered before.

8.4 · LG · Feb 21, 2017

Stochastic Canonical Correlation Analysis

Chao Gao, Dan Garber, Nathan Srebro et al.

We study the sample complexity of canonical correlation analysis (CCA), i.e., the number of samples needed to estimate the population canonical correlation and directions up to arbitrarily small error. With mild assumptions on the data distribution, we show that in order to achieve $ε$-suboptimality in a properly defined measure of alignment between the estimated canonical directions and the population solution, we can solve the empirical objective exactly with $N(ε, Δ, γ)$ samples, where $Δ$ is the singular value gap of the whitened cross-covariance matrix and $1/γ$ is an upper bound of the condition number of auto-covariance matrices. Moreover, we can achieve the same learning accuracy by drawing the same level of samples and solving the empirical objective approximately with a stochastic optimization algorithm; this algorithm is based on the shift-and-invert power iterations and only needs to process the dataset for $\mathcal{O}\left(\log \frac{1}ε \right)$ passes. Finally, we show that, given an estimate of the canonical correlation, the streaming version of the shift-and-invert power iterations achieves the same learning accuracy with the same level of sample complexity, by processing the data only once.

17.8 · ST · Feb 15, 2017

Robust Regression via Mutivariate Regression Depth

Chao Gao

This paper studies robust regression in the settings of Huber's $ε$-contamination models. We consider estimators that are maximizers of multivariate regression depth functions. These estimators are shown to achieve minimax rates in the settings of $ε$-contamination models for various regression problems including nonparametric regression, sparse linear regression, reduced rank regression, etc. We also discuss a general notion of depth function for linear operators that has potential applications in robust functional linear regression.

15.5 · ML · May 25, 2016

Exact Exponent in Optimal Rates for Crowdsourcing

Chao Gao, Yu Lu, Dengyong Zhou

In many machine learning applications, crowdsourcing has become the primary means for label collection. In this paper, we study the optimal error rate for aggregating labels provided by a set of non-expert workers. Under the classic Dawid-Skene model, we establish matching upper and lower bounds with an exact exponent $mI(π)$ in which $m$ is the number of workers and $I(π)$ the average Chernoff information that characterizes the workers' collective ability. Such an exact characterization of the error exponent allows us to state a precise sample size requirement $m>\frac{1}{I(π)}\log\frac{1}ε$ in order to achieve an $ε$ misclassification error. In addition, our results imply the optimality of various EM algorithms for crowdsourcing initialized by consistent estimators.
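The sample size requirement $m>\frac{1}{I(π)}\log\frac{1}ε$ can be turned into a small calculator. This is a sketch that takes the average Chernoff information as a given number rather than computing it from worker confusion matrices, and it assumes the natural logarithm; the function name `min_workers` and the example values are hypothetical.

```python
import math


def min_workers(chernoff_info, eps):
    """Smallest integer m with m > log(1 / eps) / chernoff_info.

    chernoff_info: average Chernoff information I(pi), assumed > 0.
    eps: target misclassification error, assumed in (0, 1).
    The natural logarithm is an assumption of this sketch.
    """
    if chernoff_info <= 0:
        raise ValueError("chernoff_info must be positive")
    if not 0 < eps < 1:
        raise ValueError("eps must lie in (0, 1)")
    bound = math.log(1.0 / eps) / chernoff_info
    # Strict inequality: take the next integer above the bound.
    return math.floor(bound) + 1


# Illustrative values only: I(pi) = 0.5 and a 1% target error.
print(min_workers(0.5, 0.01))
```

Halving the collective ability $I(π)$ doubles the bound, while tightening $ε$ increases it only logarithmically.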