Hui Yuan

3.3 · LG · Jul 10, 2020

Learning Entangled Single-Sample Gaussians in the Subset-of-Signals Model

Yingyu Liang, Hui Yuan

In the setting of entangled single-sample distributions, the goal is to estimate some common parameter shared by a family of $n$ distributions, given one single sample from each distribution. This paper studies mean estimation for entangled single-sample Gaussians that have a common mean but different unknown variances. We propose the subset-of-signals model where an unknown subset of $m$ variances are bounded by 1 while there are no assumptions on the other variances. In this model, we analyze a simple and natural method based on iteratively averaging the truncated samples, and show that the method achieves error $O\left(\frac{\sqrt{n\ln n}}{m}\right)$ with high probability when $m=\Omega(\sqrt{n\ln n})$, matching existing bounds for this range of $m$. We further prove lower bounds, showing that the error is $\Omega\left(\left(\frac{n}{m^4}\right)^{1/2}\right)$ when $m$ is between $\Omega(\ln n)$ and $O(n^{1/4})$, and the error is $\Omega\left(\left(\frac{n}{m^4}\right)^{1/6}\right)$ when $m$ is between $\Omega(n^{1/4})$ and $O(n^{1-\epsilon})$ for an arbitrarily small $\epsilon>0$, improving existing lower bounds and extending to a wider range of $m$.
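The abstract above does not spell out the truncation rule, so the following is only a minimal sketch of one way "iteratively averaging the truncated samples" could look. Everything specific here is an assumption made for illustration and is not taken from the paper: truncation is read as clipping each sample to a window around the current estimate, the starting point is the median, and the names `truncated_iterative_mean`, `make_samples`, `radius`, `shrink`, and `min_radius` are hypothetical.

```python
import random
import statistics


def truncated_iterative_mean(samples, radius=100.0, rounds=30,
                             shrink=0.5, min_radius=2.0):
    """Repeatedly average samples clipped to a shrinking window.

    Assumption: "truncation" is implemented as clipping to
    [center - radius, center + radius]; the paper may use a different rule.
    """
    center = statistics.median(samples)  # starting point (an assumption)
    for _ in range(rounds):
        lo, hi = center - radius, center + radius
        # Clip every sample into the current window, then average.
        clipped = [min(max(x, lo), hi) for x in samples]
        center = sum(clipped) / len(clipped)
        # Shrink the window, but never below min_radius (hypothetical schedule).
        radius = max(radius * shrink, min_radius)
    return center


def make_samples(n, m, mean, rng):
    """Subset-of-signals data: m samples with variance 1, the rest much noisier."""
    stds = [1.0] * m + [rng.uniform(10.0, 1000.0) for _ in range(n - m)]
    return [rng.gauss(mean, s) for s in stds]


if __name__ == "__main__":
    rng = random.Random(0)
    data = make_samples(n=2000, m=500, mean=3.0, rng=rng)
    print(truncated_iterative_mean(data))
```

Clipping rather than discarding keeps every sample in the average, so a very noisy sample can shift the estimate by at most `radius / n` per round. The shrink schedule is arbitrary and would need tuning in practice.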

4.2 · LG · Apr 20, 2020

Learning Entangled Single-Sample Distributions via Iterative Trimming

Hui Yuan, Yingyu Liang

In the setting of entangled single-sample distributions, the goal is to estimate some common parameter shared by a family of distributions, given one single sample from each distribution. We study mean estimation and linear regression under general conditions, and analyze a simple and computationally efficient method based on iteratively trimming samples and re-estimating the parameter on the trimmed sample set. We show that, in a logarithmic number of iterations, the method outputs an estimate whose error depends only on the noise level of the $\lceil \alpha n \rceil$-th noisiest data point, where $\alpha$ is a constant and $n$ is the sample size. This means it can tolerate a constant fraction of high-noise points. These are the first such results for the method under our general conditions. They also justify the wide application and empirical success of iterative trimming in practice. Our theoretical results are complemented by experiments on synthetic data.
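The trim-and-re-estimate loop described in this abstract can be sketched for the mean-estimation case as follows. This is an illustrative sketch and not the authors' exact procedure. These details are assumptions: starting from the plain average, keeping the `ceil(alpha * n)` samples closest to the current estimate, and using a base-2 logarithm for the round count. The function name `iterative_trimming_mean` is hypothetical.

```python
import math


def iterative_trimming_mean(samples, alpha=0.5, rounds=None):
    """Estimate a common mean by repeatedly trimming far-away samples.

    Each round keeps the ceil(alpha * n) samples closest to the current
    estimate and re-estimates the mean on that trimmed set.
    """
    n = len(samples)
    keep = math.ceil(alpha * n)
    if rounds is None:
        # "Logarithmic iterations"; the base and constant are assumptions.
        rounds = max(1, math.ceil(math.log2(n)))
    estimate = sum(samples) / n  # initial estimate: plain average (assumption)
    for _ in range(rounds):
        # Trim: keep the samples with the smallest residual to the estimate.
        kept = sorted(samples, key=lambda x: abs(x - estimate))[:keep]
        # Re-estimate on the trimmed sample set.
        estimate = sum(kept) / keep
    return estimate


if __name__ == "__main__":
    # Eight clean points at 1.0 and two high-noise points.
    data = [1.0] * 8 + [100.0, -50.0]
    print(iterative_trimming_mean(data, alpha=0.5))
```

For linear regression the same loop would apply with a least-squares fit in place of the average and the absolute regression residual in place of `abs(x - estimate)`.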

2 Papers