MLLGApr 15, 2018

Approximating the covariance ellipsoid

arXiv:1804.05402v11 citations
Originality Incremental advance
AI Analysis

This provides a method for efficiently estimating covariance structures in high-dimensional statistics, with applications in machine learning and data analysis, though it is incremental as it builds on existing approximation techniques.

The paper tackles the problem of approximating the covariance ellipsoid of a random vector using simple sets from data, achieving results with sample sizes scaling as N = c d η^{-4} log(2/η) under minimal assumptions and improved to N = c d η^{-2} log(2/η) for specific cases like rotation-invariant distributions.

We explore ways in which the covariance ellipsoid ${\cal B}=\{v \in \mathbb{R}^d : \mathbb{E} <X,v>^2 \leq 1\}$ of a centred random vector $X$ in $\mathbb{R}^d$ can be approximated by a simple set. The data one is given for constructing the approximating set consists of $X_1,...,X_N$ that are independent and distributed as $X$. We present a general method that can be used to construct such approximations and implement it for two types of approximating sets. We first construct a (random) set ${\cal K}$ defined by a union of intersections of slabs $H_{z,α}=\{v \in \mathbb{R}^d : |<z,v>| \leq α\}$ (and therefore ${\cal K}$ is actually the output of a neural network with two hidden layers). The slabs are generated using $X_1,...,X_N$, and under minimal assumptions on $X$ (e.g., $X$ can be heavy-tailed) it suffices that $N = c_1d η^{-4}\log(2/η)$ to ensure that $(1-η) {\cal K} \subset {\cal B} \subset (1+η){\cal K}$. In some cases (e.g., if $X$ is rotation invariant and has marginals that are well behaved in some weak sense), a smaller sample size suffices: $N = c_1dη^{-2}\log(2/η)$. We then show that if the slabs are replaced by randomly generated ellipsoids defined using $X_1,...,X_N$, the same degree of approximation is true when $N \geq c_2dη^{-2}\log(2/η)$. The construction we use is based on the small-ball method.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes