ST MLNov 7, 2015

Signed Support Recovery for Single Index Models in High-Dimensions

arXiv:1511.02270v23.311 citations

Originality Incremental advance

AI Analysis

This work addresses the problem of efficiently identifying relevant variables in high-dimensional statistical models for researchers and practitioners in machine learning and statistics, though it is incremental as it builds on existing methods to establish optimality.

The paper tackles the support recovery problem for high-dimensional single index models, showing that two computationally inexpensive algorithms (DT-SIR and an SDP approach) can optimally recover the support of a sparse vector with a critical sample size threshold, specifically when the rescaled sample size κ exceeds a certain value, and fail otherwise with probability at least 1/2 asymptotically.

In this paper we study the support recovery problem for single index models $Y=f(\boldsymbol{X}^{\intercal} \boldsymbolβ,\varepsilon)$, where $f$ is an unknown link function, $\boldsymbol{X}\sim N_p(0,\mathbb{I}_{p})$ and $\boldsymbolβ$ is an $s$-sparse unit vector such that $\boldsymbolβ_{i}\in \{\pm\frac{1}{\sqrt{s}},0\}$. In particular, we look into the performance of two computationally inexpensive algorithms: (a) the diagonal thresholding sliced inverse regression (DT-SIR) introduced by Lin et al. (2015); and (b) a semi-definite programming (SDP) approach inspired by Amini & Wainwright (2008). When $s=O(p^{1-δ})$ for some $δ>0$, we demonstrate that both procedures can succeed in recovering the support of $\boldsymbolβ$ as long as the rescaled sample size $κ=\frac{n}{s\log(p-s)}$ is larger than a certain critical threshold. On the other hand, when $κ$ is smaller than a critical value, any algorithm fails to recover the support with probability at least $\frac{1}{2}$ asymptotically. In other words, we demonstrate that both DT-SIR and the SDP approach are optimal (up to a scalar) for recovering the support of $\boldsymbolβ$ in terms of sample size. We provide extensive simulations, as well as a real dataset application to help verify our theoretical observations.

View on arXiv PDF

Similar