Tree-Projected Gradient Descent for Estimating Gradient-Sparse Parameters on Graphs
This provides a general and efficient method for gradient-sparse estimation, applicable to models like linear and generalized linear models, with broader guarantees than previous polynomial-time algorithms.
The paper tackles the problem of estimating gradient-sparse parameters on graphs, proposing a projected gradient descent algorithm that achieves squared-error risk proportional to (s*/n) log(1 + p/s*), independent of the graph structure.
We study estimation of a gradient-sparse parameter vector $\boldsymbolθ^* \in \mathbb{R}^p$, having strong gradient-sparsity $s^*:=\|\nabla_G \boldsymbolθ^*\|_0$ on an underlying graph $G$. Given observations $Z_1,\ldots,Z_n$ and a smooth, convex loss function $\mathcal{L}$ for which $\boldsymbolθ^*$ minimizes the population risk $\mathbb{E}[\mathcal{L}(\boldsymbolθ;Z_1,\ldots,Z_n)]$, we propose to estimate $\boldsymbolθ^*$ by a projected gradient descent algorithm that iteratively and approximately projects gradient steps onto spaces of vectors having small gradient-sparsity over low-degree spanning trees of $G$. We show that, under suitable restricted strong convexity and smoothness assumptions for the loss, the resulting estimator achieves the squared-error risk $\frac{s^*}{n} \log (1+\frac{p}{s^*})$ up to a multiplicative constant that is independent of $G$. In contrast, previous polynomial-time algorithms have only been shown to achieve this guarantee in more specialized settings, or under additional assumptions for $G$ and/or the sparsity pattern of $\nabla_G \boldsymbolθ^*$. As applications of our general framework, we apply our results to the examples of linear models and generalized linear models with random design.