LGOCJul 4, 2022

Approximate Vanishing Ideal Computations at Scale

arXiv:2207.01236v213 citationsh-index: 29
AI Analysis

This work addresses scalability issues for researchers and practitioners using vanishing ideal methods in data analysis, though it is incremental as it builds on existing OAVI with algorithmic improvements.

The paper tackles the computational expense of constructing generators for the approximate vanishing ideal from noisy data, proving that the oracle approximate vanishing ideal algorithm (OAVI) has linear complexity in sample size and proposing modifications that accelerate training time exponentially in features and by orders of magnitude in experiments.

The vanishing ideal of a set of points $X = \{\mathbf{x}_1, \ldots, \mathbf{x}_m\}\subseteq \mathbb{R}^n$ is the set of polynomials that evaluate to $0$ over all points $\mathbf{x} \in X$ and admits an efficient representation by a finite subset of generators. In practice, to accommodate noise in the data, algorithms that construct generators of the approximate vanishing ideal are widely studied but their computational complexities remain expensive. In this paper, we scale up the oracle approximate vanishing ideal algorithm (OAVI), the only generator-constructing algorithm with known learning guarantees. We prove that the computational complexity of OAVI is not superlinear, as previously claimed, but linear in the number of samples $m$. In addition, we propose two modifications that accelerate OAVI's training time: Our analysis reveals that replacing the pairwise conditional gradients algorithm, one of the solvers used in OAVI, with the faster blended pairwise conditional gradients algorithm leads to an exponential speed-up in the number of features $n$. Finally, using a new inverse Hessian boosting approach, intermediate convex optimization problems can be solved almost instantly, improving OAVI's training time by multiple orders of magnitude in a variety of numerical experiments.

Code Implementations1 repo
Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes