DSNANASep 10, 2018

Leveraging Sparsity to Speed Up Polynomial Feature Expansions of CSR Matrices Using $K$-Simplex Numbers

arXiv:1803.064181 citations
AI Analysis

For practitioners working with high-dimensional sparse data, this algorithm enables efficient polynomial feature expansion without memory blowup.

The paper presents an algorithm for polynomial feature expansions that operates directly on CSR matrices, avoiding intermediate densification. It achieves average time complexity Θ(d^K D^K), improving over the standard method by a factor of d^K.

An algorithm is provided for performing polynomial feature expansions that both operates on and produces compressed sparse row (CSR) matrices. Previously, no such algorithm existed, and performing polynomial expansions on CSR matrices required an intermediate densification step. The algorithm performs a $K$-degree expansion by using a bijective function involving $K$-simplex numbers of column indices in the original matrix to column indices in the expanded matrix. Not only is space saved by operating in CSR format, but the bijective function allows for only the nonzero elements to be iterated over and multiplied together during the expansion, greatly improving average time complexity. For a vector of dimensionality $D$ and density $0 \le d \le 1$, the algorithm has average time complexity $Θ(d^KD^K)$ where $K$ is the polynomial-feature order; this is an improvement by a factor $d^K$ over the standard method. This work derives the required function for the cases of $K=2$ and $K=3$ and shows its use in the $K=2$ algorithm.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes