CO LG ML, Feb 23, 2018

Optimized Algorithms to Sample Determinantal Point Processes

arXiv:1802.08471v1, 6 citations
Originality: Incremental advance
AI Analysis

This work addresses a bottleneck for researchers and practitioners using DPPs for diverse sampling in machine learning and statistics, though it is incremental as it optimizes an existing method.

The paper tackles the computational inefficiency in sampling from Determinantal Point Processes (DPPs), specifically reducing the cost of the orthogonalization step from O(Nμ^3) to O(Nμ^2) with a simpler algorithm and a memory-efficient variant.

In this technical report, we discuss several sampling algorithms for Determinantal Point Processes (DPPs). DPPs have recently gained broad interest in the machine learning and statistics literature as random point processes with negative correlation, i.e., ones that can generate a "diverse" sample from a set of items. They are parametrized by a matrix $\mathbf{L}$, called the $L$-ensemble, that encodes the correlations between items. The standard sampling algorithm has three phases: 1) eigendecomposition of $\mathbf{L}$; 2) an eigenvector sampling phase, in which each eigenvector of $\mathbf{L}$ is kept or discarded independently via a Bernoulli variable parametrized by its associated eigenvalue; 3) a Gram-Schmidt-type orthogonalisation procedure applied to the sampled eigenvectors. In a naive implementation, the computational cost of the third step is on average $\mathcal{O}(N\mu^3)$, where $N$ is the number of items and $\mu$ is the average size of a DPP sample. We give an algorithm that runs in $\mathcal{O}(N\mu^2)$ and is extremely simple to implement. If memory is a constraint, we also describe a dual variant with reduced memory costs. In addition, we discuss implementation details often missing in the literature.
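The three phases described in the abstract can be sketched as follows. This is a minimal illustration and not the report's own code. The function name `sample_dpp`, its arguments, and the use of NumPy are assumptions made for the example. Phase 2 uses the standard inclusion probability $\lambda/(1+\lambda)$ for an $L$-ensemble. Phase 3 is written as an incremental Cholesky-style update of the conditional marginals, which costs $\mathcal{O}(Nk^2)$ for $k$ sampled eigenvectors; it is offered in the spirit of the $\mathcal{O}(N\mu^2)$ claim and may differ in detail from the algorithm given in the report.

```python
import numpy as np

def sample_dpp(L, rng=None):
    """Draw one sample from the DPP with L-ensemble L (hypothetical helper)."""
    rng = np.random.default_rng(rng)

    # Phase 1: eigendecomposition of the symmetric PSD matrix L.
    eigvals, eigvecs = np.linalg.eigh(L)
    eigvals = np.clip(eigvals, 0.0, None)  # guard against tiny negative values

    # Phase 2: keep eigenvector n with probability lambda_n / (1 + lambda_n).
    keep = rng.random(eigvals.shape[0]) < eigvals / (1.0 + eigvals)
    V = eigvecs[:, keep]                   # N x k, orthonormal columns
    N, k = V.shape

    # Phase 3: sample k items from the projection DPP with kernel V V^T.
    p = np.sum(V ** 2, axis=1)             # marginals: diagonal of V V^T
    F = np.zeros((N, k))                   # Cholesky-like factors, one per step
    sample = []
    for n in range(k):
        p = np.clip(p, 0.0, None)          # remove negative rounding noise
        s = rng.choice(N, p=p / p.sum())
        # Column s of V V^T, minus the part explained by earlier picks.
        f = V @ V[s] - F[:, :n] @ F[s, :n]
        f /= np.sqrt(f[s])
        F[:, n] = f
        p = p - f ** 2                     # conditional marginals given picks
        p[s] = 0.0                         # an item cannot be picked twice
        sample.append(int(s))
    return sample
```

For example, `sample_dpp(A @ A.T, rng=0)` with a feature matrix `A` of shape `(N, d)` returns a list of at most `d` distinct item indices. Storing `F` takes $\mathcal{O}(Nk)$ memory, which is the cost the abstract's dual variant is meant to reduce.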

Code Implementations: 2 repos