Expectation-Maximization for Learning Determinantal Point Processes
This work addresses a fundamental bottleneck in DPP modeling for tasks like recommendation systems, offering a more flexible learning approach, though it is incremental in improving optimization for an existing model class.
The paper tackles the problem of learning the full kernel matrix for determinantal point processes (DPPs), which is non-convex and NP-hard, by proposing a novel algorithm that reparameterizes the kernel and uses expectation-maximization to optimize likelihood. The method achieves up to 16.5% relative gains in test log-likelihood on a product recommendation task compared to a naive gradient ascent approach.
A determinantal point process (DPP) is a probabilistic model of set diversity compactly parameterized by a positive semi-definite kernel matrix. To fit a DPP to a given task, we would like to learn the entries of its kernel matrix by maximizing the log-likelihood of the available data. However, log-likelihood is non-convex in the entries of the kernel matrix, and this learning problem is conjectured to be NP-hard. Thus, previous work has instead focused on more restricted convex learning settings: learning only a single weight for each row of the kernel matrix, or learning weights for a linear combination of DPPs with fixed kernel matrices. In this work we propose a novel algorithm for learning the full kernel matrix. By changing the kernel parameterization from matrix entries to eigenvalues and eigenvectors, and then lower-bounding the likelihood in the manner of expectation-maximization algorithms, we obtain an effective optimization procedure. We test our method on a real-world product recommendation task, and achieve relative gains of up to 16.5% in test log-likelihood compared to the naive approach of maximizing likelihood by projected gradient ascent on the entries of the kernel matrix.