DSLGMLJun 20, 2023

Data Structures for Density Estimation

arXiv:2306.11312v16 citationsh-index: 79Has Code
Originality Incremental advance
AI Analysis

This addresses computational efficiency in statistical density estimation for machine learning applications, though it appears incremental as it builds on prior work.

The paper tackles the problem of density estimation by identifying a close distribution from a set using sublinear samples and time, achieving the first data structure with sublinear time in k and an improved algorithm with linear time that reduces operations for given accuracy.

We study statistical/computational tradeoffs for the following density estimation problem: given $k$ distributions $v_1, \ldots, v_k$ over a discrete domain of size $n$, and sampling access to a distribution $p$, identify $v_i$ that is "close" to $p$. Our main result is the first data structure that, given a sublinear (in $n$) number of samples from $p$, identifies $v_i$ in time sublinear in $k$. We also give an improved version of the algorithm of Acharya et al. (2018) that reports $v_i$ in time linear in $k$. The experimental evaluation of the latter algorithm shows that it achieves a significant reduction in the number of operations needed to achieve a given accuracy compared to prior work.

Code Implementations1 repo
Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes