DS LG MLJun 20, 2023

Data Structures for Density Estimation

Anders Aamand, Alexandr Andoni, Justin Y. Chen, Piotr Indyk, Shyam Narayanan, Sandeep Silwal

arXiv:2306.11312v15.96 citationsh-index: 79Has Code

Originality Incremental advance

AI Analysis

This addresses computational efficiency in statistical density estimation for machine learning applications, though it appears incremental as it builds on prior work.

The paper tackles the problem of density estimation by identifying a close distribution from a set using sublinear samples and time, achieving the first data structure with sublinear time in k and an improved algorithm with linear time that reduces operations for given accuracy.

We study statistical/computational tradeoffs for the following density estimation problem: given $k$ distributions $v_1, \ldots, v_k$ over a discrete domain of size $n$, and sampling access to a distribution $p$, identify $v_i$ that is "close" to $p$. Our main result is the first data structure that, given a sublinear (in $n$) number of samples from $p$, identifies $v_i$ in time sublinear in $k$. We also give an improved version of the algorithm of Acharya et al. (2018) that reports $v_i$ in time linear in $k$. The experimental evaluation of the latter algorithm shows that it achieves a significant reduction in the number of operations needed to achieve a given accuracy compared to prior work.

View on arXiv PDF Code

Similar