86.4CGApr 24
The Prophet and the Voronoi DiagramSariel Har-Peled
Consider a stream of $n$ random points (say, from the unit square) arriving one by one, where a player has to make an irreversible immediate decision for each arriving point whether to pick it. The player has to pick a single point, and the payoff is the area of the cell of the picked point, in the final Voronoi diagram of \emph{all} the points. We show that there is a simple strategy so that with probability $\geq 1 - \tilde O(1/\sqrt{n})$, the player's payoff is only a constant factor smaller than the optimal choice (i.e., the one made by the prophet). This competitiveness is somewhat surprising, as this payoff is larger by a factor of $Θ( \log n)$ than the average payoff.
CGFeb 6
Graph-Based Nearest-Neighbor Search without the SpreadJeff Giliberti, Sariel Har-Peled, Jonas Sauer et al.
$\renewcommand{\Re}{\mathbb{R}}$Recent work showed how to construct nearest-neighbor graphs of linear size, on a given set $P$ of $n$ points in $\Re^d$, such that one can answer approximate nearest-neighbor queries in logarithmic time in the spread. Unfortunately, the spread might be unbounded in $n$, and an interesting theoretical question is how to remove the dependency on the spread. Here, we show how to construct an external linear-size data structure that, combined with the linear-size graph, allows us to answer ANN queries in logarithmic time in $n$.
59.8CGApr 5
Separator for $c$-Packed Segments and CurvesSariel Har-Peled
We provide a simple algorithm for computing a balanced separator for a set of segments that is $c$-packed, showing that the separator cuts only $O(c)$ segments. While the result was known before, arguably our proof is simpler.
DSJan 26, 2021
Sampling a Near Neighbor in High Dimensions -- Who is the Fairest of Them All?Martin Aumüller, Sariel Har-Peled, Sepideh Mahabadi et al.
Similarity search is a fundamental algorithmic primitive, widely used in many computer science disciplines. Given a set of points $S$ and a radius parameter $r>0$, the $r$-near neighbor ($r$-NN) problem asks for a data structure that, given any query point $q$, returns a point $p$ within distance at most $r$ from $q$. In this paper, we study the $r$-NN problem in the light of individual fairness and providing equal opportunities: all points that are within distance $r$ from the query should have the same probability to be returned. In the low-dimensional case, this problem was first studied by Hu, Qiao, and Tao (PODS 2014). Locality sensitive hashing (LSH), the theoretically strongest approach to similarity search in high dimensions, does not provide such a fairness guarantee. In this work, we show that LSH based algorithms can be made fair, without a significant loss in efficiency. We propose several efficient data structures for the exact and approximate variants of the fair NN problem. Our approach works more generally for sampling uniformly from a sub-collection of sets of a given collection and can be used in a few other applications. We also develop a data structure for fair similarity search under inner product that requires nearly-linear space and exploits locality sensitive filters. The paper concludes with an experimental evaluation that highlights the inherent unfairness of NN data structures and shows the performance of our algorithms on real-world datasets.
LGJun 6, 2019
Near Neighbor: Who is the Fairest of Them All?Sariel Har-Peled, Sepideh Mahabadi
$\newcommand{\ball}{\mathbb{B}}\newcommand{\dsQ}{\mathcal{Q}}\newcommand{\dsS}{\mathcal{S}}$In this work we study a fair variant of the near neighbor problem. Namely, given a set of $n$ points $P$ and a parameter $r$, the goal is to preprocess the points, such that given a query point $q$, any point in the $r$-neighborhood of the query, i.e., $\ball(q,r)$, have the same probability of being reported as the near neighbor. We show that LSH based algorithms can be made fair, without a significant loss in efficiency. Specifically, we show an algorithm that reports a point in the $r$-neighborhood of a query $q$ with almost uniform probability. The query time is proportional to $O\bigl( \mathrm{dns}(q.r) \dsQ(n,c) \bigr)$, and its space is $O(\dsS(n,c))$, where $\dsQ(n,c)$ and $\dsS(n,c)$ are the query time and space of an LSH algorithm for $c$-approximate near neighbor, and $\mathrm{dns}(q,r)$ is a function of the local density around $q$. Our approach works more generally for sampling uniformly from a sub-collection of sets of a given collection and can be used in a few other applications. Finally, we run experiments to show performance of our approach on real data.
CGJul 9, 2015
Sparse Approximation via Generating Point SetsAvrim Blum, Sariel Har-Peled, Benjamin Raichel
$ \newcommand{\kalg}{k_{\mathrm{alg}}} \newcommand{\kopt}{k_{\mathrm{opt}}} \newcommand{\algset}{T} \renewcommand{\Re}{\mathbb{R}} \newcommand{\eps}{\varepsilon} \newcommand{\pth}[2][\!]{#1\left({#2}\right)} \newcommand{\npoints}{n} \newcommand{\ballD}{\mathsf{b}} \newcommand{\dataset}{P} $ For a set $\dataset$ of $\npoints$ points in the unit ball $\ballD \subseteq \Re^d$, consider the problem of finding a small subset $\algset \subseteq \dataset$ such that its convex-hull $\eps$-approximates the convex-hull of the original set. We present an efficient algorithm to compute such a $\eps'$-approximation of size $\kalg$, where $\eps'$ is function of $\eps$, and $\kalg$ is a function of the minimum size $\kopt$ of such an $\eps$-approximation. Surprisingly, there is no dependency on the dimension $d$ in both bounds. Furthermore, every point of $\dataset$ can be $\eps$-approximated by a convex-combination of points of $\algset$ that is $O(1/\eps^2)$-sparse. Our result can be viewed as a method for sparse, convex autoencoding: approximately representing the data in a compact way using sparse combinations of a small subset $\algset$ of the original data. The new algorithm can be kernelized, and it preserves sparsity in the original input.
LGJul 6, 2015
A Simple Algorithm for Maximum Margin Classification, RevisitedSariel Har-Peled
In this note, we revisit the algorithm of Har-Peled et. al. [HRZ07] for computing a linear maximum margin classifier. Our presentation is self contained, and the algorithm itself is slightly simpler than the original algorithm. The algorithm itself is a simple Perceptron like iterative algorithm. For more details and background, the reader is referred to the original paper.