LGMar 6, 2022

Coresets for Data Discretization and Sine Wave Fitting

MIT
arXiv:2203.03009v17 citationsh-index: 30Has Code
Originality Incremental advance
AI Analysis

This provides an efficient solution for real-time monitoring applications like GPS or health tracking, though it is incremental as it extends known coreset techniques to sine wave fitting.

The paper tackles the problem of approximating an unbounded stream of integers with a single sine wave for anomaly detection, proving that a small weighted subset (coreset) of size O(log(N)^{O(1)}) can approximate the cost function with a multiplicative error of 1±ε for any ε>0, leading to streaming algorithms with O((log(N)log(n))^{O(1)}) memory usage.

In the \emph{monitoring} problem, the input is an unbounded stream $P={p_1,p_2\cdots}$ of integers in $[N]:=\{1,\cdots,N\}$, that are obtained from a sensor (such as GPS or heart beats of a human). The goal (e.g., for anomaly detection) is to approximate the $n$ points received so far in $P$ by a single frequency $\sin$, e.g. $\min_{c\in C}cost(P,c)+λ(c)$, where $cost(P,c)=\sum_{i=1}^n \sin^2(\frac{2π}{N} p_ic)$, $C\subseteq [N]$ is a feasible set of solutions, and $λ$ is a given regularization function. For any approximation error $\varepsilon>0$, we prove that \emph{every} set $P$ of $n$ integers has a weighted subset $S\subseteq P$ (sometimes called core-set) of cardinality $|S|\in O(\log(N)^{O(1)})$ that approximates $cost(P,c)$ (for every $c\in [N]$) up to a multiplicative factor of $1\pm\varepsilon$. Using known coreset techniques, this implies streaming algorithms using only $O((\log(N)\log(n))^{O(1)})$ memory. Our results hold for a large family of functions. Experimental results and open source code are provided.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes