Coresets for Data Discretization and Sine Wave Fitting
This provides an efficient solution for real-time monitoring applications like GPS or health tracking, though it is incremental as it extends known coreset techniques to sine wave fitting.
The paper tackles the problem of approximating an unbounded stream of integers with a single sine wave for anomaly detection, proving that a small weighted subset (coreset) of size O(log(N)^{O(1)}) can approximate the cost function with a multiplicative error of 1±ε for any ε>0, leading to streaming algorithms with O((log(N)log(n))^{O(1)}) memory usage.
In the \emph{monitoring} problem, the input is an unbounded stream $P={p_1,p_2\cdots}$ of integers in $[N]:=\{1,\cdots,N\}$, that are obtained from a sensor (such as GPS or heart beats of a human). The goal (e.g., for anomaly detection) is to approximate the $n$ points received so far in $P$ by a single frequency $\sin$, e.g. $\min_{c\in C}cost(P,c)+λ(c)$, where $cost(P,c)=\sum_{i=1}^n \sin^2(\frac{2π}{N} p_ic)$, $C\subseteq [N]$ is a feasible set of solutions, and $λ$ is a given regularization function. For any approximation error $\varepsilon>0$, we prove that \emph{every} set $P$ of $n$ integers has a weighted subset $S\subseteq P$ (sometimes called core-set) of cardinality $|S|\in O(\log(N)^{O(1)})$ that approximates $cost(P,c)$ (for every $c\in [N]$) up to a multiplicative factor of $1\pm\varepsilon$. Using known coreset techniques, this implies streaming algorithms using only $O((\log(N)\log(n))^{O(1)})$ memory. Our results hold for a large family of functions. Experimental results and open source code are provided.