LGDMMLOct 20, 2020

Very Fast Streaming Submodular Function Maximization

arXiv:2010.10059v510 citationsHas Code
Originality Incremental advance
AI Analysis

This addresses the need for efficient, resource-light data summarization in applications like gamma-ray astronomy, offering a practical improvement over worst-case-focused methods.

The paper tackles the problem of data summarization via submodular function maximization in streaming settings, proposing the ThreeSieves algorithm that selects informative items on-the-fly with fixed memory and provable high-probability performance, outperforming 6 other methods on 8 datasets while using fewer resources.

Data summarization has become a valuable tool in understanding even terabytes of data. Due to their compelling theoretical properties, submodular functions have been in the focus of summarization algorithms. These algorithms offer worst-case approximations guarantees to the expense of higher computation and memory requirements. However, many practical applications do not fall under this worst-case, but are usually much more well-behaved. In this paper, we propose a new submodular function maximization algorithm called ThreeSieves, which ignores the worst-case, but delivers a good solution in high probability. It selects the most informative items from a data-stream on the fly and maintains a provable performance on a fixed memory budget. In an extensive evaluation, we compare our method against $6$ other methods on $8$ different datasets with and without concept drift. We show that our algorithm outperforms current state-of-the-art algorithms and, at the same time, uses fewer resources. Last, we highlight a real-world use-case of our algorithm for data summarization in gamma-ray astronomy. We make our code publicly available at https://github.com/sbuschjaeger/SubmodularStreamingMaximization.

Code Implementations1 repo
Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes