DCDSLGMar 15, 2024

GreedyML: A Parallel Algorithm for Maximizing Constrained Submodular Functions

arXiv:2403.10332v3h-index: 34Sea
Originality Incremental advance
AI Analysis

This work addresses memory limitations in distributed submodular optimization for applications like data summarization and machine learning, representing an incremental improvement over existing methods.

The paper tackles the problem of maximizing constrained submodular functions on massive datasets by proposing GreedyML, a parallel algorithm that uses multiple accumulation steps to reduce memory bottlenecks, enabling it to handle millions of elements where prior methods fail due to memory constraints.

We describe a parallel approximation algorithm for maximizing monotone submodular functions subject to hereditary constraints on distributed memory multiprocessors. Our work is motivated by the need to solve submodular optimization problems on massive data sets, for practical contexts such as data summarization, machine learning, and graph sparsification. Our work builds on the randomized distributed RandGreedi algorithm, proposed by Barbosa, Ene, Nguyen, and Ward (2015). This algorithm computes a distributed solution by randomly partitioning the data among all the processors and then employing \emph{a single} accumulation step in which all processors send their partial solutions to one processor. However, for large problems, the accumulation step exceeds the memory available on a processor, and the processor that performs the accumulation becomes a computational bottleneck. Hence we propose a generalization of the RandGreedi algorithm that employs multiple accumulation steps to reduce the memory required. We analyze the approximation ratio and the time complexity of the algorithm (in the BSP model). We evaluate the new GreedyML algorithm on three classes of problems, and report results from large-scale data sets with millions of elements. The results show that the GreedyML algorithm can solve problems where the sequential Greedy and distributed RandGreedi algorithms fail due to memory constraints. For certain computationally intensive problems, the GreedyML algorithm is faster than the RandGreedi algorithm. The observed approximation quality of the solutions computed by the GreedyML algorithm closely matches those obtained by the RandGreedi algorithm on these problems.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes