Sparsification of Decomposable Submodular Functions
This work addresses scalability issues in machine learning and data mining applications where decomposable submodular functions with a very large number of components are computationally prohibitive to process, offering an incremental improvement in efficiency.
The paper tackles the problem of processing large-scale decomposable submodular functions, which are sums of many simple submodular functions, by introducing a sparsification method that approximates such a function with a weighted sum of only a few of its component functions. The main result is a polynomial-time randomized algorithm whose output uses, in expectation, a number of functions that is independent of the number of functions in the original sum. The paper also studies the algorithm under matroid and cardinality constraints and complements the theoretical analysis with an empirical study.
Submodular functions are at the core of many machine learning and data mining tasks. The underlying submodular functions for many of these tasks are decomposable, i.e., they are sums of several simple submodular functions. In many data-intensive applications, however, the number of underlying submodular functions in the original function is so large that processing it takes a prohibitively large amount of time and/or it does not even fit in main memory. To overcome this issue, we introduce the notion of sparsification for decomposable submodular functions, whose objective is to obtain an accurate approximation of the original function that is a (weighted) sum of only a few submodular functions. Our main result is a polynomial-time randomized sparsification algorithm such that the expected number of functions used in the output is independent of the number of underlying submodular functions in the original function. We also study the effectiveness of our algorithm under various constraints such as matroid and cardinality constraints. We complement our theoretical analysis with an empirical study of the performance of our algorithm.
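To make the idea of a weighted sum of only a few functions concrete, the following is a minimal sketch of a generic sampling-based sparsifier. It is not the algorithm analyzed in the paper. It assumes each component f_i is kept independently with some probability p_i and reweighted by 1/p_i, which keeps the sparsified value unbiased for every set S. The abstract does not say how the probabilities are chosen, so here they are simply passed in by the caller. The names make_coverage, evaluate, and sparsify are hypothetical helpers introduced for illustration.

```python
import random
from typing import Callable, FrozenSet, List, Tuple

SetFn = Callable[[FrozenSet[int]], float]
WeightedSum = List[Tuple[float, SetFn]]


def make_coverage(targets: FrozenSet[int]) -> SetFn:
    """Return f(S) = 1 if S intersects `targets`, else 0.

    This is a simple monotone submodular function, used here only as a toy
    component of a decomposable function.
    """
    return lambda S: 1.0 if S & targets else 0.0


def evaluate(terms: WeightedSum, S: FrozenSet[int]) -> float:
    """Evaluate a weighted sum of set functions at S."""
    return sum(w * f(S) for w, f in terms)


def sparsify(fs: List[SetFn], probs: List[float], rng: random.Random) -> WeightedSum:
    """Keep each f_i independently with probability p_i, weighted by 1/p_i.

    Assumption: `probs` is supplied by the caller; how to choose it so that
    the approximation is accurate is the substance of the paper and is not
    reproduced here.
    """
    if len(fs) != len(probs):
        raise ValueError("fs and probs must have the same length")
    kept: WeightedSum = []
    for f, p in zip(fs, probs):
        # rng.random() lies in [0, 1), so p == 1.0 always keeps the function.
        if p > 0.0 and rng.random() < p:
            kept.append((1.0 / p, f))
    return kept


if __name__ == "__main__":
    # Toy decomposable function F = f_1 + f_2 + f_3 on ground set {0, 1, 2, 3}.
    fs = [make_coverage(frozenset(t)) for t in ([0, 1], [1, 2], [3])]
    original = [(1.0, f) for f in fs]
    S = frozenset({1})
    sparse = sparsify(fs, [0.5, 0.5, 1.0], random.Random(0))
    print("F(S) =", evaluate(original, S))
    print("sparsified estimate =", evaluate(sparse, S), "using", len(sparse), "functions")
```

In this sketch the expected number of kept functions is the sum of the p_i, so the choice of probabilities controls both the size and the accuracy of the output. Uniform probabilities are used in the demo purely for simplicity.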