Differentiable Submodular Maximization
The work addresses a bottleneck in machine learning applications such as data summarization and feature selection by enabling end-to-end optimization. The contribution is incremental in that it builds on existing greedy maximization algorithms.
The paper tackles the problem of learning submodular functions from data. It proposes a method that learns and optimizes these functions jointly rather than treating the two steps separately, and demonstrates its effectiveness on synthetic data and on real-world applications such as product recommendation and image summarization.
We consider learning of submodular functions from data. These functions are important in machine learning and have a wide range of applications, e.g., data summarization, feature selection, and active learning. Despite their combinatorial nature, submodular functions can be maximized approximately with strong theoretical guarantees in polynomial time. Typically, learning the submodular function and optimizing that function are treated separately, i.e., the function is first learned using a proxy objective and subsequently maximized. In contrast, we show how to perform learning and optimization jointly. By interpreting the output of greedy maximization algorithms as distributions over sequences of items and smoothing these distributions, we obtain a differentiable objective. In this way, we can differentiate through the maximization algorithms and optimize the model to work well with the optimization algorithm. We theoretically characterize the error made by our approach, yielding insights into the tradeoff between smoothness and accuracy. We demonstrate the effectiveness of our approach for jointly learning and optimizing on synthetic maximum cut data, and on real-world applications such as product recommendation and image collection summarization.
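To make the idea of "smoothing" a greedy algorithm concrete, the following is a minimal sketch rather than the paper's implementation. It assumes one particular smoothing, a temperature-scaled softmax over marginal gains, and uses a small hypothetical weighted graph `W` with the cut function as the submodular objective. The names `cut_value`, `log_softmax`, and `smoothed_greedy` are illustrative and do not come from the paper. Each step samples an item instead of taking the argmax, so the run defines a distribution over item sequences whose log-probability is a smooth function of the gains.

```python
import math
import random


def cut_value(weights, selected):
    """Total weight of edges crossing from `selected` to the remaining nodes."""
    n = len(weights)
    return sum(weights[i][j] for i in selected for j in range(n) if j not in selected)


def log_softmax(scores, temperature):
    """Numerically stable log-softmax of scores / temperature."""
    scaled = [s / temperature for s in scores]
    m = max(scaled)
    log_z = m + math.log(sum(math.exp(s - m) for s in scaled))
    return [s - log_z for s in scaled]


def smoothed_greedy(weights, k, temperature, rng):
    """Sample k items one at a time; return the sequence and its log-probability.

    Assumption: softmax smoothing. As temperature -> 0 this approaches plain
    greedy (argmax of the marginal gain); larger temperatures approach
    uniform sampling over the remaining items.
    """
    n = len(weights)
    selected, log_prob = [], 0.0
    for _ in range(k):
        candidates = [i for i in range(n) if i not in selected]
        base = cut_value(weights, selected)
        # Marginal gain of adding each remaining candidate to the current set.
        gains = [cut_value(weights, selected + [c]) - base for c in candidates]
        log_probs = log_softmax(gains, temperature)
        probs = [math.exp(lp) for lp in log_probs]
        idx = rng.choices(range(len(candidates)), weights=probs)[0]
        log_prob += log_probs[idx]
        selected.append(candidates[idx])
    return selected, log_prob


# Hypothetical symmetric edge weights for a 4-node graph.
W = [
    [0, 3, 1, 0],
    [3, 0, 0, 2],
    [1, 0, 0, 4],
    [0, 2, 4, 0],
]

sequence, log_prob = smoothed_greedy(W, k=2, temperature=0.01, rng=random.Random(0))
print(sequence, log_prob, cut_value(W, sequence))
```

In an automatic-differentiation framework, the weights would be produced by a parameterized model, and the sequence log-probability could then be differentiated with respect to those parameters. The temperature plays the role of the smoothness knob: low values track greedy closely, while high values give a smoother but less accurate objective.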