Streaming Methods for Restricted Strongly Convex Functions with Applications to Prototype Selection
The work provides a positive result for efficient data summarization in large datasets, addressing a gap highlighted by prior negative results, though it is incremental in that it builds on known conditions for weakly submodular functions.
The paper tackles streaming optimization of weakly submodular functions. It shows that a constant factor approximation guarantee is possible under restricted strong convexity and restricted smoothness conditions, and applies the result to prototype selection, where the proposed algorithms run orders of magnitude faster than the state-of-the-art streaming method while maintaining solution quality.
In this paper, we show that if the optimization function is restricted-strongly-convex (RSC) and restricted-smooth (RSM) -- a rich subclass of weakly submodular functions -- then a streaming algorithm with a constant factor approximation guarantee is possible. More generally, our results are applicable to any monotone weakly submodular function with submodularity ratio bounded from above. This positive result, which provides a sufficient condition for a constant factor streaming guarantee for weakly submodular functions, may be of special interest given the recent negative result (Elenberg et al., 2017) for the general class of weakly submodular functions. We apply our streaming algorithms to create a compact synopsis of large, complex datasets by selecting $m$ representative elements that optimize a suitable RSC and RSM objective function. The above results hold even with additional constraints, such as learning a non-negative weight for each selected element that, for interpretability, indicates its importance. We empirically evaluate our algorithms on two real datasets: MNIST, a handwritten digits dataset, and Letters, a UCI dataset containing the alphabet written in different fonts and styles. We observe that our algorithms are orders of magnitude faster than the state-of-the-art streaming algorithm for weakly submodular functions, with our main algorithm still providing equally good solutions in practice.
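To make the streaming setting concrete, the following is a minimal sketch of a generic single-pass, threshold-based selection of at most $m$ elements. It is not the algorithm proposed in the paper: the objective is a toy facility-location function used as a stand-in for an RSC and RSM objective, and the names `facility_location`, `threshold_stream_select`, and the threshold `tau` are assumptions introduced purely for illustration.

```python
import numpy as np


def facility_location(sim, selected):
    """Toy monotone objective (illustrative stand-in, not the paper's objective):
    sum over all points of the best similarity to any selected element."""
    if not selected:
        return 0.0
    return float(sim[:, selected].max(axis=1).sum())


def threshold_stream_select(sim, m, tau):
    """Single pass over the stream: keep an arriving element if its marginal
    gain is at least tau and fewer than m elements have been kept so far."""
    selected = []
    current = 0.0
    for j in range(sim.shape[1]):  # elements arrive one at a time
        if len(selected) >= m:
            break
        value = facility_location(sim, selected + [j])
        if value - current >= tau:
            selected.append(j)
            current = value
    return selected, current


# Synthetic data; a Gaussian kernel gives non-negative similarities.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
sq_dists = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
sim = np.exp(-sq_dists / 2.0)

selected, value = threshold_stream_select(sim, m=10, tau=1.0)
print(len(selected), value)
```

Each element is examined once and never revisited, which is what distinguishes the streaming setting from greedy selection over the full dataset. In this sketch the threshold is fixed by hand; how a threshold is chosen, and what guarantee follows from it, depends on the specific algorithm and is not specified here.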