NYTRO: When Subsampling Meets Early Stopping
This work addresses computational efficiency for machine learning practitioners working with large datasets, though it appears incremental because it builds on existing early stopping and subsampling techniques.
The paper asks whether early stopping and subsampling can be combined to address both time and memory constraints in large-scale learning. It proposes a randomized iterative regularization method for least squares regression, analyzes its statistical and computational properties, and supports the theory with experiments.
Early stopping is a well-known approach to reducing the time complexity of training and model selection for large-scale learning machines. On the other hand, memory/space (rather than time) complexity is the main constraint in many applications, and randomized subsampling techniques have been proposed to tackle this issue. In this paper we ask whether early stopping and subsampling ideas can be combined in a fruitful way. We consider the question in a least squares regression setting and propose a form of randomized iterative regularization based on early stopping and subsampling. In this context, we analyze the statistical and computational properties of the proposed method. Theoretical results are complemented and validated by a thorough experimental analysis.
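As a rough illustration of how the two ideas might fit together, the following Python sketch runs gradient descent on a least squares problem restricted to m randomly chosen kernel centers, and stops once a hold-out error stops improving. It is a minimal sketch under stated assumptions, not the method analyzed in the paper: the Gaussian kernel, the uniform sampling of centers, the step size, the patience rule, the synthetic data, and all function and parameter names are illustrative choices.

```python
import numpy as np


def gaussian_kernel(A, B, sigma=1.0):
    """Gaussian kernel matrix between rows of A and rows of B (assumed kernel)."""
    sq = (A ** 2).sum(1)[:, None] + (B ** 2).sum(1)[None, :] - 2.0 * A @ B.T
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * sigma ** 2))


def subsampled_early_stopping(X, y, X_val, y_val, m=50, max_iter=500,
                              patience=10, sigma=1.0, seed=0):
    """Gradient descent on subsampled kernel least squares with early stopping.

    Hypothetical helper: only an n x m kernel block is stored (memory O(n m)),
    and the number of iterations acts as the regularization parameter.
    """
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    # Subsampling step: pick m training points uniformly at random as centers.
    idx = rng.choice(n, size=min(m, n), replace=False)
    centers = X[idx]
    K_nm = gaussian_kernel(X, centers, sigma)
    K_val = gaussian_kernel(X_val, centers, sigma)

    # Step size 1/L, where L is the Lipschitz constant of the gradient of
    # (1 / 2n) * ||K_nm @ alpha - y||^2, i.e. sigma_max(K_nm)^2 / n.
    step = n / (np.linalg.norm(K_nm, 2) ** 2)

    alpha = np.zeros(centers.shape[0])
    best_alpha, best_err, best_t = alpha.copy(), np.inf, 0
    for t in range(1, max_iter + 1):
        grad = K_nm.T @ (K_nm @ alpha - y) / n
        alpha = alpha - step * grad
        err = np.mean((K_val @ alpha - y_val) ** 2)
        if err < best_err:
            best_alpha, best_err, best_t = alpha.copy(), err, t
        elif t - best_t >= patience:
            # Early stopping: no hold-out improvement for `patience` steps.
            break
    return centers, best_alpha, best_t, best_err


# Synthetic usage example (data are made up for illustration only).
rng = np.random.default_rng(1)
X = rng.uniform(-3.0, 3.0, size=(400, 1))
y = np.sin(X[:, 0]) + 0.1 * rng.standard_normal(400)
X_val = rng.uniform(-3.0, 3.0, size=(100, 1))
y_val = np.sin(X_val[:, 0]) + 0.1 * rng.standard_normal(100)

centers, alpha, stop_iter, val_err = subsampled_early_stopping(
    X, y, X_val, y_val, m=30
)
print(f"stopped at iteration {stop_iter}, validation MSE {val_err:.4f}")
```

In this sketch the number of centers m bounds memory at O(nm), while the iteration at which the hold-out error is lowest plays the role usually taken by an explicit regularization parameter.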