Generalized Low-Rank Update: Model Parameter Bounds for Low-Rank Training Data Modifications
This work addresses a practical bottleneck in model selection for machine learning practitioners, offering an incremental improvement over existing methods.
The authors tackle the problem of efficiently updating machine learning models when small changes are made to the training data, as happens in cross-validation or feature selection. They introduce the Generalized Low-Rank Update (GLRU) method, which extends low-rank updates from linear estimators to methods such as SVM and logistic regression, with computational complexity proportional to the size of the dataset change.
In this study, we develop an incremental machine learning (ML) method that efficiently obtains the optimal model when a small number of instances or features are added or removed. This problem is of practical importance in model selection tasks such as cross-validation (CV) and feature selection. For the class of ML methods known as linear estimators, an efficient model update framework called the low-rank update can handle changes to a small number of rows and columns of the data matrix. For ML methods beyond linear estimators, however, there is currently no comprehensive framework for obtaining knowledge about the updated solution within a specific computational complexity. We therefore introduce the Generalized Low-Rank Update (GLRU), which extends the low-rank update framework of linear estimators to ML methods formulated as a certain class of regularized empirical risk minimization, including commonly used methods such as SVM and logistic regression. GLRU not only broadens the applicability of low-rank updates but also provides information about the updated solutions with a computational complexity proportional to the amount of dataset changes. To demonstrate its effectiveness, we conduct experiments showing its efficiency in cross-validation and feature selection compared with baseline methods.
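For context, the classical low-rank update for linear estimators, which GLRU is said to generalize, can be sketched for ridge regression using the Sherman-Morrison-Woodbury identity. This is only an illustration of the linear-estimator case, not the GLRU method itself. The choice of ridge regression, the regularization parameter `lam`, and the function names `ridge_fit` and `ridge_remove_rows` are assumptions made for this example and do not come from the paper.

```python
import numpy as np


def ridge_fit(X, y, lam):
    """Fit ridge regression from scratch.

    Returns the coefficients, the inverse of (X^T X + lam * I), and X^T y,
    so that later updates can reuse them.
    """
    d = X.shape[1]
    A_inv = np.linalg.inv(X.T @ X + lam * np.eye(d))
    b = X.T @ y
    return A_inv @ b, A_inv, b


def ridge_remove_rows(A_inv, b, X_r, y_r):
    """Update a ridge solution after removing the k instances (X_r, y_r).

    Uses the Woodbury identity
        (A - X_r^T X_r)^{-1}
            = A^{-1} + A^{-1} X_r^T (I - X_r A^{-1} X_r^T)^{-1} X_r A^{-1},
    which costs O(d^2 k + k^3) instead of a full refit.
    """
    k = X_r.shape[0]
    AiXrT = A_inv @ X_r.T                 # d x k
    S = np.eye(k) - X_r @ AiXrT           # k x k, invertible when lam > 0
    # A_inv is symmetric, so AiXrT.T equals X_r @ A_inv.
    A_inv_new = A_inv + AiXrT @ np.linalg.solve(S, AiXrT.T)
    b_new = b - X_r.T @ y_r
    return A_inv_new @ b_new, A_inv_new, b_new


if __name__ == "__main__":
    # Hypothetical synthetic data, used only to compare update and refit.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 10))
    y = rng.normal(size=200)
    lam = 0.5

    _, A_inv, b = ridge_fit(X, y, lam)

    # Hold out two instances, as one cross-validation fold might.
    rows = [5, 42]
    beta_update, _, _ = ridge_remove_rows(A_inv, b, X[rows], y[rows])

    keep = np.ones(len(y), dtype=bool)
    keep[rows] = False
    beta_refit, _, _ = ridge_fit(X[keep], y[keep], lam)

    print("max abs difference:", np.max(np.abs(beta_update - beta_refit)))
```

The update touches only a k x k system and a few d x k products, so its cost scales with the number of removed instances rather than with the full dataset size. This is the property the abstract attributes to low-rank updates for linear estimators.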