On Data Preconditioning for Regularized Loss Minimization
This work addresses a bottleneck in big data problems for machine learning practitioners, but it is incremental as it builds on a well-known technique.
The paper addresses the slow convergence of first-order optimization methods on ill-conditioned regularized loss minimization problems. It gives a theoretical analysis of how data preconditioning reduces the condition number and thereby speeds up convergence, and preliminary experiments validate the theory.
In this work, we study data preconditioning, a well-known and long-existing technique, for boosting the convergence of first-order methods for regularized loss minimization. It is well understood that the condition number of the problem, i.e., the ratio of the Lipschitz constant to the strong convexity modulus, has a harsh effect on the convergence of first-order optimization methods. Therefore, minimizing a loss with a small regularization parameter to achieve good generalization performance, which yields an ill-conditioned problem, becomes the bottleneck for big data problems. We provide a theory of data preconditioning for regularized loss minimization. In particular, our analysis exhibits an appropriate data preconditioner and characterizes the conditions on the loss function and on the data under which data preconditioning can reduce the condition number and therefore boost the convergence for minimizing the regularized loss. To make data preconditioning practically useful, we employ and analyze a random sampling approach to efficiently compute the preconditioned data. Preliminary experiments validate our theory.
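The effect of the condition number described above can be illustrated with a minimal sketch. It is a toy example under stated assumptions, not the method analyzed in this work. It assumes ridge regression with squared loss, a diagonal data covariance `c` (so the Hessian is diag(c_j + lam)), and a hypothetical preconditioner P = diag((c_j + lam)^(-1/2)) chosen only for illustration. The names `gd_steps`, `compare`, `c`, `b`, and `lam` are all illustrative.

```python
import math


def gd_steps(h, b, tol=1e-6, max_iter=1_000_000):
    """Gradient descent on f(w) = sum_j (h[j]/2)*w[j]**2 - b[j]*w[j].

    Uses the constant step size 1/L, where L = max(h) is the Lipschitz
    constant of the gradient. Returns (steps taken, final iterate).
    """
    L = max(h)
    w = [0.0] * len(h)
    for t in range(max_iter):
        g = [hj * wj - bj for hj, wj, bj in zip(h, w, b)]
        if math.sqrt(sum(gj * gj for gj in g)) <= tol:
            return t, w
        w = [wj - gj / L for wj, gj in zip(w, g)]
    return max_iter, w


def compare(c=(100.0, 0.01), b=(1.0, 1.0), lam=1e-3):
    # Assumption: squared loss with diagonal covariance c, so the Hessian
    # of the regularized objective is diag(c_j + lam).
    h = [cj + lam for cj in c]
    kappa = max(h) / min(h)  # Lipschitz constant / strong convexity modulus
    steps_plain, w_plain = gd_steps(h, list(b))

    # Assumed (illustrative) preconditioner P = diag((c_j + lam)^(-1/2)).
    # Substituting w = P u turns the Hessian into P * diag(h) * P.
    p = [1.0 / math.sqrt(hj) for hj in h]
    h_pre = [pj * hj * pj for pj, hj in zip(p, h)]
    b_pre = [pj * bj for pj, bj in zip(p, b)]
    kappa_pre = max(h_pre) / min(h_pre)
    steps_pre, u = gd_steps(h_pre, b_pre)
    w_pre = [pj * uj for pj, uj in zip(p, u)]  # map back to original variables
    return kappa, steps_plain, w_plain, kappa_pre, steps_pre, w_pre


if __name__ == "__main__":
    kappa, steps_plain, _, kappa_pre, steps_pre, _ = compare()
    print(f"plain:          kappa={kappa:.1f}, steps={steps_plain}")
    print(f"preconditioned: kappa={kappa_pre:.1f}, steps={steps_pre}")
```

In this toy setting the unpreconditioned run needs many thousands of steps because one feature direction has a much smaller curvature than the other, while the preconditioned run stops almost immediately. Both recover essentially the same solution. For losses other than the squared loss, or for non-diagonal covariances, the picture is less clean, which is where the conditions on the loss and the data come in.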