Linear Learning with Sparse Data
This work addresses efficiency improvements for linear learning in sparse-data settings; the contribution is incremental.
The paper tackles the problem of training linear predictors efficiently on high-dimensional sparse data. It presents an efficient implementation of Averaged Stochastic Gradient Descent (ASGD) that avoids dense vector operations, and it introduces a translation-invariant extension called Centered Averaged Stochastic Gradient Descent (CASGD).
Linear predictors are especially useful when the data is high-dimensional and sparse. A standard technique for training a linear predictor is the Averaged Stochastic Gradient Descent (ASGD) algorithm. We present an efficient implementation of ASGD that avoids dense vector operations. We also describe a translation-invariant extension called Centered Averaged Stochastic Gradient Descent (CASGD).
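
To illustrate what "avoiding dense vector operations" can mean for the averaging step, here is a minimal sketch. It is not the paper's implementation. It assumes squared loss, a constant step size `eta`, no regularization, a zero starting point, and an average taken over the iterates w_1..w_t; the function names are hypothetical. Under those assumptions, if g_j denotes the (sparse) scaled gradient step at round j, the running sum of iterates satisfies S_t = (t + 1) w_t + A_t with A_t = sum_j j * g_j, so both w_t and A_t only need to be touched on the non-zero coordinates of each example.

```python
from collections import defaultdict


def sparse_dot(w, x):
    """Dot product of weights w with a sparse example x (dict: index -> value)."""
    return sum(w.get(j, 0.0) * v for j, v in x.items())


def asgd_sparse(examples, eta=0.1):
    """Averaged SGD for squared loss, touching only non-zero coordinates.

    Assumptions (illustrative, not from the paper): constant step size,
    no regularization, w_0 = 0, average taken over w_1..w_t.
    """
    w = defaultdict(float)  # current iterate w_t
    a = defaultdict(float)  # A_t = sum over rounds j of j * g_j
    t = 0
    for x, y in examples:
        t += 1
        residual = sparse_dot(w, x) - y
        for j, v in x.items():
            g = eta * residual * v  # scaled gradient step on coordinate j
            w[j] -= g
            a[j] += t * g
    if t == 0:
        return {}
    # Recover the average from the identity S_t = (t + 1) * w_t + A_t.
    return {j: ((t + 1) * w[j] + a[j]) / t for j in w}


def asgd_dense(examples, dim, eta=0.1):
    """Reference version that updates the running sum densely every round."""
    w = [0.0] * dim
    s = [0.0] * dim
    t = 0
    for x, y in examples:
        t += 1
        residual = sum(w[j] * v for j, v in x.items()) - y
        for j, v in x.items():
            w[j] -= eta * residual * v
        for j in range(dim):  # O(dim) work per example: the cost to avoid
            s[j] += w[j]
    return [sj / t for sj in s]
```

The dense reference costs time proportional to the dimension on every example, while the sparse version costs time proportional to the number of non-zeros in that example. Handling regularization or a decaying step size would need additional bookkeeping (for example, a shared scaling factor), which this sketch deliberately leaves out.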