DS CC NA NA | Jan 19, 2017

Efficient Implementation Of Newton-Raphson Methods For Sequential Data Prediction

arXiv:1701.05378 | 3 citations | h-index: 29

AI Analysis

For practitioners dealing with high-dimensional sequential data, this work removes the computational barrier to using second-order optimization, enabling faster convergence without quadratic cost.

This paper introduces an efficient implementation of Newton-Raphson methods for sequential data prediction that reduces computational complexity from O(M^2) to O(M), matching first-order methods while retaining second-order convergence benefits. The algorithm achieves identical mean square error performance to regular Newton-Raphson and is demonstrated on real-life big datasets.

We investigate the problem of sequential linear data prediction for real-life big data applications. Second-order algorithms, i.e., Newton-Raphson methods, asymptotically achieve the performance of the "best" possible linear data predictor much faster than first-order algorithms such as Online Gradient Descent. However, implementing these methods is usually not feasible in big data applications because of their extremely high computational cost. A regular implementation of the Newton-Raphson methods has a computational complexity on the order of $O(M^2)$ for an $M$-dimensional feature vector, while first-order algorithms need only $O(M)$. To eliminate this gap, we introduce a highly efficient implementation that reduces the computational complexity of the Newton-Raphson methods from quadratic to linear. The presented algorithm provides the well-known merits of second-order methods at a computational complexity of $O(M)$. We exploit the shifted nature of consecutive feature vectors and do not rely on any statistical assumptions. Therefore, the regular and fast implementations achieve the same performance in terms of mean square error. We demonstrate the computational efficiency of our algorithm on real-life sequential big datasets. We also illustrate that the presented algorithm is numerically stable.
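To make the complexity gap in the abstract concrete, the following is a minimal sketch of the two baseline updates it compares: a first-order Online Gradient Descent step costing $O(M)$, and a regular second-order step costing $O(M^2)$. The second-order step is written here in a recursive-least-squares style with a Sherman-Morrison inverse update, which is an assumption about the form of the update. The paper's $O(M)$ fast implementation is not reproduced here. The function names, the step size `mu`, the regularizer `delta`, and the synthetic noiseless data are all hypothetical choices made for illustration. The helper `shift_features` shows the shifted structure of consecutive feature vectors that the abstract says the fast algorithm exploits.

```python
import random


def dot(a, b):
    """Inner product of two equal-length lists."""
    return sum(ai * bi for ai, bi in zip(a, b))


def shift_features(x, newest):
    """Next feature vector: prepend the newest sample, drop the oldest.

    Consecutive feature vectors share M-1 entries, shifted by one position.
    """
    return [newest] + x[:-1]


def ogd_step(w, x, d, mu):
    """First-order update (Online Gradient Descent), O(M) per sample."""
    err = d - dot(w, x)
    return [wi + mu * err * xi for wi, xi in zip(w, x)], err


def newton_step(w, P, x, d):
    """Regular second-order update, O(M^2) per sample.

    P tracks the inverse of (delta * I + sum of x x^T) and is refreshed
    with the Sherman-Morrison identity (assumed form, see lead-in).
    """
    M = len(x)
    Px = [dot(P[i], x) for i in range(M)]            # O(M^2) matrix-vector product
    denom = 1.0 + dot(x, Px)
    gain = [v / denom for v in Px]
    err = d - dot(w, x)                              # a priori prediction error
    w_new = [w[i] + gain[i] * err for i in range(M)]
    P_new = [[P[i][j] - gain[i] * Px[j] for j in range(M)] for i in range(M)]
    return w_new, P_new, err


def run_demo(T=300, w_true=(0.5, -0.3, 0.2), mu=0.1, delta=1e-3, seed=0):
    """Run both predictors on synthetic, noiseless shifted-window data."""
    rng = random.Random(seed)
    M = len(w_true)
    x = [0.0] * M
    w_ogd = [0.0] * M
    w_nr = [0.0] * M
    P = [[(1.0 / delta if i == j else 0.0) for j in range(M)] for i in range(M)]
    for _ in range(T):
        x = shift_features(x, rng.uniform(-1.0, 1.0))
        d = dot(w_true, x)                           # desired output (hypothetical)
        w_ogd, _err = ogd_step(w_ogd, x, d, mu)
        w_nr, P, _err = newton_step(w_nr, P, x, d)
    return w_ogd, w_nr, list(w_true)


if __name__ == "__main__":
    w_ogd, w_nr, w_true = run_demo()
    print("true weights  :", w_true)
    print("OGD weights   :", w_ogd)
    print("Newton weights:", w_nr)
```

In this sketch the matrix-vector product and the rank-one update of `P` dominate the cost of `newton_step`, both quadratic in `M`, while `ogd_step` touches each weight once. Because `shift_features` changes only one entry's worth of information per step, most of the work in the regular update is arguably redundant, which is the kind of structure a linear-time implementation can take advantage of.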

