Learning Linear Models Using Distributed Iterative Hessian Sketching
This work addresses scalability issues in system identification for researchers and practitioners dealing with large-scale linear models, though it is incremental as it builds on existing sketching techniques.
The paper tackles the challenge of learning Markov parameters for linear systems from large datasets, which leads to optimization problems with too many variables for standard second-order methods. It demonstrates that a distributed Newton algorithm using Hessian sketching achieves ε-optimal solutions with geometric convergence and is easily parallelizable.
This work considers the problem of learning the Markov parameters of a linear system from observed data. Recent non-asymptotic system identification results have characterized the sample complexity of this problem in the single and multi-rollout setting. In both instances, the number of samples required in order to obtain acceptable estimates can produce optimization problems with an intractably large number of decision variables for a second-order algorithm. We show that a randomized and distributed Newton algorithm based on Hessian-sketching can produce $ε$-optimal solutions and converges geometrically. Moreover, the algorithm is trivially parallelizable. Our results hold for a variety of sketching matrices and we illustrate the theory with numerical examples.