LG ML May 28, 2019

Distributed estimation of the inverse Hessian by determinantal averaging

arXiv:1905.11546v1 10.332 citations

Originality: Highly original

AI Analysis

The paper addresses a fundamental issue in distributed Newton's method and related linear algebra tasks, which enables more accurate computations such as uncertainty quantification, though it remains an incremental improvement within distributed optimization methods.

The paper tackles the inversion bias problem in distributed optimization: averaging local inverse Hessian estimates fails to recover the correct global inverse. It proposes determinantal averaging, which reweights each local estimate in proportion to the determinant of the local Hessian. The resulting estimate is asymptotically consistent, recovering the exact Newton step as the number of partitions grows to infinity.

In distributed optimization and distributed numerical linear algebra, we often encounter an inversion bias: if we want to compute a quantity that depends on the inverse of a sum of distributed matrices, then the sum of the inverses does not equal the inverse of the sum. An example of this occurs in distributed Newton's method, where we wish to compute (or implicitly work with) the inverse Hessian multiplied by the gradient. In this case, locally computed estimates are biased, and so taking a uniform average will not recover the correct solution. To address this, we propose determinantal averaging, a new approach for correcting the inversion bias. This approach involves reweighting the local estimates of the Newton step proportionally to the determinant of the local Hessian estimate, and then averaging them together to obtain an improved global estimate. This method provides the first known distributed Newton step that is asymptotically consistent, i.e., it recovers the exact step in the limit as the number of distributed partitions grows to infinity. To show this, we develop new expectation identities and moment bounds for the determinant and adjugate of a random matrix. Determinantal averaging can be applied not only to Newton's method, but to computing any quantity that is a linear transformation of a matrix inverse, e.g., taking a trace of the inverse covariance matrix, which is used in data uncertainty quantification.
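The reweighting described in the abstract can be illustrated with a small NumPy sketch. This is not the paper's code: the function names `determinantal_average` and `uniform_average` are hypothetical, and the sketch assumes that every machine holds an invertible local Hessian estimate and that all machines share the same gradient vector. Each local Newton step is weighted by the determinant of its local Hessian, and the weighted sum is divided by the sum of the weights.

```python
import numpy as np


def determinantal_average(local_hessians, gradient):
    """Combine local Newton steps, weighting each by det(local Hessian).

    Assumes every local Hessian is square and invertible (an assumption of
    this sketch; np.linalg.solve raises LinAlgError otherwise).
    """
    hessians = np.stack(local_hessians)            # shape (m, d, d)
    signs, logdets = np.linalg.slogdet(hessians)   # stable log-determinants
    # Rescale by the largest log-determinant to avoid overflow; the common
    # factor cancels in the ratio below.
    weights = signs * np.exp(logdets - logdets.max())
    # Local Newton steps H_i^{-1} g, one row per machine.
    steps = np.stack([np.linalg.solve(H, gradient) for H in hessians])
    return weights @ steps / weights.sum()


def uniform_average(local_hessians, gradient):
    """Baseline: plain average of the local Newton steps (biased)."""
    steps = np.stack([np.linalg.solve(H, gradient) for H in local_hessians])
    return steps.mean(axis=0)


if __name__ == "__main__":
    # Two 1x1 "Hessians" whose mean is 2, with gradient 2.
    hessians = [np.array([[1.0]]), np.array([[3.0]])]
    gradient = np.array([2.0])
    print("determinantal:", determinantal_average(hessians, gradient))
    print("uniform:      ", uniform_average(hessians, gradient))
```

In the one-dimensional example, the determinantal average is approximately 1.0, which is the gradient divided by the mean Hessian, while the uniform average is approximately 1.33. In one dimension the correction is exact because the determinant of a 1x1 matrix is the matrix entry itself; in higher dimensions the abstract claims only asymptotic consistency as the number of partitions grows.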
