Partial minimization of strict convex functions and tensor scaling
Provides a unifying convergence analysis for tensor scaling algorithms, which is relevant for optimization and machine learning practitioners working with matrix/tensor scaling problems.
The paper shows that Sinkhorn-type scaling algorithms for matrices and tensors can be interpreted as partial minimization of log-convex functions, and proves geometric convergence of partial minimization for a general class of strictly convex functions.
Assume that f is a strictly convex function with a unique minimum in R^n. We divide the vector of n variables into d groups of vector subvariables, with d at least two. We assume that we can find the partial minimum of f with respect to each vector subvariable while the other variables are held fixed. We then describe an algorithm that, at each step, partially minimizes f over a specifically chosen vector subvariable. This algorithm converges geometrically to the unique minimum. The rate of convergence depends on uniform bounds on the eigenvalues of the Hessian of f on the compact sublevel set of f consisting of the points where f is at most f(x_0), where x_0 is the starting point of the algorithm. In the case where f is a polynomial of degree two with positive definite quadratic term and d=n, our method can be considered a generalization of the classical conjugate gradient method. The main result of this paper is the observation that the celebrated Sinkhorn diagonal scaling algorithm for matrices, and the corresponding diagonal scaling of tensors, can be viewed as partial minimization of certain log-convex functions.
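To make the connection between Sinkhorn scaling and partial minimization concrete, the following minimal sketch alternates exact minimization over two groups of variables (d=2). It is an illustration under stated assumptions, not the paper's construction: the potential f(x, y) = sum_ij a_ij exp(x_i + y_j) - sum_i x_i - sum_j y_j is a standard convex function whose partial minimizers reproduce the row and column normalizations of Sinkhorn's algorithm, and it may differ from the log-convex functions used in the paper. The names objective and sinkhorn_partial_min are hypothetical, the matrix is assumed square with positive entries, and the groups are visited in simple cyclic order rather than by the paper's selection rule. Note also that this f is invariant under the shift (x + c, y - c), so it is convex but not strictly convex unless a normalization is imposed; the sketch ignores this.

```python
import math


def objective(a, x, y):
    """f(x, y) = sum_ij a_ij * exp(x_i + y_j) - sum_i x_i - sum_j y_j."""
    n = len(a)
    total = sum(a[i][j] * math.exp(x[i] + y[j])
                for i in range(n) for j in range(n))
    return total - sum(x) - sum(y)


def sinkhorn_partial_min(a, sweeps=200):
    """Alternate exact partial minimization over x and y (cyclic order, an assumption)."""
    n = len(a)
    x = [0.0] * n
    y = [0.0] * n
    history = [objective(a, x, y)]
    for _ in range(sweeps):
        # Minimize over x with y fixed: df/dx_i = 0 gives
        # exp(x_i) * sum_j a_ij * exp(y_j) = 1, i.e. every row sum becomes 1.
        x = [-math.log(sum(a[i][j] * math.exp(y[j]) for j in range(n)))
             for i in range(n)]
        history.append(objective(a, x, y))
        # Minimize over y with x fixed: every column sum becomes 1.
        y = [-math.log(sum(a[i][j] * math.exp(x[i]) for i in range(n)))
             for j in range(n)]
        history.append(objective(a, x, y))
    # The scaled matrix D1 * A * D2 with D1 = diag(exp(x)), D2 = diag(exp(y)).
    scaled = [[a[i][j] * math.exp(x[i] + y[j]) for j in range(n)]
              for i in range(n)]
    return scaled, history


a = [[1.0, 2.0], [3.0, 4.0]]
scaled, history = sinkhorn_partial_min(a)
row_sums = [sum(row) for row in scaled]
col_sums = [sum(scaled[i][j] for i in range(len(a))) for j in range(len(a))]
```

Each update is an exact partial minimum, so the recorded values in history never increase, and for this positive matrix the row and column sums of the scaled matrix approach 1, which is the doubly stochastic target of Sinkhorn scaling.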