CG · Apr 24
Counting All Lattice Rectangles in the Square Grid in Near-Linear Time
Dmitry Babichev, Sergey Babichev
We study the problem of exactly counting all lattice rectangles contained in the square $[0,n)\times[0,n)$, including non-axis-parallel ones. Starting from the standard parametrization by a primitive direction $(u,v)$ and two side lengths, we derive a sequence of exact algorithms of complexity $O(n^2)$, $O(n^{3/2}\log n)$, $O(n^{4/3}\log n)$, and finally $O(n\log^3 n)$. The main idea behind the near-linear algorithm is to reduce the geometric summation to a constant-size family of weighted floor sums that is closed under Euclidean-style affine and reciprocal transformations, and is hence evaluable in $O(\log n)$ time per query. Besides the exact algorithmic result, we also derive a two-term asymptotic expansion, $F(n)=\frac{4\log 2-1}{\pi^2}n^4\log n+B\,n^4+o(n^4)$, with an explicit formula for $B$, which provides an independent consistency check for the large-$n$ numerical data produced by the algorithms.
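As an illustration of why floor sums admit logarithmic-time evaluation, here is a minimal sketch of the classical unweighted floor sum $\sum_{i=0}^{n-1}\lfloor (ai+b)/m\rfloor$, computed with Euclidean-style reductions. It is not the weighted family used in the paper, whose exact form the abstract does not give; the function name `floor_sum` and its argument order are assumptions made for this example.

```python
def floor_sum(n, m, a, b):
    """Return sum of (a*i + b) // m for i in range(n).

    Assumes n >= 0, m >= 1, a >= 0, b >= 0 (illustrative helper,
    not the weighted sums from the paper).
    """
    total = 0
    while True:
        # Affine step: strip whole multiples of m from a and b.
        if a >= m:
            total += (n - 1) * n // 2 * (a // m)
            a %= m
        if b >= m:
            total += n * (b // m)
            b %= m
        # Now 0 <= a, b < m. If the largest numerator is below m, all terms are 0.
        y_max = a * n + b
        if y_max < m:
            return total
        # Reciprocal step: swap the roles of a and m, as in the Euclidean algorithm.
        n, b = divmod(y_max, m)
        m, a = a, m


print(floor_sum(4, 3, 2, 1))  # terms are 0, 1, 1, 2
```

Each pass replaces the pair $(a, m)$ by $(m \bmod a, a)$ up to relabeling, so the number of iterations is logarithmic in the inputs, the same reason the Euclidean algorithm terminates quickly.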
ML · Feb 11, 2019
Efficient Primal-Dual Algorithms for Large-Scale Multiclass Classification
Dmitry Babichev, Dmitrii Ostrovskii, Francis Bach
We develop efficient algorithms to train $\ell_1$-regularized linear classifiers when the feature dimension $d$, the number of classes $k$, and the sample size $n$ are all large. Our focus is on a special class of losses that includes, in particular, the multiclass hinge and logistic losses. Our approach combines several ideas: (i) passing to the equivalent saddle-point problem with a quasi-bilinear objective; (ii) applying stochastic mirror descent with a proper choice of geometry, which guarantees a favorable accuracy bound; (iii) devising non-uniform sampling schemes to approximate the matrix products. In particular, for the multiclass hinge loss we propose a sublinear algorithm whose iterations each cost $O(d+n+k)$ arithmetic operations.
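To make idea (iii) concrete, the sketch below shows one generic way to approximate a matrix-vector product by non-uniform sampling: draw a single column index with probability proportional to $|x_j|$ and rescale so that the estimate is unbiased. The abstract does not specify the paper's sampling schemes, so this is a standard importance-sampling estimator chosen for illustration; `sampled_matvec` and its arguments are hypothetical names.

```python
import random


def sampled_matvec(A, x, rng):
    """One-sample unbiased estimate of the product A x.

    A is a list of rows, x a list of floats, rng a random.Random instance.
    Column j is drawn with probability |x_j| / ||x||_1 (illustrative scheme,
    not necessarily the one used in the paper).
    """
    weights = [abs(v) for v in x]
    norm1 = sum(weights)
    if norm1 == 0:
        return [0.0] * len(A)
    j = rng.choices(range(len(x)), weights=weights)[0]
    # Rescale by sign(x_j) * ||x||_1 so that the expectation equals A x.
    scale = norm1 if x[j] > 0 else -norm1
    return [row[j] * scale for row in A]


rng = random.Random(0)
A = [[1.0, 2.0], [3.0, 4.0]]
x = [1.0, -1.0]
num_samples = 20000
mean = [0.0, 0.0]
for _ in range(num_samples):
    est = sampled_matvec(A, x, rng)
    mean = [m + e / num_samples for m, e in zip(mean, est)]
print(mean)  # should be close to the exact product [-1.0, -1.0]
```

A single draw touches only one column of the matrix, so its cost is linear in the number of rows rather than in the full matrix size; the price is variance, which averaging over iterations has to absorb.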
ML · Apr 16, 2018
Constant Step Size Stochastic Gradient Descent for Probabilistic Modeling
Dmitry Babichev, Francis Bach
Stochastic gradient methods enable learning probabilistic models from large amounts of data. While large step sizes (learning rates) have been shown to work best for least-squares problems (e.g., Gaussian noise) when combined with parameter averaging, they do not lead to convergent algorithms in general. In this paper, we consider generalized linear models, that is, conditional models based on exponential families. We propose averaging moment parameters instead of natural parameters for constant-step-size stochastic gradient descent. For finite-dimensional models, we show that this can sometimes (and surprisingly) lead to better predictions than the best linear model. For infinite-dimensional models, we show that it always converges to optimal predictions, while averaging natural parameters never does. We illustrate our findings with simulations on synthetic data and classical benchmarks with many observations.
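The distinction between the two kinds of averaging can be sketched for logistic regression, taken here as an assumed example of a generalized linear model: the natural parameter at input $x$ is $\theta^\top x$ and the moment parameter is the predicted probability $\sigma(\theta^\top x)$. The iterates `thetas` below are made-up values standing in for SGD iterates, and both function names are hypothetical.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))


def predict_natural_average(thetas, x):
    """Average the parameter iterates first, then map to a probability."""
    theta_bar = [sum(t[j] for t in thetas) / len(thetas) for j in range(len(x))]
    return sigmoid(dot(theta_bar, x))


def predict_moment_average(thetas, x):
    """Map each iterate to a probability first, then average the probabilities."""
    probs = [sigmoid(dot(t, x)) for t in thetas]
    return sum(probs) / len(probs)


# Made-up iterates for a one-dimensional model (illustration only).
thetas = [[0.0], [4.0]]
x = [1.0]
print(predict_natural_average(thetas, x))  # sigmoid(2.0)
print(predict_moment_average(thetas, x))   # (sigmoid(0.0) + sigmoid(4.0)) / 2
```

Because the sigmoid is nonlinear, the two averages generally differ; the moment-averaged prediction need not correspond to any single parameter vector, which is consistent with the abstract's remark that it can sometimes outperform the best linear model.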