Maksym Byshkin

ML · May 29, 2020

CoolMomentum: A Method for Stochastic Optimization by Langevin Dynamics with Simulated Annealing

Oleksandr Borysenko, Maksym Byshkin

Deep learning applications require global optimization of non-convex objective functions that have multiple local minima. The same problem often arises in physical simulations, where it can be resolved by Langevin dynamics with Simulated Annealing, a well-established approach for minimizing many-particle potentials. This analogy provides useful insights for non-convex stochastic optimization in machine learning. Here we find that integrating the discretized Langevin equation gives a coordinate update rule equivalent to the well-known Momentum optimization algorithm. As our main result, we show that gradually decreasing the momentum coefficient from an initial value close to unity down to zero is equivalent to applying Simulated Annealing, or slow cooling in physical terms. Building on this approach, we propose CoolMomentum, a new stochastic optimization method. Applying CoolMomentum to the optimization of ResNet-20 on the CIFAR-10 dataset and EfficientNet-B0 on ImageNet, we demonstrate that it achieves high accuracy.
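
The Momentum rule mentioned above can be illustrated with a minimal sketch, assuming the standard heavy-ball form v <- beta*v - lr*grad f(x), x <- x + v, which is the same as x_{t+1} = x_t + beta*(x_t - x_{t-1}) - lr*grad f(x_t). The "cooling" idea is shown by driving beta from a value near one down to zero over the run. The linear schedule, the function name `cool_momentum_sketch`, and the defaults (`lr`, `beta0`, `steps`) are hypothetical choices for illustration; the abstract does not state the schedule CoolMomentum actually uses.

```python
import numpy as np


def cool_momentum_sketch(grad, x0, lr=0.1, beta0=0.99, steps=1000):
    """Heavy-ball Momentum whose coefficient beta is annealed from beta0 to 0.

    Hypothetical illustration: the linear beta schedule is an assumption,
    not the schedule published for CoolMomentum.
    """
    x = np.asarray(x0, dtype=float)
    v = np.zeros_like(x)  # velocity, i.e. x_t - x_{t-1}
    for t in range(steps):
        # Linear "cooling": beta = beta0 at t = 0, beta = 0 at the last step.
        beta = beta0 * (1.0 - t / max(steps - 1, 1))
        # Equivalent to beta * (x_t - x_{t-1}) - lr * grad(x_t).
        v = beta * v - lr * grad(x)
        x = x + v
    return x


# Quadratic f(x) = 0.5 * ||x||^2 has gradient x and its minimum at the origin.
x_final = cool_momentum_sketch(lambda x: x, [5.0, -3.0])
```

On this quadratic the call returns a point very close to the origin. Early iterations, with beta near one, keep a lot of momentum; the final iterations, with beta near zero, reduce to plain gradient descent, which is the qualitative behavior the abstract maps onto slow cooling.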

CO · Jan 2, 2019

A Simple Algorithm for Scalable Monte Carlo Inference

Alexander Borisenko, Maksym Byshkin, Alessandro Lomi

The methods of statistical physics are widely used for modelling complex networks. Building on the recently proposed Equilibrium Expectation approach, we derive a simple and efficient algorithm for maximum likelihood estimation (MLE) of the parameters of exponential family distributions, a family of statistical models that includes the Ising model, Markov random fields, and exponential random graph models. Computational experiments and analysis of empirical data demonstrate that the algorithm increases the size of network data amenable to Monte Carlo-based inference by orders of magnitude. We report results suggesting that the algorithm may readily be extended to the analysis of large samples of dependent observations commonly found in biology, sociology, astrophysics, and ecology.
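
The abstract does not give the Equilibrium Expectation update itself, so the sketch below swaps in a different, classic technique: stochastic-approximation (Robbins-Monro) MLE with a persistent Metropolis chain. It targets the same moment-matching condition E_J[s(x)] = s(x_obs) that defines the MLE of an exponential family p(x | J) ∝ exp(J s(x)). It is a minimal sketch assuming a one-dimensional Ising ring with a single coupling J and sufficient statistic s(x) = sum_i x_i x_{i+1}; the function names `suff_stat`, `metropolis_sweep`, and `fit_coupling`, the step-size schedule, and the defaults are hypothetical.

```python
import math

import numpy as np


def suff_stat(x):
    # s(x) = sum_i x_i * x_{i+1} on a ring (periodic boundary).
    return float(np.sum(x * np.roll(x, -1)))


def metropolis_sweep(x, J, rng):
    # One sweep of single-spin Metropolis updates targeting p(x) ∝ exp(J * s(x)).
    n = len(x)
    for i, u in zip(rng.integers(0, n, size=n), rng.random(n)):
        # Change in s(x) if spin i is flipped.
        ds = -2.0 * x[i] * (x[i - 1] + x[(i + 1) % n])
        # Accept with probability min(1, exp(J * ds)); avoids overflow in exp.
        if J * ds >= 0 or u < math.exp(J * ds):
            x[i] = -x[i]
    return x


def fit_coupling(x_obs, steps=3000, a0=0.5, seed=0):
    # Hypothetical stochastic-approximation MLE, NOT the paper's EE algorithm.
    rng = np.random.default_rng(seed)
    n = len(x_obs)
    s_obs = suff_stat(x_obs) / n
    x = x_obs.copy()  # persistent chain, started here at the observed data
    J = 0.0
    trace = []
    for t in range(steps):
        x = metropolis_sweep(x, J, rng)
        # Log-likelihood gradient estimate per site: s_obs - s(x)/n.
        J += a0 / (1 + t / 100) * (s_obs - suff_stat(x) / n)
        trace.append(J)
    # Average the second half of the trajectory to damp Monte Carlo noise.
    return float(np.mean(trace[steps // 2:]))
```

For a 200-spin ring built from alternating blocks of four equal spins, three quarters of the bonds agree, so s(x_obs)/n = 0.5. Because E_J[s]/n is very close to tanh J for a long ring, `fit_coupling(np.tile([1, 1, 1, 1, -1, -1, -1, -1], 25))` should land near atanh(0.5), about 0.55.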
