Tomasz Nowicki

h-index: 17

Papers: 8

Citations: 35

Novelty: 49%

AI Score: 37

Ranked #89,814 of 194,257 authors (top 46%); #19,906 in LG (top 50%)

8 Papers

3.8 | ML | Oct 20, 2022

On Representations of Mean-Field Variational Inference

Soumyadip Ghosh, Yingdong Lu, Tomasz Nowicki et al.

The mean field variational inference (MFVI) formulation restricts the general Bayesian inference problem to the subspace of product measures. We present a framework to analyze MFVI algorithms, which is inspired by a similar development for general variational Bayesian formulations. Our approach enables the MFVI problem to be represented in three different manners: a gradient flow on Wasserstein space, a system of Fokker-Planck-like equations and a diffusion process. Rigorous guarantees are established to show that a time-discretized implementation of the coordinate ascent variational inference algorithm in the product Wasserstein space of measures yields a gradient flow in the limit. A similar result is obtained for their associated densities, with the limit being given by a quasi-linear partial differential equation. A popular class of practical algorithms falls in this framework, which provides tools to establish convergence. We hope this framework could be used to guarantee convergence of algorithms in a variety of approaches, old and new, to solve variational inference problems.
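As a rough illustration of the coordinate ascent variational inference (CAVI) iteration that the abstract above analyzes, here is a minimal sketch for the textbook case of a two-dimensional Gaussian target with a product (mean-field) Gaussian approximation. It is not the paper's Wasserstein-space construction; the function name `cavi_bivariate_gaussian` and its parameters are hypothetical and chosen only for this example.

```python
def cavi_bivariate_gaussian(mu, precision, m_init=(0.0, 0.0), sweeps=50):
    """Toy CAVI for a 2-D Gaussian target N(mu, precision^{-1}).

    Assumption: the approximating family is q(x1) q(x2) with Gaussian
    factors. Each sweep updates one factor mean with the other held fixed.
    Returns the factor means and the (fixed) factor variances.
    """
    (l11, l12), (l21, l22) = precision
    m1, m2 = m_init
    for _ in range(sweeps):
        # Optimal q(x1) given q(x2): mean shifts with the current m2.
        m1 = mu[0] - (l12 / l11) * (m2 - mu[1])
        # Optimal q(x2) given the freshly updated q(x1).
        m2 = mu[1] - (l21 / l22) * (m1 - mu[0])
    # Mean-field factor variances are the inverse diagonal precisions.
    return (m1, m2), (1.0 / l11, 1.0 / l22)


if __name__ == "__main__":
    means, variances = cavi_bivariate_gaussian(
        mu=(1.0, -2.0), precision=((2.0, 0.8), (0.8, 1.5))
    )
    print(means, variances)
```

In this toy case the factor means converge to the target mean, while the factor variances are smaller than the true marginal variances whenever the coordinates are correlated, a well-known feature of mean-field approximations.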

4.1 | LG | Oct 21, 2025

Optimality and NP-Hardness of Transformers in Learning Markovian Dynamical Functions

Yanna Ding, Songtao Lu, Yingdong Lu et al.

Transformer architectures can solve unseen tasks based on input-output pairs in a given prompt due to in-context learning (ICL). Existing theoretical studies on ICL have mainly focused on linear regression tasks, often with i.i.d. inputs. To understand how transformers express ICL when modeling dynamics-driven functions, we investigate Markovian function learning through a structured ICL setup, where we characterize the loss landscape to reveal underlying optimization behaviors. Specifically, we (1) provide the closed-form expression of the global minimizer (in an enlarged parameter space) for a single-layer linear self-attention (LSA) model; (2) prove that recovering transformer parameters that realize the optimal solution is NP-hard in general, revealing a fundamental limitation of one-layer LSA in representing structured dynamical functions; and (3) supply a novel interpretation of a multilayer LSA as performing preconditioned gradient descent to optimize multiple objectives beyond the square loss. These theoretical results are numerically validated using simplified transformers.

4.1 | LG | Sep 22, 2025

Fast Linear Solvers via AI-Tuned Markov Chain Monte Carlo-based Matrix Inversion

Anton Lebedev, Won Kyung Lee, Soumyadip Ghosh et al.

Large, sparse linear systems are pervasive in modern science and engineering, and Krylov subspace solvers are an established means of solving them. Yet convergence can be slow for ill-conditioned matrices, so practical deployments usually require preconditioners. Markov chain Monte Carlo (MCMC)-based matrix inversion can generate such preconditioners and accelerate Krylov iterations, but its effectiveness depends on parameters whose optima vary across matrices; manual or grid search is costly. We present an AI-driven framework recommending MCMC parameters for a given linear system. A graph neural surrogate predicts preconditioning speed from $A$ and MCMC parameters. A Bayesian acquisition function then chooses the parameter sets most likely to minimise iterations. On a previously unseen ill-conditioned system, the framework achieves better preconditioning with 50% of the search budget of conventional methods, yielding about a 10% reduction in iterations to convergence. These results suggest a route for incorporating MCMC-based preconditioners into large-scale systems.
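To make the idea of Monte Carlo matrix inversion concrete, the sketch below uses the classical Ulam-von Neumann random-walk estimator of the Neumann series $A^{-1} = \sum_k C^k$ with $C = I - A$. This is an assumption about the general family of methods, not the paper's implementation, and it includes none of the AI-driven parameter tuning. The name `mc_inverse` and the parameters `walks` and `max_steps` are hypothetical; they stand in for the kind of parameters whose best values vary from matrix to matrix.

```python
import random


def mc_inverse(A, walks=4000, max_steps=20, seed=0):
    """Rough Monte Carlo estimate of A^{-1} (Ulam-von Neumann scheme).

    Assumption: A = I - C where the row sums of |C| are below 1, so the
    Neumann series sum_k C^k converges. Walks are truncated at max_steps.
    """
    rng = random.Random(seed)
    n = len(A)
    C = [[(1.0 if i == j else 0.0) - A[i][j] for j in range(n)] for i in range(n)]
    abs_rows = [[abs(c) for c in row] for row in C]
    row_sums = [sum(row) for row in abs_rows]
    states = list(range(n))
    inv = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for _walk in range(walks):
            state, weight = i, 1.0
            inv[i][state] += weight  # k = 0 term of the series (identity)
            for _step in range(max_steps):
                if row_sums[state] == 0.0:
                    break  # no outgoing transitions from this state
                # Move with probability proportional to |C[state][j]|.
                nxt = rng.choices(states, weights=abs_rows[state])[0]
                # Importance weight C[state][nxt] / p = sign * row sum.
                sign = 1.0 if C[state][nxt] > 0 else -1.0
                weight *= sign * row_sums[state]
                state = nxt
                inv[i][state] += weight
        inv[i] = [v / walks for v in inv[i]]
    return inv


if __name__ == "__main__":
    A = [[1.0, -0.2, -0.1], [-0.3, 1.0, -0.2], [-0.1, -0.2, 1.0]]
    for row in mc_inverse(A):
        print([round(v, 3) for v in row])
```

An approximate inverse of this kind could serve as a preconditioner; more walks and longer chains reduce the error at higher cost, which is the trade-off a parameter search has to balance.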

1.2 | FA | Feb 4, 2022

Polynomial convergence of iterations of certain random operators in Hilbert space

Soumyadip Ghosh, Yingdong Lu, Tomasz J. Nowicki

We study the convergence of a random iterative sequence of a family of operators on infinite dimensional Hilbert spaces, inspired by the Stochastic Gradient Descent (SGD) algorithm in the case of the noiseless regression, as studied in [1]. We identify conditions that are strictly broader than previously known for polynomial convergence rate in various norms, and characterize the roles the randomness plays in determining the best multiplicative constants. Additionally, we prove almost sure convergence of the sequence.
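For orientation, the following is a finite-dimensional sketch of the SGD iteration for noiseless regression that motivates the random operator sequence in the abstract above. The paper works in infinite-dimensional Hilbert space; the dimension, the uniform input distribution, the step size, and the name `sgd_noiseless` are all assumptions made for this illustration.

```python
import random


def sgd_noiseless(theta_star, steps=500, gamma=0.2, seed=0):
    """Toy SGD for least squares with exact (noiseless) labels.

    Assumption: inputs have i.i.d. uniform[-1, 1] coordinates, and
    gamma * ||x||^2 <= 2 for every input, so the error norm cannot grow.
    Returns the final iterate and the error norm after each step.
    """
    rng = random.Random(seed)
    d = len(theta_star)
    theta = [0.0] * d
    errors = []
    for _ in range(steps):
        x = [rng.uniform(-1.0, 1.0) for _k in range(d)]
        y = sum(t * xi for t, xi in zip(theta_star, x))  # noiseless label
        residual = sum(t * xi for t, xi in zip(theta, x)) - y
        # Each step applies the random operator I - gamma * x x^T to the error.
        theta = [t - gamma * residual * xi for t, xi in zip(theta, x)]
        errors.append(sum((t - s) ** 2 for t, s in zip(theta, theta_star)) ** 0.5)
    return theta, errors


if __name__ == "__main__":
    theta, errors = sgd_noiseless([1.0, -2.0, 0.5, 3.0, -1.0])
    print(errors[0], errors[-1])
```

In finite dimensions with a well-conditioned input covariance the error decays geometrically; the polynomial rates in the abstract concern the infinite-dimensional setting, which this sketch does not reproduce.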

5.8 | LG | Jan 31, 2022

Neural Network Training with Asymmetric Crosspoint Elements

Murat Onen, Tayfun Gokmen, Teodor K. Todorov et al.

Analog crossbar arrays comprising programmable nonvolatile resistors are under intense investigation for acceleration of deep neural network training. However, the ubiquitous asymmetric conductance modulation of practical resistive devices critically degrades the classification performance of networks trained with conventional algorithms. Here, we describe and experimentally demonstrate an alternative fully-parallel training algorithm: Stochastic Hamiltonian Descent. Instead of conventionally tuning weights in the direction of the error function gradient, this method programs the network parameters to successfully minimize the total energy (Hamiltonian) of the system that incorporates the effects of device asymmetry. We provide critical intuition on why device asymmetry is fundamentally incompatible with conventional training algorithms and how the new approach exploits it as a useful feature instead. Our technique enables immediate realization of analog deep learning accelerators based on readily available device technologies.

5.0 | ML | Oct 21, 2021

Hamiltonian Monte Carlo with Asymmetrical Momentum Distributions

Soumyadip Ghosh, Yingdong Lu, Tomasz Nowicki

Existing rigorous convergence guarantees for the Hamiltonian Monte Carlo (HMC) algorithm use Gaussian auxiliary momentum variables, which are crucially symmetrically distributed. We present a novel convergence analysis for HMC utilizing new analytic and probabilistic arguments. The convergence is rigorously established under significantly weaker conditions, which among others allow for general auxiliary distributions. In our framework, we show that plain HMC with asymmetrical momentum distributions breaks a key self-adjointness requirement. We propose a modified version that we call the Alternating Direction HMC (AD-HMC). Sufficient conditions are established under which AD-HMC exhibits geometric convergence in Wasserstein distance. Numerical experiments suggest that AD-HMC can show improved performance over HMC with Gaussian auxiliaries.
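As background for the abstract above, here is a minimal sketch of plain HMC with the standard Gaussian momentum, the baseline case that existing guarantees cover. It does not implement AD-HMC or asymmetrical momentum distributions. The one-dimensional setting and the names `leapfrog` and `hmc`, along with their parameters, are assumptions for illustration.

```python
import math
import random


def leapfrog(x, p, grad_u, step, n_steps):
    """Leapfrog integrator for H(x, p) = U(x) + p^2 / 2 in one dimension."""
    p -= 0.5 * step * grad_u(x)  # initial half step in momentum
    for i in range(n_steps):
        x += step * p  # full step in position
        if i < n_steps - 1:
            p -= step * grad_u(x)  # full step in momentum
    p -= 0.5 * step * grad_u(x)  # final half step in momentum
    return x, p


def hmc(u, grad_u, x0, n_samples, step=0.2, n_steps=10, seed=0):
    """Plain HMC targeting the density proportional to exp(-U(x))."""
    rng = random.Random(seed)
    x, samples = x0, []
    for _ in range(n_samples):
        p = rng.gauss(0.0, 1.0)  # symmetric Gaussian auxiliary momentum
        x_new, p_new = leapfrog(x, p, grad_u, step, n_steps)
        h_old = u(x) + 0.5 * p * p
        h_new = u(x_new) + 0.5 * p_new * p_new
        # Metropolis correction for the discretization error.
        if rng.random() < math.exp(min(0.0, h_old - h_new)):
            x = x_new
        samples.append(x)
    return samples


if __name__ == "__main__":
    draws = hmc(lambda x: 0.5 * x * x, lambda x: x, x0=0.0, n_samples=5000)
    mean = sum(draws) / len(draws)
    print(mean, sum((d - mean) ** 2 for d in draws) / len(draws))
```

The leapfrog map is reversible under momentum negation, and the Gaussian momentum is symmetric under that negation, which is why this sketch does not need to track the sign of the momentum between iterations.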

2.3 | CO | Feb 4, 2021

HMC, an Algorithms in Data Mining, the Functional Analysis approach

Soumyadip Ghosh, Yingdong Lu, Tomasz Nowicki

The main purpose of this paper is to facilitate communication between the Analytic, Probabilistic and Algorithmic communities. We present a proof of convergence of the Hamiltonian (Hybrid) Monte Carlo algorithm from the point of view of Dynamical Systems, where the evolving objects are densities of probability distributions and the tools are derived from Functional Analysis.

3.3 | CA | Jan 21, 2021

On $L^q$ Convergence of the Hamiltonian Monte Carlo

Soumyadip Ghosh, Yingdong Lu, Tomasz Nowicki

We establish $L_q$ convergence for Hamiltonian Monte Carlo algorithms. More specifically, under mild conditions for the associated Hamiltonian motion, we show that the outputs of the algorithms converge (strongly for $2\le q<\infty$ and weakly for $1<q<2$) to the desired target distribution.