Lan V. Truong

ML
h-index1
13papers
60citations
Novelty51%
AI Score43

13 Papers

MLAug 8, 2022
On Rademacher Complexity-based Generalization Bounds for Deep Learning

Lan V. Truong

We show that the Rademacher complexity-based framework can establish non-vacuous generalization bounds for Convolutional Neural Networks (CNNs) in the context of classifying a small set of image classes. A key technical advancement is the formulation of novel contraction lemmas for high-dimensional mappings between vector spaces, specifically designed for general Lipschitz activation functions. These lemmas extend and refine the Talagrand contraction lemma across a broader range of scenarios. Our Rademacher complexity bound provides an enhancement over the results presented by Golowich et al. for ReLU-based Deep Neural Networks (DNNs). Moreover, while previous works utilizing Rademacher complexity have primarily focused on ReLU DNNs, our results generalize to a wider class of activation functions.

MLFeb 11, 2023
Global Convergence Rate of Deep Equilibrium Models with General Activations

Lan V. Truong

In a recent paper, Ling et al. investigated the over-parametrized Deep Equilibrium Model (DEQ) with ReLU activation. They proved that the gradient descent converges to a globally optimal solution at a linear convergence rate for the quadratic loss function. This paper shows that this fact still holds for DEQs with any general activation that has bounded first and second derivatives. Since the new activation function is generally non-homogeneous, bounding the least eigenvalue of the Gram matrix of the equilibrium point is particularly challenging. To accomplish this task, we need to create a novel population Gram matrix and develop a new form of dual activation with Hermite polynomial expansion.

LGMay 15, 2022
Generalization Bounds on Multi-Kernel Learning with Mixed Datasets

Lan V. Truong

This paper presents novel generalization bounds for the multi-kernel learning problem. Motivated by applications in sensor networks and spatial-temporal models, we assume that the dataset is mixed where each sample is taken from a finite pool of Markov chains. Our bounds for learning kernels admit $O(\sqrt{\log m})$ dependency on the number of base kernels and $O(1/\sqrt{n})$ dependency on the number of training samples. However, some $O(1/\sqrt{n})$ terms are added to compensate for the dependency among samples compared with existing generalization bounds for multi-kernel learning with i.i.d. datasets.

MLApr 6, 2023
Variable-Complexity Weighted-Tempered Gibbs Samplers for Bayesian Variable Selection

Lan V. Truong

Subset weighted-Tempered Gibbs Sampler (wTGS) has been recently introduced by Jankowiak to reduce the computation complexity per MCMC iteration in high-dimensional applications where the exact calculation of the posterior inclusion probabilities (PIP) is not essential. However, the Rao-Backwellized estimator associated with this sampler has a high variance as the ratio between the signal dimension and the number of conditional PIP estimations is large. In this paper, we design a new subset weighted-Tempered Gibbs Sampler (wTGS) where the expected number of computations of conditional PIPs per MCMC iteration can be much smaller than the signal dimension. Different from the subset wTGS and wTGS, our sampler has a variable complexity per MCMC iteration. We provide an upper bound on the variance of an associated Rao-Blackwellized estimator for this sampler at a finite number of iterations, $T$, and show that the variance is $O\big(\big(\frac{P}{S}\big)^2 \frac{\log T}{T}\big)$ for a given dataset where $S$ is the expected number of conditional PIP computations per MCMC iteration. Experiments show that our Rao-Blackwellized estimator can have a smaller variance than its counterpart associated with the subset wTGS.

LGOct 12, 2022
Generative Adversarial Nets: Can we generate a new dataset based on only one training set?

Lan V. Truong

A generative adversarial network (GAN) is a class of machine learning frameworks designed by Goodfellow et al. in 2014. In the GAN framework, the generative model is pitted against an adversary: a discriminative model that learns to determine whether a sample is from the model distribution or the data distribution. GAN generates new samples from the same distribution as the training set. In this work, we aim to generate a new dataset that has a different distribution from the training set. In addition, the Jensen-Shannon divergence between the distributions of the generative and training datasets can be controlled by some target $δ\in [0, 1]$. Our work is motivated by applications in generating new kinds of rice that have similar characteristics as good rice.

34.9OCMay 5
On the Induced Norms of Matrices and Grothendieck problems

Lan V. Truong, M. H. Duong

We study the induced matrix norm $\|\bA\|_{q \to r}$, whose exact value has been known only in a few classical cases. Determining this norm has long been regarded as difficult due to the highly non-convex nature of its variational definition. Existing works offer numerical estimates or analytic bounds but no exact formula. In this paper we present a purely analytic framework that determines $\|\bA\|_{q \to r}$ exactly for all $q, r \ge 1$ for several classes of important matrices. For these matrices, using a direct connection between the induced norms and Grothendieck problems, our results also simultaneously provide exact values for the later.

75.7LGMar 19
Transformers Learn Robust In-Context Regression under Distributional Uncertainty

Hoang T. H. Cao, Hai D. V. Trinh, Tho Quan et al.

Recent work has shown that Transformers can perform in-context learning for linear regression under restrictive assumptions, including i.i.d. data, Gaussian noise, and Gaussian regression coefficients. However, real-world data often violate these assumptions: the distributions of inputs, noise, and coefficients are typically unknown, non-Gaussian, and may exhibit dependency across the prompt. This raises a fundamental question: can Transformers learn effectively in-context under realistic distributional uncertainty? We study in-context learning for noisy linear regression under a broad range of distributional shifts, including non-Gaussian coefficients, heavy-tailed noise, and non-i.i.d. prompts. We compare Transformers against classical baselines that are optimal or suboptimal under the corresponding maximum-likelihood criteria. Across all settings, Transformers consistently match or outperform these baselines, demonstrating robust in-context adaptation beyond classical estimators.

MLOct 15, 2024
On Rank-Dependent Generalisation Error Bounds for Transformers

Lan V. Truong

In this paper, we introduce various covering number bounds for linear function classes, each subject to different constraints on input and matrix norms. These bounds are contingent on the rank of each class of matrices. We then apply these bounds to derive generalization errors for single layer transformers. Our results improve upon several existing generalization bounds in the literature and are independent of input sequence length, highlighting the advantages of employing low-rank matrices in transformer design. More specifically, our achieved generalisation error bound decays as $O(1/\sqrt{n})$ where $n$ is the sample length, which improves existing results in research literature of the order $O((\log n)/(\sqrt{n}))$. It also decays as $O(\log r_w)$ where $r_w$ is the rank of the combination of query and and key matrices.

LGMay 21, 2025
Optimal Best-Arm Identification under Fixed Confidence with Multiple Optima

Lan V. Truong

We study the problem of best-arm identification in stochastic multi-armed bandits under the fixed-confidence setting, with a particular focus on instances that admit multiple optimal arms. While the Track-and-Stop algorithm of Garivier and Kaufmann (2016) is widely conjectured to be instance-optimal, its performance in the presence of multiple optima has remained insufficiently understood. In this work, we revisit the Track-and-Stop strategy and propose a modified stopping rule that ensures instance-optimality even when the set of optimal arms is not a singleton. Our analysis introduces a new information-theoretic lower bound that explicitly accounts for multiple optimal arms, and we demonstrate that our stopping rule tightly matches this bound.

MLDec 23, 2021
Generalization Error Bounds on Deep Learning with Markov Datasets

Lan V. Truong

In this paper, we derive upper bounds on generalization errors for deep neural networks with Markov datasets. These bounds are developed based on Koltchinskii and Panchenko's approach for bounding the generalization error of combined classifiers with i.i.d. datasets. The development of new symmetrization inequalities in high-dimensional probability for Markov chains is a key element in our extension, where the spectral gap of the infinitesimal generator of the Markov chain plays a key parameter in these inequalities. We also propose a simple method to convert these bounds and other similar ones in traditional deep learning and machine learning to Bayesian counterparts for both i.i.d. and Markov datasets. Extensions to $m$-order homogeneous Markov chains such as AR and ARMA models and mixtures of several Markov data services are given.

ITJan 27, 2021
Fundamental limits and algorithms for sparse linear regression with sublinear sparsity

Lan V. Truong

We establish exact asymptotic expressions for the normalized mutual information and minimum mean-square-error (MMSE) of sparse linear regression in the sub-linear sparsity regime. Our result is achieved by a generalization of the adaptive interpolation method in Bayesian inference for linear regimes to sub-linear ones. A modification of the well-known approximate message passing algorithm to approach the MMSE fundamental limit is also proposed, and its state evolution is rigorously analyzed. Our results show that the traditional linear assumption between the signal dimension and number of observations in the replica and adaptive interpolation methods is not necessary for sparse signals. They also show how to modify the existing well-known AMP algorithms for linear regimes to sub-linear ones.

ITSep 28, 2020
Replica Analysis of the Linear Model with Markov or Hidden Markov Signal Priors

Lan V. Truong

This paper estimates free energy, average mutual information, and minimum mean square error (MMSE) of a linear model under two assumptions: (1) the source is generated by a Markov chain, (2) the source is generated via a hidden Markov model. Our estimates are based on the replica method in statistical physics. We show that under the posterior mean estimator, the linear model with Markov sources or hidden Markov sources is decoupled into single-input AWGN channels with state information available at both encoder and decoder where the state distribution follows the left Perron-Frobenius eigenvector with unit Manhattan norm of the stochastic matrix of Markov chains. Numerical results show that the free energies and MSEs obtained via the replica method are closely approximate to their counterparts achieved by the Metropolis-Hastings algorithm or some well-known approximate message passing algorithms in the research literature.

ITJan 30, 2019
Support Recovery in the Phase Retrieval Model: Information-Theoretic Fundamental Limits

Lan V. Truong, Jonathan Scarlett

The support recovery problem consists of determining a sparse subset of variables that is relevant in generating a set of observations. In this paper, we study the support recovery problem in the phase retrieval model consisting of noisy phaseless measurements, which arises in a diverse range of settings such as optical detection, X-ray crystallography, electron microscopy, and coherent diffractive imaging. Our focus is on information-theoretic fundamental limits under an approximate recovery criterion, considering both discrete and Gaussian models for the sparse non-zero entries, along with Gaussian measurement matrices. In both cases, our bounds provide sharp thresholds with near-matching constant factors in several scaling regimes on the sparsity and signal-to-noise ratio. As a key step towards obtaining these results, we develop new concentration bounds for the conditional information content of log-concave random variables, which may be of independent interest.