Bruce Wade

ML
h-index10
4papers
7citations
Novelty69%
AI Score43

4 Papers

MLJul 29, 2025
From Sublinear to Linear: Fast Convergence in Deep Networks via Locally Polyak-Lojasiewicz Regions

Agnideep Aich, Ashit Baran Aich, Bruce Wade

The convergence of gradient descent (GD) on the non-convex loss landscapes of deep neural networks (DNNs) presents a fundamental theoretical challenge. While recent work has established that GD converges to a stationary point at a sublinear rate within locally quasi-convex regions (LQCRs), this fails to explain the exponential convergence rates consistently observed in practice. In this paper, we resolve this discrepancy by proving that under a mild assumption on Neural Tangent Kernel (NTK) stability, these same regions satisfy a local Polyak-Lojasiewicz (PL) condition. We introduce the concept of a Locally Polyak-Lojasiewicz Region (LPLR), where the squared gradient norm lower-bounds the suboptimality gap, prove that properly initialized finite-width networks admit such regions around initialization, and establish that GD achieves linear convergence within an LPLR, providing the first finite-width guarantee that matches empirically observed rates. We validate our theory across diverse settings, from controlled experiments on fully-connected networks to modern ResNet architectures trained with stochastic methods, demonstrating that LPLR structure emerges robustly in practical deep learning scenarios. By rigorously connecting local landscape geometry to fast optimization through the NTK framework, our work provides a definitive theoretical explanation for the remarkable efficiency of gradient-based optimization in deep learning.

MEMay 29, 2025
A2 Copula-Driven Spatial Bayesian Neural Network For Modeling Non-Gaussian Dependence: A Simulation Study

Agnideep Aich, Sameera Hewage, Md Monzur Murshed et al.

In this paper, we introduce the A2 Copula Spatial Bayesian Neural Network (A2-SBNN), a predictive spatial model designed to map coordinates to continuous fields while capturing both typical spatial patterns and extreme dependencies. By embedding the dual-tail novel Archimedean copula viz. A2 directly into the network's weight initialization, A2-SBNN naturally models complex spatial relationships, including rare co-movements in the data. The model is trained through a calibration-driven process combining Wasserstein loss, moment matching, and correlation penalties to refine predictions and manage uncertainty. Simulation results show that A2-SBNN consistently delivers high accuracy across a wide range of dependency strengths, offering a new, effective solution for spatial data modeling beyond traditional Gaussian-based approaches.

MLJul 29, 2025
Measuring Sample Quality with Copula Discrepancies

Agnideep Aich, Ashit Baran Aich, Bruce Wade

The scalable Markov chain Monte Carlo (MCMC) algorithms that underpin modern Bayesian machine learning, such as Stochastic Gradient Langevin Dynamics (SGLD), sacrifice asymptotic exactness for computational speed, creating a critical diagnostic gap: traditional sample quality measures fail catastrophically when applied to biased samplers. While powerful Stein-based diagnostics can detect distributional mismatches, they provide no direct assessment of dependence structure, often the primary inferential target in multivariate problems. We introduce the Copula Discrepancy (CD), a principled and computationally efficient diagnostic that leverages Sklar's theorem to isolate and quantify the fidelity of a sample's dependence structure independent of its marginals. Our theoretical framework provides the first structure-aware diagnostic specifically designed for the era of approximate inference. Empirically, we demonstrate that a moment-based CD dramatically outperforms standard diagnostics like effective sample size for hyperparameter selection in biased MCMC, correctly identifying optimal configurations where traditional methods fail. Furthermore, our robust MLE-based variant can detect subtle but critical mismatches in tail dependence that remain invisible to rank correlation-based approaches, distinguishing between samples with identical Kendall's tau but fundamentally different extreme-event behavior. With computational overhead orders of magnitude lower than existing Stein discrepancies, the CD provides both immediate practical value for MCMC practitioners and a theoretical foundation for the next generation of structure-aware sample quality assessment.

MLJul 26, 2025
Bag of Coins: A Statistical Probe into Neural Confidence Structures

Agnideep Aich, Ashit Baran Aich, Md Monzur Murshed et al.

Modern neural networks, despite their high accuracy, often produce poorly calibrated confidence scores, limiting their reliability in high-stakes applications. Existing calibration methods typically post-process model outputs without interrogating the internal consistency of the predictions themselves. In this work, we introduce a novel, non-parametric statistical probe, the Bag-of-Coins (BoC) test, that examines the internal consistency of a classifier's logits. The BoC test reframes confidence estimation as a frequentist hypothesis test: does the model's top-ranked class win 1-v-1 contests against random competitors at a rate consistent with its own stated softmax probability? When applied to modern deep learning architectures, this simple probe reveals a fundamental dichotomy. On Vision Transformers (ViTs), the BoC output serves as a state-of-the-art confidence score, achieving near-perfect calibration with an ECE of 0.0212, an 88% improvement over a temperature-scaled baseline. Conversely, on Convolutional Neural Networks (CNNs) like ResNet, the probe reveals a deep inconsistency between the model's predictions and its internal logit structure, a property missed by traditional metrics. We posit that BoC is not merely a calibration method, but a new diagnostic tool for understanding and exposing the differing ways that popular architectures represent uncertainty.