ML · May 29, 2025
Deep Copula Classifier: Theory, Consistency, and Empirical Evaluation
Agnideep Aich, Ashit Baran Aich
We present the Deep Copula Classifier (DCC), a class-conditional generative model that separates marginal estimation from dependence modeling using neural copula densities. DCC is interpretable, Bayes-consistent, and achieves an excess-risk rate of $O(n^{-r/(2r+d)})$ for $r$-smooth copulas. In a controlled two-class study with strong dependence ($|\rho|=0.995$), DCC learns Bayes-aligned decision regions. With oracle or pooled marginals, it nearly reaches the best possible performance (accuracy $\approx 0.971$; ROC-AUC $\approx 0.998$). As expected, per-class KDE marginals perform worse (accuracy $0.873$; ROC-AUC $0.957$; PR-AUC $0.966$). On the Pima Indians Diabetes dataset, calibrated DCC ($\tau=1$) achieves accuracy $0.879$, ROC-AUC $0.936$, and PR-AUC $0.870$, outperforming Logistic Regression, SVM (RBF), and Naive Bayes, and matching Logistic Regression on the lowest Expected Calibration Error (ECE). Random Forest is also competitive (accuracy $0.892$; ROC-AUC $0.933$; PR-AUC $0.880$). Directly modeling feature dependence yields strong, well-calibrated performance with a clear probabilistic interpretation, making DCC a practical, theoretically grounded alternative to independence-based classifiers.
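The separation of marginals from dependence described above rests on Sklar's theorem. A sketch of the resulting class-conditional density and plug-in decision rule follows. The notation ($\pi_k$ for class priors, $f_{k,j}$ and $F_{k,j}$ for the marginal density and CDF of feature $j$ in class $k$, $c_k$ for the class-$k$ copula density) is assumed here for illustration and is not taken from the paper.

```latex
% Assumed notation: \pi_k = class prior, F_{k,j}, f_{k,j} = marginal CDF and density,
% c_k = copula density of class k (modeled by a neural network in DCC).
f_k(x) = c_k\bigl(F_{k,1}(x_1), \dots, F_{k,d}(x_d)\bigr) \prod_{j=1}^{d} f_{k,j}(x_j),
\qquad
\hat{y}(x) = \arg\max_{k} \; \pi_k \, f_k(x).
```

Setting $c_k \equiv 1$ recovers the independence (Naive Bayes) factorization, which is the baseline DCC is contrasted with.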
ML · Jul 29, 2025
From Sublinear to Linear: Fast Convergence in Deep Networks via Locally Polyak-Lojasiewicz Regions
Agnideep Aich, Ashit Baran Aich, Bruce Wade
The convergence of gradient descent (GD) on the non-convex loss landscapes of deep neural networks (DNNs) presents a fundamental theoretical challenge. While recent work has established that GD converges to a stationary point at a sublinear rate within locally quasi-convex regions (LQCRs), this result fails to explain the exponential convergence rates consistently observed in practice. In this paper, we resolve this discrepancy by proving that under a mild assumption on Neural Tangent Kernel (NTK) stability, these same regions satisfy a local Polyak-Lojasiewicz (PL) condition. We introduce the concept of a Locally Polyak-Lojasiewicz Region (LPLR), where the squared gradient norm is lower-bounded by a constant multiple of the suboptimality gap, prove that properly initialized finite-width networks admit such regions around initialization, and establish that GD achieves linear convergence within an LPLR, providing the first finite-width guarantee that matches empirically observed rates. We validate our theory across diverse settings, from controlled experiments on fully-connected networks to modern ResNet architectures trained with stochastic methods, demonstrating that LPLR structure emerges robustly in practical deep learning scenarios. By rigorously connecting local landscape geometry to fast optimization through the NTK framework, our work provides a definitive theoretical explanation for the remarkable efficiency of gradient-based optimization in deep learning.
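For reference, the textbook (global) form of the PL inequality and the linear rate it implies for gradient descent are sketched below. The constants $\mu$ (PL constant) and $\beta$ (Lipschitz constant of the gradient) and the step size $1/\beta$ are assumptions of this standard statement; the paper's local version restricts the inequality to an LPLR and may use different constants.

```latex
% Standard PL inequality and the one-step contraction it yields for GD with step 1/\beta.
\tfrac{1}{2}\,\|\nabla L(\theta)\|^2 \;\ge\; \mu\,\bigl(L(\theta) - L^*\bigr)
\quad\Longrightarrow\quad
L(\theta_{k+1}) - L^* \;\le\; \Bigl(1 - \tfrac{\mu}{\beta}\Bigr)\bigl(L(\theta_k) - L^*\bigr),
\qquad
\theta_{k+1} = \theta_k - \tfrac{1}{\beta}\,\nabla L(\theta_k).
```

The implication combines the descent lemma, $L(\theta_{k+1}) \le L(\theta_k) - \tfrac{1}{2\beta}\|\nabla L(\theta_k)\|^2$, with the PL inequality; iterating it gives geometric decay of the suboptimality gap.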
ME · May 29, 2025
A2 Copula-Driven Spatial Bayesian Neural Network For Modeling Non-Gaussian Dependence: A Simulation Study
Agnideep Aich, Sameera Hewage, Md Monzur Murshed et al.
In this paper, we introduce the A2 Copula Spatial Bayesian Neural Network (A2-SBNN), a predictive spatial model designed to map coordinates to continuous fields while capturing both typical spatial patterns and extreme dependencies. By embedding the novel dual-tail Archimedean copula A2 directly into the network's weight initialization, A2-SBNN naturally models complex spatial relationships, including rare co-movements in the data. The model is trained through a calibration-driven process combining Wasserstein loss, moment matching, and correlation penalties to refine predictions and manage uncertainty. Simulation results show that A2-SBNN consistently delivers high accuracy across a wide range of dependency strengths, offering a new, effective solution for spatial data modeling beyond traditional Gaussian-based approaches.
ML · Oct 28, 2025
Copula-Stein Discrepancy: A Generator-Based Stein Operator for Archimedean Dependence
Agnideep Aich, Ashit Baran Aich
Kernel Stein discrepancies (KSDs) have become a principal tool for goodness-of-fit testing, but standard KSDs are often insensitive to higher-order dependency structures, such as tail dependence, which are critical in many scientific and financial domains. We address this gap by introducing the Copula-Stein Discrepancy (CSD), a novel class of discrepancies tailored to the geometry of statistical dependence. By defining a Stein operator directly on the copula density, CSD leverages the generative structure of dependence, rather than relying on the joint density's score function. For the broad class of Archimedean copulas, this approach yields a closed-form Stein kernel derived from the scalar generator function. We provide a comprehensive theoretical analysis, proving that CSD (i) metrizes weak convergence of copula distributions, ensuring it detects any mismatch in dependence; (ii) has an empirical estimator that converges at the minimax optimal rate of $O_P(n^{-1/2})$; and (iii) is provably sensitive to differences in tail dependence coefficients. The framework is extended to general non-Archimedean copulas, including elliptical and vine copulas. Computationally, the exact CSD kernel evaluation scales linearly in dimension, while a novel random feature approximation reduces the $n$-dependence from quadratic $O(n^2)$ to near-linear $\tilde{O}(n)$, making CSD a practical and theoretically principled tool for dependence-aware inference.
ML · Oct 16, 2025
The Minimax Lower Bound of Kernel Stein Discrepancy Estimation
Jose Cribeiro-Ramallo, Agnideep Aich, Florian Kalinke et al.
Kernel Stein discrepancies (KSDs) have emerged as a powerful tool for quantifying goodness-of-fit over the last decade, featuring numerous successful applications. To the best of our knowledge, all existing KSD estimators with known rate achieve $\sqrt n$-convergence. In this work, we present two complementary results (with different proof strategies), establishing that the minimax lower bound of KSD estimation is $n^{-1/2}$ and settling the optimality of these estimators. Our first result focuses on KSD estimation on $\mathbb R^d$ with the Langevin-Stein operator; our explicit constant for the Gaussian kernel indicates that the difficulty of KSD estimation may increase exponentially with the dimensionality $d$. Our second result settles the minimax lower bound for KSD estimation on general domains.
ML · Jul 29, 2025
Measuring Sample Quality with Copula Discrepancies
Agnideep Aich, Ashit Baran Aich, Bruce Wade
The scalable Markov chain Monte Carlo (MCMC) algorithms that underpin modern Bayesian machine learning, such as Stochastic Gradient Langevin Dynamics (SGLD), sacrifice asymptotic exactness for computational speed, creating a critical diagnostic gap: traditional sample quality measures fail catastrophically when applied to biased samplers. While powerful Stein-based diagnostics can detect distributional mismatches, they provide no direct assessment of dependence structure, often the primary inferential target in multivariate problems. We introduce the Copula Discrepancy (CD), a principled and computationally efficient diagnostic that leverages Sklar's theorem to isolate and quantify the fidelity of a sample's dependence structure independent of its marginals. Our theoretical framework provides the first structure-aware diagnostic specifically designed for the era of approximate inference. Empirically, we demonstrate that a moment-based CD dramatically outperforms standard diagnostics like effective sample size for hyperparameter selection in biased MCMC, correctly identifying optimal configurations where traditional methods fail. Furthermore, our robust MLE-based variant can detect subtle but critical mismatches in tail dependence that remain invisible to rank correlation-based approaches, distinguishing between samples with identical Kendall's tau but fundamentally different extreme-event behavior. With computational overhead orders of magnitude lower than existing Stein discrepancies, the CD provides both immediate practical value for MCMC practitioners and a theoretical foundation for the next generation of structure-aware sample quality assessment.
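The marginal-free property invoked above can be illustrated with a rank-based dependence summary. The sketch below is not the paper's Copula Discrepancy; it only shows, using a plain $O(n^2)$ Kendall's tau written here for illustration (the function name `kendall_tau` and the simulated data are assumptions), that a dependence measure built from ranks is unchanged when the marginals are transformed monotonically.

```python
import numpy as np

def kendall_tau(x, y):
    """Kendall's tau-a via pairwise sign products (assumes no ties)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(x)
    sx = np.sign(x[:, None] - x[None, :])  # +1 / -1 ordering of every pair in x
    sy = np.sign(y[:, None] - y[None, :])  # same for y
    # The full matrix counts each unordered pair twice, hence n * (n - 1).
    return float((sx * sy).sum() / (n * (n - 1)))

# Hypothetical sample with positive dependence.
rng = np.random.default_rng(0)
z = rng.multivariate_normal([0.0, 0.0], [[1.0, 0.7], [0.7, 1.0]], size=300)
x, y = z[:, 0], z[:, 1]

tau_raw = kendall_tau(x, y)
# Strictly increasing transforms change the marginals but not the ranks.
tau_warped = kendall_tau(np.exp(x), y ** 3)
```

Because `tau_raw` and `tau_warped` depend only on the ordering of the observations, they agree, which is the sense in which a copula-level diagnostic is independent of the marginals.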
ML · Jul 26, 2025
Bag of Coins: A Statistical Probe into Neural Confidence Structures
Agnideep Aich, Ashit Baran Aich, Md Monzur Murshed et al.
Modern neural networks, despite their high accuracy, often produce poorly calibrated confidence scores, limiting their reliability in high-stakes applications. Existing calibration methods typically post-process model outputs without interrogating the internal consistency of the predictions themselves. In this work, we introduce a novel, non-parametric statistical probe, the Bag-of-Coins (BoC) test, that examines the internal consistency of a classifier's logits. The BoC test reframes confidence estimation as a frequentist hypothesis test: does the model's top-ranked class win 1-v-1 contests against random competitors at a rate consistent with its own stated softmax probability? When applied to modern deep learning architectures, this simple probe reveals a fundamental dichotomy. On Vision Transformers (ViTs), the BoC output serves as a state-of-the-art confidence score, achieving near-perfect calibration with an ECE of 0.0212, an 88% improvement over a temperature-scaled baseline. Conversely, on Convolutional Neural Networks (CNNs) like ResNet, the probe reveals a deep inconsistency between the model's predictions and its internal logit structure, a property missed by traditional metrics. We posit that BoC is not merely a calibration method, but a new diagnostic tool for understanding and exposing the differing ways that popular architectures represent uncertainty.
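The ECE figure quoted above is a binned gap between stated confidence and observed accuracy. A minimal sketch of the common equal-width-bin estimator follows; the function name, the default of 15 bins, and the toy inputs are assumptions for illustration, and the paper's exact binning may differ.

```python
import numpy as np

def expected_calibration_error(confidences, correct, n_bins=15):
    """Equal-width-bin ECE: sum over bins of (bin weight) * |accuracy - mean confidence|."""
    confidences = np.asarray(confidences, dtype=float)
    correct = np.asarray(correct, dtype=float)
    # Map each confidence in [0, 1] to a bin index in 0..n_bins-1 (bins are right-closed).
    idx = np.clip(np.ceil(confidences * n_bins).astype(int) - 1, 0, n_bins - 1)
    ece = 0.0
    for b in range(n_bins):
        mask = idx == b
        if mask.any():
            gap = abs(correct[mask].mean() - confidences[mask].mean())
            ece += mask.mean() * gap  # mask.mean() is the fraction of samples in the bin
    return ece

# Toy example: two high-confidence hits, and one hit plus one miss at lower confidence.
toy_ece = expected_calibration_error([0.95, 0.95, 0.55, 0.55], [1, 1, 1, 0])
```

In the toy example each bin holds half the samples and has a gap of 0.05, so the weighted sum is 0.05.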
ML · Jul 7, 2025
Temporal Conformal Prediction (TCP): A Distribution-Free Statistical and Machine Learning Framework for Adaptive Risk Forecasting
Agnideep Aich, Ashit Baran Aich, Dipak C. Jain
We propose Temporal Conformal Prediction (TCP), a distribution-free framework for constructing well-calibrated prediction intervals in nonstationary time series. TCP couples a modern quantile forecaster with a split-conformal calibration layer on a rolling window and, in its TCP-RM variant, augments the conformal threshold with a single online Robbins-Monro (RM) offset to steer coverage toward a target level in real time. We benchmark TCP against GARCH, Historical Simulation, and a rolling Quantile Regression (QR) baseline across equities (S&P 500), cryptocurrency (Bitcoin), and commodities (Gold). Three results are consistent across assets. First, rolling QR yields the sharpest intervals but is materially under-calibrated (e.g., S&P 500: 83.2% vs. 95% target). Second, TCP (and TCP-RM) achieves near-nominal coverage across assets, with intervals that are wider than Historical Simulation in this evaluation (e.g., S&P 500: 5.21 vs. 5.06). Third, the RM update changes calibration and width only marginally at our default hyperparameters. Crisis-window visualizations around March 2020 show TCP/TCP-RM expanding and then contracting their interval bands promptly as volatility spikes and recedes, with red dots marking days where realized returns fall outside the reported 95% interval (miscoverage). A sensitivity study confirms robustness to window size and step-size choices. Overall, TCP provides a practical, theoretically grounded solution to calibrated uncertainty quantification under distribution shift, bridging statistical inference and machine learning for risk forecasting.
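The calibration layer described above can be sketched as a rolling split-conformal adjustment of a quantile forecaster's band, with an optional online offset. This is a minimal illustration under stated assumptions, not the authors' implementation: the score function (a conformalized-quantile-regression style residual), the form of the Robbins-Monro update, the constant placeholder forecaster, and all names and defaults below are hypothetical.

```python
import numpy as np

def rolling_conformal_intervals(y, q_lo, q_hi, window=250, alpha=0.05, rm_step=0.0):
    """Widen forecast bands [q_lo, q_hi] using conformal scores from a rolling window."""
    y, q_lo, q_hi = (np.asarray(a, dtype=float) for a in (y, q_lo, q_hi))
    n = len(y)
    # Nonconformity score: how far y falls outside the raw band (negative if inside).
    scores = np.maximum(q_lo - y, y - q_hi)
    lower = np.full(n, np.nan)
    upper = np.full(n, np.nan)
    # Rank of the finite-sample conformal quantile among `window` calibration scores.
    k = min(window, int(np.ceil((window + 1) * (1 - alpha))))
    offset = 0.0
    for t in range(window, n):
        past = np.sort(scores[t - window:t])  # only scores observed before time t
        c = past[k - 1] + offset
        lower[t] = q_lo[t] - c
        upper[t] = q_hi[t] + c
        miss = float(y[t] < lower[t] or y[t] > upper[t])
        # Assumed Robbins-Monro step: widen after a miss, shrink slightly after a hit.
        offset += rm_step * (miss - alpha)
    return lower, upper

# Hypothetical usage: i.i.d. noise with a constant, deliberately too-narrow forecast band.
rng = np.random.default_rng(1)
y = rng.standard_normal(3000)
lo, hi = rolling_conformal_intervals(y, np.full(3000, -1.0), np.full(3000, 1.0))
```

With `rm_step=0.0` this reduces to plain rolling split-conformal calibration; a small positive `rm_step` nudges the threshold toward the target miscoverage `alpha` as outcomes arrive.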