LGJan 9, 2023Code
Balance is Essence: Accelerating Sparse Training via Adaptive Gradient CorrectionBowen Lei, Dongkuan Xu, Ruqi Zhang et al.
Despite impressive performance, deep neural networks require significant memory and computation costs, prohibiting their application in resource-constrained scenarios. Sparse training is one of the most common techniques to reduce these costs, however, the sparsity constraints add difficulty to the optimization, resulting in an increase in training time and instability. In this work, we aim to overcome this problem and achieve space-time co-efficiency. To accelerate and stabilize the convergence of sparse training, we analyze the gradient changes and develop an adaptive gradient correction method. Specifically, we approximate the correlation between the current and previous gradients, which is used to balance the two gradients to obtain a corrected gradient. Our method can be used with the most popular sparse training pipelines under both standard and adversarial setups. Theoretically, we prove that our method can accelerate the convergence rate of sparse training. Extensive experiments on multiple datasets, model architectures, and sparsities demonstrate that our method outperforms leading sparse training methods by up to \textbf{5.0\%} in accuracy given the same number of training epochs, and reduces the number of training epochs by up to \textbf{52.1\%} to achieve the same accuracy. Our code is available on: \url{https://github.com/StevenBoys/AGENT}.
STFeb 16
Frequentist Regret Analysis of Gaussian Process Thompson Sampling via Fractional PosteriorsSomjit Roy, Prateek Jaiswal, Anirban Bhattacharya et al.
We study Gaussian Process Thompson Sampling (GP-TS) for sequential decision-making over compact, continuous action spaces and provide a frequentist regret analysis based on fractional Gaussian process posteriors, without relying on domain discretization as in prior work. We show that the variance inflation commonly assumed in existing analyses of GP-TS can be interpreted as Thompson Sampling with respect to a fractional posterior with tempering parameter $α\in (0,1)$. We derive a kernel-agnostic regret bound expressed in terms of the information gain parameter $γ_t$ and the posterior contraction rate $ε_t$, and identify conditions on the Gaussian process prior under which $ε_t$ can be controlled. As special cases of our general bound, we recover regret of order $\tilde{\mathcal{O}}(T^{\frac{1}{2}})$ for the squared exponential kernel, $\tilde{\mathcal{O}}(T^{\frac{2ν+3d}{2(2ν+d)}} )$ for the Matérn-$ν$ kernel, and a bound of order $\tilde{\mathcal{O}}(T^{\frac{2ν+3d}{2(2ν+d)}})$ for the rational quadratic kernel. Overall, our analysis provides a unified and discretization-free regret framework for GP-TS that applies broadly across kernel classes.
MLSep 12, 2023
Generalized Regret Analysis of Thompson Sampling using Fractional PosteriorsPrateek Jaiswal, Debdeep Pati, Anirban Bhattacharya et al.
Thompson sampling (TS) is one of the most popular and earliest algorithms to solve stochastic multi-armed bandit problems. We consider a variant of TS, named $α$-TS, where we use a fractional or $α$-posterior ($α\in(0,1)$) instead of the standard posterior distribution. To compute an $α$-posterior, the likelihood in the definition of the standard posterior is tempered with a factor $α$. For $α$-TS we obtain both instance-dependent $\mathcal{O}\left(\sum_{k \neq i^*} Δ_k\left(\frac{\log(T)}{C(α)Δ_k^2} + \frac{1}{2} \right)\right)$ and instance-independent $\mathcal{O}(\sqrt{KT\log K})$ frequentist regret bounds under very mild conditions on the prior and reward distributions, where $Δ_k$ is the gap between the true mean rewards of the $k^{th}$ and the best arms, and $C(α)$ is a known constant. Both the sub-Gaussian and exponential family models satisfy our general conditions on the reward distribution. Our conditions on the prior distribution just require its density to be positive, continuous, and bounded. We also establish another instance-dependent regret upper bound that matches (up to constants) to that of improved UCB [Auer and Ortner, 2010]. Our regret analysis carefully combines recent theoretical developments in the non-asymptotic concentration analysis and Bernstein-von Mises type results for the $α$-posterior distribution. Moreover, our analysis does not require additional structural properties such as closed-form posteriors or conjugate priors.
SYMay 13, 2023
An Active Learning-based Approach for Hosting Capacity Analysis in Distribution SystemsKiyeob Lee, Peng Zhao, Anirban Bhattacharya et al.
With the increasing amount of distributed energy resources (DERs) integration, there is a significant need to model and analyze hosting capacity (HC) for future electric distribution grids. Hosting capacity analysis (HCA) examines the amount of DERs that can be safely integrated into the grid and is a challenging task in full generality because there are many possible integration of DERs in foresight. That is, there are numerous extreme points between feasible and infeasible sets. Moreover, HC depends on multiple factors such as (a) adoption patterns of DERs that depend on socio-economic behaviors and (b) how DERs are controlled and managed. These two factors are intrinsic to the problem space because not all integration of DERs may be centrally planned, and could largely change our understanding about HC. This paper addresses the research gap by capturing the two factors (a) and (b) in HCA and by identifying a few most insightful HC scenarios at the cost of domain knowledge. We propose a data-driven HCA framework and introduce active learning in HCA to effectively explore scenarios. Active learning in HCA and characteristics of HC with respect to the two factors (a) and (b) are illustrated in a 3-bus example. Next, detailed large-scale studies are proposed to understand the significance of (a) and (b). Our findings suggest that HC and its interpretations significantly change subject to the two factors (a) and (b).
STJul 4, 2020
Tail-adaptive Bayesian shrinkageSe Yoon Lee, Peng Zhao, Debdeep Pati et al.
Robust Bayesian methods for high-dimensional regression problems under diverse sparse regimes are studied. Traditional shrinkage priors are primarily designed to detect a handful of signals from tens of thousands of predictors in the so-called ultra-sparsity domain. However, they may not perform desirably when the degree of sparsity is moderate. In this paper, we propose a robust sparse estimation method under diverse sparsity regimes, which has a tail-adaptive shrinkage property. In this property, the tail-heaviness of the prior adjusts adaptively, becoming larger or smaller as the sparsity level increases or decreases, respectively, to accommodate more or fewer signals, a posteriori. We propose a global-local-tail (GLT) Gaussian mixture distribution that ensures this property. We examine the role of the tail-index of the prior in relation to the underlying sparsity level and demonstrate that the GLT posterior contracts at the minimax optimal rate for sparse normal mean models. We apply both the GLT prior and the Horseshoe prior to a real data problem and simulation examples. Our findings indicate that the varying tail rule based on the GLT prior offers advantages over a fixed tail rule based on the Horseshoe prior in diverse sparsity regimes.
MEMar 17, 2020
Directionally Dependent Multi-View Clustering Using Copula ModelKahkashan Afrin, Ashif S. Iquebal, Mostafa Karimi et al.
In recent biomedical scientific problems, it is a fundamental issue to integratively cluster a set of objects from multiple sources of datasets. Such problems are mostly encountered in genomics, where data is collected from various sources, and typically represent distinct yet complementary information. Integrating these data sources for multi-source clustering is challenging due to their complex dependence structure including directional dependency. Particularly in genomics studies, it is known that there is certain directional dependence between DNA expression, DNA methylation, and RNA expression, widely called The Central Dogma. Most of the existing multi-view clustering methods either assume an independent structure or pair-wise (non-directional) dependency, thereby ignoring the directional relationship. Motivated by this, we propose a copula-based multi-view clustering model where a copula enables the model to accommodate the directional dependence existing in the datasets. We conduct a simulation experiment where the simulated datasets exhibiting inherent directional dependence: it turns out that ignoring the directional dependence negatively affects the clustering performance. As a real application, we applied our model to the breast cancer tumor samples collected from The Cancer Genome Altas (TCGA).