LG Jun 1, 2022
The robust way to stack and bag: the local Lipschitz way
Thulasi Tholeti, Sheetal Kalyani
Recent research has established that the local Lipschitz constant of a neural network directly influences its adversarial robustness. We exploit this relationship to construct an ensemble of neural networks which not only improves the accuracy, but also provides increased adversarial robustness. The local Lipschitz constants for two different ensemble methods - bagging and stacking - are derived and the architectures best suited for ensuring adversarial robustness are deduced. The proposed ensemble architectures are tested on MNIST and CIFAR-10 datasets in the presence of white-box attacks, FGSM and PGD. The proposed architecture is found to be more robust than a) a single network and b) traditional ensemble methods.
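As a rough illustration of why averaging can help, here is a standard bound for a bagged (averaged) ensemble. It is not the paper's own derivation. It assumes $M$ member networks $f_i$, each locally $L_i$-Lipschitz on a neighbourhood containing $x$ and $y$; the symbols $M$, $f_i$, $L_i$ and $\bar f$ are introduced here for illustration only.

```latex
\[
\bar f(x) = \frac{1}{M}\sum_{i=1}^{M} f_i(x),
\qquad
\|\bar f(x) - \bar f(y)\|
\;\le\; \frac{1}{M}\sum_{i=1}^{M} \|f_i(x) - f_i(y)\|
\;\le\; \Big(\frac{1}{M}\sum_{i=1}^{M} L_i\Big)\,\|x - y\| .
\]
```

Under these assumptions the averaged ensemble is no less smooth than its members are on average, which is one reason the choice of member architectures matters.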
LG Jul 10, 2024
Randomness Helps Rigor: A Probabilistic Learning Rate Scheduler Bridging Theory and Deep Learning Practice
Dahlia Devapriya, Thulasi Tholeti, Janani Suresh et al.
Learning rate schedulers have shown great success in speeding up the convergence of learning algorithms in practice. However, their convergence to a minimum has not been proven theoretically. This difficulty mainly arises from the fact that, while traditional convergence analysis prescribes monotonically decreasing (or constant) learning rates, schedulers opt for rates that often increase and decrease through the training epochs. In this work, we aim to bridge the gap by proposing a probabilistic learning rate scheduler (PLRS) that does not conform to the monotonically decreasing condition, with provable convergence guarantees. To cement the relevance and utility of our work in modern-day applications, we show experimental results on deep neural network architectures such as ResNet, WRN, VGG, and DenseNet on CIFAR-10, CIFAR-100, and Tiny ImageNet datasets. We show that PLRS performs as well as or better than existing state-of-the-art learning rate schedulers in terms of convergence as well as accuracy. For example, while training ResNet-110 on the CIFAR-100 dataset, we outperform the state-of-the-art knee scheduler by $1.56\%$ in terms of classification accuracy. Furthermore, on the Tiny ImageNet dataset using the ResNet-50 architecture, we show a significantly more stable convergence than the cosine scheduler and a better classification accuracy than the existing schedulers.
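A minimal sketch of what a non-monotone, randomly drawn learning rate looks like in a training loop. The abstract does not say which distribution PLRS uses; uniform sampling on `[lr_min, lr_max]`, the toy objective, and all function names below are assumptions made for illustration.

```python
import random


def sample_learning_rate(lr_min, lr_max, rng):
    """Draw a learning rate uniformly from [lr_min, lr_max] (assumed distribution)."""
    return rng.uniform(lr_min, lr_max)


def run_gradient_descent(steps=200, lr_min=0.01, lr_max=0.2, seed=0):
    """Minimise the toy objective f(w) = 0.5 * w**2 with a random step size per step."""
    rng = random.Random(seed)
    w = 5.0
    rates = []
    for _ in range(steps):
        lr = sample_learning_rate(lr_min, lr_max, rng)
        rates.append(lr)
        w -= lr * w  # the gradient of 0.5 * w**2 is w
    return w, rates


final_w, rates = run_gradient_descent()
```

The sampled rates go up and down from step to step, yet the iterate still shrinks toward the minimiser on this toy problem because every rate stays inside a stable range.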
CR Jun 16, 2022
Introducing the Huber mechanism for differentially private low-rank matrix completion
R Adithya Gowtham, Gokularam M, Thulasi Tholeti et al.
Performing low-rank matrix completion with sensitive user data calls for privacy-preserving approaches. In this work, we propose a novel noise addition mechanism for preserving differential privacy where the noise distribution is inspired by Huber loss, a well-known loss function in robust statistics. The proposed Huber mechanism is evaluated against existing differential privacy mechanisms while solving the matrix completion problem using the Alternating Least Squares approach. We also propose using the Iteratively Re-Weighted Least Squares algorithm to complete low-rank matrices and study the performance of different noise mechanisms in both synthetic and real datasets. We prove that the proposed mechanism achieves ε-differential privacy similar to the Laplace mechanism. Furthermore, empirical results indicate that the Huber mechanism outperforms the Laplace and Gaussian mechanisms in some cases and is comparable otherwise.
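For context, the Laplace mechanism that the abstract uses as a baseline can be sketched as follows. This is not the Huber mechanism itself, whose noise density is not given in the abstract. The function names and the example ratings are hypothetical; the noise scale `sensitivity / epsilon` is the standard choice for the Laplace mechanism.

```python
import random


def laplace_scale(sensitivity, epsilon):
    """Noise scale b = sensitivity / epsilon used by the Laplace mechanism."""
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    return sensitivity / epsilon


def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) as the difference of two i.i.d. exponentials."""
    rate = 1.0 / scale
    return rng.expovariate(rate) - rng.expovariate(rate)


def privatize(values, sensitivity, epsilon, seed=0):
    """Add independent Laplace noise to each entry of `values`."""
    rng = random.Random(seed)
    scale = laplace_scale(sensitivity, epsilon)
    return [v + laplace_noise(scale, rng) for v in values]


ratings = [4.0, 3.5, 5.0]  # hypothetical user ratings
noisy_ratings = privatize(ratings, sensitivity=1.0, epsilon=0.5)
```

A smaller `epsilon` gives a larger noise scale, trading accuracy of the completed matrix for stronger privacy.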
CL Mar 16, 2025
Unequal Opportunities: Examining the Bias in Geographical Recommendations by Large Language Models
Shiran Dudy, Thulasi Tholeti, Resmi Ramachandranpillai et al.
Recent advancements in Large Language Models (LLMs) have made them a popular information-seeking tool among end users. However, the statistical training methods for LLMs have raised concerns about their representation of under-represented topics, potentially leading to biases that could influence real-world decisions and opportunities. These biases could have significant economic, social, and cultural impacts as LLMs become more prevalent, whether through direct interactions, such as when users engage with chatbots or automated assistants, or through their integration into third-party applications (as agents), where the models influence decision-making processes and functionalities behind the scenes. Our study examines the biases present in LLM recommendations of U.S. cities and towns across three domains: relocation, tourism, and starting a business. We explore two key research questions: (i) how similar LLM responses are, and (ii) how this similarity might favor areas with certain characteristics over others, introducing biases. We focus on the consistency of LLM responses and their tendency to over-represent or under-represent specific locations. Our findings point to consistent demographic biases in these recommendations, which could perpetuate a "rich-get-richer" effect that widens existing economic disparities.
SP Oct 14, 2024
Online waveform selection for cognitive radar
Thulasi Tholeti, Avinash Rangarajan, Sheetal Kalyani
Designing a cognitive radar system capable of adapting its parameters is challenging, particularly when tasked with tracking a ballistic missile throughout its entire flight. In this work, we focus on proposing adaptive algorithms that select waveform parameters in an online fashion. Our novelty lies in formulating the learning problem using domain knowledge derived from the characteristics of ballistic trajectories. We propose three reinforcement learning algorithms: bandwidth scaling, Q-learning, and Q-learning lookahead. These algorithms dynamically choose the bandwidth for each transmission based on received feedback. Through experiments on synthetically generated ballistic trajectories, we demonstrate that our proposed algorithms achieve the dual objectives of minimizing range error and maintaining continuous tracking without losing the target.
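The tabular Q-learning update that underlies this kind of online bandwidth choice can be sketched as below. The state discretisation, the candidate bandwidths, and the reward are all hypothetical; the abstract does not specify them, and this is the textbook update rather than the authors' formulation.

```python
import random


def choose_action(q_row, epsilon, rng):
    """Epsilon-greedy choice over the actions of one state."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_row))
    return max(range(len(q_row)), key=q_row.__getitem__)


def q_update(q, state, action, reward, next_state, alpha=0.1, gamma=0.9):
    """Standard Q-learning update; returns the new Q(state, action)."""
    target = reward + gamma * max(q[next_state])
    q[state][action] += alpha * (target - q[state][action])
    return q[state][action]


bandwidths_mhz = [1.0, 5.0, 10.0]  # hypothetical candidate bandwidths
n_states = 4                       # hypothetical number of tracking states
q_table = [[0.0] * len(bandwidths_mhz) for _ in range(n_states)]

rng = random.Random(0)
action = choose_action(q_table[0], epsilon=0.1, rng=rng)
# A hypothetical reward, e.g. the negative range error after this transmission.
q_update(q_table, state=0, action=action, reward=-0.3, next_state=1)
```

In a radar setting the feedback after each transmission would supply the reward, so the table is refined one pulse at a time.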
LG Oct 28, 2021
How to boost autoencoders?
Sai Krishna, Thulasi Tholeti, Sheetal Kalyani
Autoencoders are a category of neural networks with applications in numerous domains and hence, improvement of their performance is gaining substantial interest from the machine learning community. Ensemble methods, such as boosting, are often adopted to enhance the performance of regular neural networks. In this work, we discuss the challenges associated with boosting autoencoders and propose a framework to overcome them. The proposed method ensures that the advantages of boosting are realized when either output (encoded or reconstructed) is used. The usefulness of the boosted ensemble is demonstrated in two applications that widely employ autoencoders: anomaly detection and clustering.
IT Oct 27, 2021
Binarized ResNet: Enabling Robust Automatic Modulation Classification at the resource-constrained Edge
Deepsayan Sadhukhan, Nitin Priyadarshini Shankar, Nancy Nayak et al.
Recently, deep neural networks (DNNs) have been used extensively for automatic modulation classification (AMC), and the results have been quite promising. However, DNNs have high memory and computation requirements making them impractical for edge networks where the devices are resource-constrained. They are also vulnerable to adversarial attacks, which is a significant security concern. This work proposes a rotated binary large ResNet (RBLResNet) for AMC that can be deployed at the edge network because of low memory and computational complexity. The performance gap between the RBLResNet and existing architectures with floating-point weights and activations can be closed by two proposed ensemble methods: (i) multilevel classification (MC), and (ii) bagging multiple RBLResNets while retaining low memory and computational power. The MC method achieves an accuracy of $93.39\%$ at $10$dB over all the $24$ modulation classes of the Deepsig dataset. This performance is comparable to state-of-the-art (SOTA) performances, with $4.75$ times lower memory and $1214$ times lower computation. Furthermore, RBLResNet also has high adversarial robustness compared to existing DNN models. The proposed MC method with RBLResNets has an adversarial accuracy of $87.25\%$ over a wide range of SNRs, surpassing the robustness of all existing SOTA methods to the best of our knowledge. Properties such as low memory, low computation, and the highest adversarial robustness make it a better choice for robust AMC in low-power edge devices.
LG Jan 18, 2021
On the Differentially Private Nature of Perturbed Gradient Descent
Thulasi Tholeti, Sheetal Kalyani
We consider the problem of empirical risk minimization given a database, using the gradient descent algorithm. We note that the function to be optimized may be non-convex, containing saddle points which impede the convergence of the algorithm. A perturbed gradient descent algorithm is typically employed to escape these saddle points. We show that this algorithm, which perturbs the gradient, inherently preserves the privacy of the data. We then employ the differential privacy framework to quantify the privacy thus achieved. We also analyze the change in privacy with varying parameters such as problem dimension and the distance between the databases.
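A minimal sketch of how gradient perturbation lets gradient descent leave a saddle point. The toy objective, the Gaussian noise, and every parameter value are assumptions chosen for illustration; the sketch shows only the perturbation and does no privacy accounting.

```python
import random


def grad(x, y):
    """Gradient of f(x, y) = x**4/4 - x**2/2 + y**2/2, which has a saddle at (0, 0)."""
    return x**3 - x, y


def perturbed_gd(x, y, lr=0.1, noise_std=0.01, steps=2000, seed=0):
    """Gradient descent with zero-mean Gaussian noise added to each gradient."""
    rng = random.Random(seed)
    for _ in range(steps):
        gx, gy = grad(x, y)
        gx += rng.gauss(0.0, noise_std)
        gy += rng.gauss(0.0, noise_std)
        x -= lr * gx
        y -= lr * gy
    return x, y


stuck = perturbed_gd(0.0, 0.0, noise_std=0.0)  # no noise: never leaves the saddle
escaped = perturbed_gd(0.0, 0.0)               # with noise: ends near x = +1 or x = -1
```

Started exactly at the saddle, the unperturbed run sees a zero gradient forever, while the perturbed run drifts off along the descending direction and settles near one of the two minima.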
LG Mar 22, 2020
Tune smarter not harder: A principled approach to tuning learning rates for shallow nets
Thulasi Tholeti, Sheetal Kalyani
Effective hyper-parameter tuning is essential to guarantee the performance that neural networks have come to be known for. In this work, a principled approach to choosing the learning rate is proposed for shallow feedforward neural networks. We associate the learning rate with the gradient Lipschitz constant of the objective to be minimized while training. An upper bound on the mentioned constant is derived and a search algorithm, which always results in non-divergent traces, is proposed to exploit the derived bound. It is shown through simulations that the proposed search method significantly outperforms existing tuning methods such as Tree Parzen Estimators (TPE). The proposed method is applied to three different existing applications: a) channel estimation in OFDM systems, b) prediction of currency exchange rates, and c) offset estimation in OFDM receivers, and it is shown to pick better learning rates than the existing methods using the same or less compute power.
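The link between the gradient Lipschitz constant and a safe learning rate is easiest to see on least squares, where the constant is the largest eigenvalue of the Hessian. The sketch below uses that simpler setting rather than the paper's shallow-network bound; the helper names and the toy data matrix are assumptions.

```python
def gram_matrix(X):
    """Return H = X^T X / n, the Hessian of (1 / (2n)) * ||Xw - y||^2."""
    n, d = len(X), len(X[0])
    return [[sum(X[k][i] * X[k][j] for k in range(n)) / n for j in range(d)]
            for i in range(d)]


def matvec(H, v):
    """Multiply a square matrix (list of rows) by a vector."""
    return [sum(h * x for h, x in zip(row, v)) for row in H]


def largest_eigenvalue(H, iters=500):
    """Estimate the largest eigenvalue of a symmetric PSD matrix by power iteration."""
    d = len(H)
    v = [1.0 / d ** 0.5] * d  # unit-norm starting vector
    for _ in range(iters):
        w = matvec(H, v)
        norm = sum(x * x for x in w) ** 0.5
        if norm == 0.0:
            return 0.0
        v = [x / norm for x in w]
    # Rayleigh quotient v^T H v for the converged unit vector
    return sum(a * b for a, b in zip(v, matvec(H, v)))


X = [[2.0, 0.0], [0.0, 1.0]]  # toy data matrix
L = largest_eigenvalue(gram_matrix(X))
learning_rate = 1.0 / L  # step size 1/L keeps gradient descent from diverging here
```

For a neural network the constant is not available in closed form, which is why an upper bound and a search procedure are needed.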
IT Mar 20, 2020
Green DetNet: Computation and Memory efficient DetNet using Smart Compression and Training
Nancy Nayak, Thulasi Tholeti, Muralikrishnan Srinivasan et al.
This paper introduces an incremental training framework for compressing popular Deep Neural Network (DNN) based unfolded multiple-input-multiple-output (MIMO) detection algorithms like DetNet. The idea of incremental training is explored to select the optimal depth while training. To reduce the computation requirements or the number of FLoating point OPerations (FLOPs) and enforce sparsity in weights, the concept of structured regularization is explored using group LASSO and sparse group LASSO. Our methods lead to an astounding $98.9\%$ reduction in memory requirement and $81.63\%$ reduction in FLOPs when compared with DetNet without compromising on BER performance.
OC May 28, 2019
Concavifiability and convergence: necessary and sufficient conditions for gradient descent analysis
Thulasi Tholeti, Sheetal Kalyani
Convergence of the gradient descent algorithm has been attracting renewed interest due to its utility in deep learning applications. Even as multiple variants of gradient descent were proposed, the assumption that the gradient of the objective is Lipschitz continuous remained an integral part of the analysis until recently. In this work, we look at convergence analysis by focusing on a property that we term concavifiability, instead of Lipschitz continuity of gradients. We show that concavifiability is a necessary and sufficient condition to satisfy the upper quadratic approximation, which is key in proving that the objective function decreases after every gradient descent update. We also show that any gradient Lipschitz function satisfies concavifiability. A constant known as the concavifier, analogous to the gradient Lipschitz constant, is derived; it is indicative of the optimal step size. As an application, we demonstrate the utility of finding the concavifier in the convergence of gradient descent through an example inspired by neural networks. We derive bounds on the concavifier to obtain a fixed step size for a single hidden layer ReLU network.
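For reference, the upper quadratic approximation mentioned above has the following standard form in the gradient Lipschitz case. The abstract does not give the concavifier version, so the constant $L$, the step size $1/L$, and the iterate notation $x_k$ are the textbook ones, stated here as an assumption about what the paper generalises.

```latex
\[
f(y) \;\le\; f(x) + \nabla f(x)^{\top}(y - x) + \frac{L}{2}\,\|y - x\|^{2}
\quad \text{for all } x, y .
\]
Substituting the gradient descent update $y = x_{k+1} = x_k - \tfrac{1}{L}\nabla f(x_k)$ gives
\[
f(x_{k+1}) \;\le\; f(x_k) - \frac{1}{L}\,\|\nabla f(x_k)\|^{2} + \frac{1}{2L}\,\|\nabla f(x_k)\|^{2}
\;=\; f(x_k) - \frac{1}{2L}\,\|\nabla f(x_k)\|^{2} .
\]
```

The second line is the per-step decrease of the objective; a concavifier plays the role of $L$ in this argument.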
IT Apr 30, 2018
A Centralized Multi-stage Non-parametric Learning Algorithm for Opportunistic Spectrum Access
Thulasi Tholeti, Vishnu Raj, Sheetal Kalyani
Owing to the ever-increasing demand for wireless spectrum, Cognitive Radio (CR) was introduced as a technique to attain high spectral efficiency. As the number of secondary users (SUs) connecting to the cognitive radio network is on the rise, there is an imminent need for centralized algorithms that provide high throughput and energy efficiency of the SUs while ensuring minimum interference to the licensed users. In this work, we propose a multi-stage algorithm that 1) effectively assigns the available channel to the SUs, 2) employs a non-parametric learning framework to estimate the primary traffic distribution to minimize sensing, and 3) uses an adaptive framework to ensure that collisions with the primary user stay below the specified threshold. We provide comprehensive empirical validation of the method against other approaches.
IT Jul 31, 2017
Spectrum Access In Cognitive Radio Using A Two Stage Reinforcement Learning Approach
Vishnu Raj, Irene Dias, Thulasi Tholeti et al.
With the advent of the 5th generation of wireless standards and an increasing demand for higher throughput, methods to improve the spectral efficiency of wireless systems have become very important. In the context of cognitive radio, a substantial increase in throughput is possible if the secondary user can make smart decisions regarding which channel to sense and when or how often to sense. Here, we propose an algorithm to not only select a channel for data transmission but also to predict how long the channel will remain unoccupied so that the time spent on channel sensing can be minimized. Our algorithm learns in two stages: a reinforcement learning approach for channel selection and a Bayesian approach to determine the optimal duration for which sensing can be skipped. Comparisons with other learning methods are provided through extensive simulations. We show that the number of sensing operations is minimized with a negligible increase in primary interference; this implies that less energy is spent by the secondary user in sensing and that higher throughput is achieved by saving on sensing.