Sheetal Kalyani

LG
h-index: 22
48 papers
430 citations
Novelty: 52%
AI Score: 56

48 Papers

LG | Aug 21, 2024 | Code
First line of defense: A robust first layer mitigates adversarial attacks

Janani Suresh, Nancy Nayak, Sheetal Kalyani

Adversarial training (AT) incurs significant computational overhead, leading to growing interest in designing inherently robust architectures. We demonstrate that a carefully designed first layer of the neural network can serve as an implicit adversarial noise filter (ANF). This filter is created using a combination of large kernel size, increased convolution filters, and a maxpool operation. We show that integrating this filter as the first layer in architectures such as ResNet, VGG, and EfficientNet results in adversarially robust networks. Our approach achieves higher adversarial accuracies than existing natively robust architectures without AT and is competitive with adversarially trained architectures across a wide range of datasets. Supporting our findings, we show that (a) the decision regions for our method have better margins, (b) the visualized loss surfaces are smoother, (c) the modified peak signal-to-noise ratio (mPSNR) values at the output of the ANF are higher, (d) high-frequency components are more attenuated, and (e) architectures incorporating ANF exhibit better denoising in Gaussian noise compared to baseline architectures. Code for all our experiments is available at https://github.com/janani-suresh-97/first-line-defence.git.

8.0 | IT | May 2
Neural Equalisers for Highly Compressed Faster-than-Nyquist Signalling: Design, Performance, Complexity and Robustness

Shubham Paul, Sheetal Kalyani, Nambi Sheshadri et al.

Faster-than-Nyquist (FTN) signalling has emerged as a compelling technique for enhancing spectral efficiency in bandwidth-constrained communication systems. By intentionally introducing controlled intersymbol interference (ISI), FTN allows transmission at rates exceeding the traditional Nyquist limit, unlocking new potential in high-speed data communication. However, its practical deployment remains challenged by the need for low-complexity detection strategies that can cope with the induced ISI while maintaining low latency and robust performance. In this paper, we present a deep learning-based framework for FTN signalling, with receivers that are resilient to non-idealities, that addresses these challenges through several novel contributions. First, we propose a sliding window detection method that leverages temporal context while preserving computational efficiency. Second, we demonstrate the viability of FTN systems with very low packing factors, showing that reliable performance can be achieved even under aggressive spectral compression (up to 75%). Our architecture is optimised for low latency and low complexity, making it suitable for real-time applications and scalable deployment. In addition, we assess the robustness of our models across varying channel conditions and noise profiles, providing insights into their generalisability and resilience.

4.7 | CL | May 27
Breaking the Script Barrier: Enabling Automatic Alignment for PoS-based ASR Error Analysis in Non-Latin Scripts

Prasenjit K Mudi, Dahlia Devapriya, Sheetal Kalyani

Automatic Speech Recognition (ASR) systems are commonly evaluated using aggregate metrics such as Word Error Rate (WER), which do not capture the linguistic structure of errors. Fine-grained analysis, such as Part-of-Speech (PoS)-wise error characterization, requires accurate alignment between ASR hypotheses and reference transcriptions. However, existing alignment tools are often unreliable for languages written in non-Latin scripts. In this work, we address this gap by proposing a robust, automated, language-agnostic alignment mechanism applicable across ASR architectures and across languages written in both Latin and non-Latin scripts. This enables consistent alignment of hypotheses, references, and evaluation sequences, forming the basis for downstream linguistic analysis. Building on this, we employ standard PoS taggers to perform scalable and reproducible PoS-wise error analysis. Notably, we perform alignment and downstream ASR error analysis across three major segmented writing systems, namely, Abugida (Tamil, Hindi, Kannada), Alphabetic (English, Russian, Greek), and Abjad (Arabic). We further demonstrate how such error information can be leveraged during ASR training to improve metrics such as WER.
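
The abstract does not describe the proposed alignment mechanism, so the following is only a minimal sketch of the standard Levenshtein word alignment that WER is conventionally built on, not the authors' method. The function names `align` and `wer` and the example sentences are illustrative assumptions. Because it compares whole tokens, it works on any script once the text has been split into words.

```python
def align(ref, hyp):
    """Align two token lists with Levenshtein edit distance.

    Returns a list of (op, ref_token, hyp_token) tuples where op is one of
    "match", "sub", "del" (token missing from hyp) or "ins" (extra in hyp).
    """
    n, m = len(ref), len(hyp)
    # cost[i][j] = edit distance between ref[:i] and hyp[:j]
    cost = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        cost[i][0] = i
    for j in range(1, m + 1):
        cost[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diag = cost[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            cost[i][j] = min(diag, cost[i - 1][j] + 1, cost[i][j - 1] + 1)

    # Trace back from the bottom-right corner to recover one optimal path.
    ops = []
    i, j = n, m
    while i > 0 or j > 0:
        if (i > 0 and j > 0
                and cost[i][j] == cost[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])):
            op = "match" if ref[i - 1] == hyp[j - 1] else "sub"
            ops.append((op, ref[i - 1], hyp[j - 1]))
            i, j = i - 1, j - 1
        elif i > 0 and cost[i][j] == cost[i - 1][j] + 1:
            ops.append(("del", ref[i - 1], None))
            i -= 1
        else:
            ops.append(("ins", None, hyp[j - 1]))
            j -= 1
    ops.reverse()
    return ops


def wer(ref, hyp):
    """Word error rate: non-matching alignment operations per reference word."""
    errors = sum(1 for op, _, _ in align(ref, hyp) if op != "match")
    return errors / len(ref)


reference = "the cat sat on the mat".split()
hypothesis = "the cat sit on mat".split()
alignment = align(reference, hypothesis)
```

Each aligned pair carries the reference token, so a PoS tag looked up for that token can be attached to the error, which is the kind of PoS-wise breakdown the abstract refers to.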

QUANT-PH | Dec 17, 2022
Unrolling SVT to obtain computationally efficient SVT for n-qubit quantum state tomography

Siva Shanmugam, Sheetal Kalyani

Quantum state tomography aims to estimate the state of a quantum mechanical system which is described by a trace one, Hermitian positive semidefinite complex matrix, given a set of measurements of the state. Existing works focus on estimating the density matrix that represents the state using a compressive sensing approach, with fewer measurements than required for a tomographically complete set, under the assumption that the true state has low rank. One very popular method to estimate the state is the Singular Value Thresholding (SVT) algorithm. In this work, we present a machine learning approach to estimate the quantum state of n-qubit systems by unrolling the iterations of SVT, which we call Learned Quantum State Tomography (LQST). As merely unrolling SVT may not ensure that the output of the network meets the constraints required for a quantum state, we design and train a custom neural network whose architecture is inspired by the iterations of SVT, with additional layers to meet the required constraints. We show that our proposed LQST with very few layers reconstructs the density matrix with much better fidelity than the SVT algorithm, which takes many hundreds of iterations to converge. We also demonstrate the reconstruction of the quantum Bell state from an informationally incomplete set of noisy measurements.
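
As a rough illustration of the two ingredients mentioned above, the sketch below shows the shrinkage step at the core of SVT (soft-thresholding the singular values) and a simple rescaling that enforces the trace-one constraint. It assumes the singular values have already been computed by some SVD routine, and the function names are hypothetical. It is not the LQST architecture, whose layers are not specified in the abstract.

```python
def shrink(singular_values, tau):
    """Soft-threshold singular values: subtract tau and clip at zero."""
    return [max(s - tau, 0.0) for s in singular_values]


def normalise_trace(eigenvalues):
    """Rescale non-negative eigenvalues so that they sum to one (trace one)."""
    total = sum(eigenvalues)
    if total == 0:
        raise ValueError("all eigenvalues are zero; cannot normalise")
    return [e / total for e in eigenvalues]


# For a Hermitian positive semidefinite matrix the singular values are its
# eigenvalues, so thresholding them lowers the rank of the estimate.
spectrum = [0.9, 0.3, 0.05]
thresholded = shrink(spectrum, tau=0.1)
state_spectrum = normalise_trace(thresholded)
rank = sum(1 for s in thresholded if s > 0)
```

Small singular values are set exactly to zero, which is how the threshold promotes the low-rank structure assumed for the true state.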

IT | Jul 9, 2024
DRL-AdaPart: DRL-Driven Adaptive STAR-RIS Partitioning for Fair and Frugal Resource Utilization

Ashok S. Kumar, Nancy Nayak, Sheetal Kalyani et al.

In this work, we propose a method for efficient resource utilization of simultaneously transmitting and reflecting reconfigurable intelligent surface (STAR-RIS) elements to ensure fair and high data rates. We introduce a subsurface assignment variable that determines the number of STAR-RIS elements allocated to each user and maximize the sum of the data rates by jointly optimizing the phase shifts of the STAR-RIS and the subsurface assignment variables using an appropriately tailored deep reinforcement learning (DRL) algorithm. The proposed DRL method is also compared with a Dinkelbach algorithm and the designed hybrid DRL approach. A penalty term is incorporated into the DRL model to enhance resource utilization by intelligently deactivating STAR-RIS elements when not required. Extensive simulations show that the proposed DRL method achieves fair and high data rates for static and mobile users while ensuring efficient resource utilization. Using the proposed DRL method, up to 27% and 21% of STAR-RIS elements can be deactivated in static and mobile scenarios, respectively, without affecting performance.

LG | Jun 1, 2022
The robust way to stack and bag: the local Lipschitz way

Thulasi Tholeti, Sheetal Kalyani

Recent research has established that the local Lipschitz constant of a neural network directly influences its adversarial robustness. We exploit this relationship to construct an ensemble of neural networks which not only improves the accuracy, but also provides increased adversarial robustness. The local Lipschitz constants for two different ensemble methods - bagging and stacking - are derived and the architectures best suited for ensuring adversarial robustness are deduced. The proposed ensemble architectures are tested on MNIST and CIFAR-10 datasets in the presence of white-box attacks, FGSM and PGD. The proposed architecture is found to be more robust than a) a single network and b) traditional ensemble methods.

4.1 | QUANT-PH | Mar 26
Maximizing Qubit Throughput under Buffer Decoherence and Variability in Generation

Padma Priyanka, Avhishek Chatterjee, Sheetal Kalyani

Quantum communication networks require transmission of high-fidelity, uncoded qubits for applications such as entanglement distribution and quantum key distribution. However, current implementations are constrained by limited buffer capacity and qubit decoherence, which degrades qubit quality while waiting in the buffer. A key challenge arises from the stochastic nature of qubit generation: there exists a random delay (D) between the initiation of a generation request and the availability of the qubit. This induces a fundamental trade-off: early initiation increases buffer waiting time and hence decoherence, whereas delayed initiation leads to server idling and reduced throughput. We model this system as an admission control problem in a finite buffer queue, where the reward associated with each job is a decreasing function of its sojourn time. We derive analytical conditions under which a simple "no lag" policy, where a new qubit is generated immediately upon the availability of buffer space, is optimal. To address scenarios with unknown system parameters, we further develop a Bayesian learning framework that adaptively optimizes the admission policy. In addition to quantum communication systems, the proposed model is applicable to delay-sensitive IoT sensing and service systems.

31.1 | LG | Mar 16
Federated Learning of Binary Neural Networks: Enabling Low-Cost Inference

Nitin Priyadarshini Shankar, Soham Lahiri, Sheetal Kalyani et al.

Federated Learning (FL) preserves privacy by distributing training across devices. However, using DNNs is computationally intensive at the low-powered edge during inference. Edge deployment demands models that simultaneously optimize memory footprint and computational efficiency, a dilemma where conventional DNNs fail by exceeding resource limits. Traditional post-training binarization reduces model size but suffers from severe accuracy loss due to quantization errors. To address these challenges, we propose FedBNN, a rotation-aware binary neural network framework that learns binary representations directly during local training. By encoding each weight as a single bit $\{+1, -1\}$ instead of a $32$-bit float, FedBNN shrinks the model footprint, significantly reducing inference-time FLOPs and memory requirements in comparison to federated methods using real-valued models. Evaluations across multiple benchmark datasets demonstrate that FedBNN significantly reduces resource consumption while performing similarly to existing federated methods using real-valued models.
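
To make the one-bit-versus-32-bit arithmetic concrete, here is a minimal sketch that binarizes a list of weights by sign and packs them eight to a byte. It is a plain post-hoc sign binarization for illustration only; FedBNN learns its binary weights during training, and its rotation-aware component is not modelled here. The helper names are assumptions.

```python
def binarize(weights):
    """Map each real weight to +1 or -1 by its sign (zero maps to +1)."""
    return [1 if w >= 0 else -1 for w in weights]


def pack_bits(signs):
    """Pack +1/-1 values into bytes, one bit per weight (+1 -> 1, -1 -> 0)."""
    packed = bytearray((len(signs) + 7) // 8)
    for i, s in enumerate(signs):
        if s == 1:
            packed[i // 8] |= 1 << (i % 8)
    return bytes(packed)


weights = [0.3, -1.2, 0.0, -0.01, 2.5, -0.7, 0.9, -3.0]
signs = binarize(weights)
packed = pack_bits(signs)

float32_bytes = 4 * len(weights)   # 32 bits per real-valued weight
binary_bytes = len(packed)         # 1 bit per binarized weight
compression = float32_bytes / binary_bytes
```

For eight weights this gives 32 bytes against 1 byte, the 32x storage reduction implied by replacing a 32-bit float with a single bit.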

LG | Dec 3, 2025
Tuning-Free Structured Sparse Recovery of Multiple Measurement Vectors using Implicit Regularization

Lakshmi Jayalal, Sheetal Kalyani

Recovering jointly sparse signals in the multiple measurement vectors (MMV) setting is a fundamental problem in machine learning, but traditional methods like multiple measurement vectors orthogonal matching pursuit (M-OMP) and multiple measurement vectors FOCal Underdetermined System Solver (M-FOCUSS) often require careful parameter tuning or prior knowledge of the sparsity of the signal and/or noise variance. We introduce a novel tuning-free framework that leverages Implicit Regularization (IR) from overparameterization to overcome this limitation. Our approach reparameterizes the estimation matrix into factors that decouple the shared row-support from individual vector entries. We show that the optimization dynamics inherently promote the desired row-sparse structure by applying gradient descent to a standard least-squares objective on these factors. We prove that with a sufficiently small and balanced initialization, the optimization dynamics exhibit a "momentum-like" effect, causing the norms of rows in the true support to grow significantly faster than others. This formally guarantees that the solution trajectory converges towards an idealized row-sparse solution. Additionally, empirical results demonstrate that our approach achieves performance comparable to established methods without requiring any prior information or tuning.

CL | Sep 24, 2024
Textless NLP -- Zero Resource Challenge with Low Resource Compute

Krithiga Ramadass, Abrit Pal Singh, Srihari J et al.

This work addresses the persistent challenges of substantial training time and GPU resource requirements even when training lightweight encoder-vocoder models for Textless NLP. We reduce training steps significantly while improving performance by a) leveraging learning rate schedulers for efficient and faster convergence, b) optimizing hop length, and c) tuning the interpolation scale factors for better audio quality. Additionally, we explore the latent space representation for Indian languages such as Tamil and Bengali for the acoustic unit discovery and voice conversion tasks. Our approach leverages a quantized encoder architecture in conjunction with a vocoder that utilizes the proposed mixture of optimized hop length, tuned interpolation scale factors, and a cyclic learning rate scheduler. We obtain consistently good results across English, Tamil, and Bengali datasets. The proposed method excels in capturing complex linguistic patterns, resulting in clear reconstructed audio during voice conversion with significantly reduced training time.

CR | Sep 17, 2024
Golden Ratio Search: A Low-Power Adversarial Attack for Deep Learning based Modulation Classification

Deepsayan Sadhukhan, Nitin Priyadarshini Shankar, Sheetal Kalyani

We propose a minimal power white box adversarial attack for Deep Learning based Automatic Modulation Classification (AMC). The proposed attack uses the Golden Ratio Search (GRS) method to find powerful attacks with minimal power. We evaluate the efficacy of the proposed method by comparing it with existing adversarial attack approaches. Additionally, we test the robustness of the proposed attack against various state-of-the-art architectures, including defense mechanisms such as adversarial training, binarization, and ensemble methods. Experimental results demonstrate that the proposed attack is powerful, requires minimal power, and can be generated in less time, significantly challenging the resilience of current AMC methods.
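
The abstract does not say how the search is wired into the attack, so the sketch below shows only the classical golden-section search it is named after: a bracketing method that locates the minimum of a unimodal function with one new function evaluation per iteration. The objective `toy_objective` and the interval are hypothetical stand-ins, not the paper's attack loss.

```python
import math

# Reciprocal of the golden ratio, about 0.618.
INV_PHI = (math.sqrt(5) - 1) / 2


def golden_section_min(f, lo, hi, tol=1e-6):
    """Return an approximate minimiser of a unimodal f on [lo, hi]."""
    a, b = lo, hi
    c = b - INV_PHI * (b - a)
    d = a + INV_PHI * (b - a)
    fc, fd = f(c), f(d)
    while b - a > tol:
        if fc < fd:
            # Minimum lies in [a, d]; the old c becomes the new d.
            b, d, fd = d, c, fc
            c = b - INV_PHI * (b - a)
            fc = f(c)
        else:
            # Minimum lies in [c, b]; the old d becomes the new c.
            a, c, fc = c, d, fd
            d = a + INV_PHI * (b - a)
            fd = f(d)
    return (a + b) / 2


def toy_objective(scale):
    """Hypothetical unimodal cost of a perturbation scale, minimised at 0.3."""
    return (scale - 0.3) ** 2


best_scale = golden_section_min(toy_objective, 0.0, 1.0)
```

The interval shrinks by a factor of about 0.618 per iteration while reusing one of the two previous evaluations, which is why the method is cheap when each evaluation is expensive.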

LG | Sep 17, 2024
Geometry Aware Meta-Learning Neural Network for Joint Phase and Precoder Optimization in RIS

Dahlia Devapriya, Aparna V C, Sheetal Kalyani

In reconfigurable intelligent surface (RIS) aided systems, the joint optimization of the precoder matrix at the base station and the phase shifts of the RIS elements involves significant complexity. In this paper, we propose a complex-valued, geometry-aware meta-learning neural network that maximizes the weighted sum rate in a multi-user multiple-input single-output system. By leveraging the complex circle geometry for phase shifts and spherical geometry for the precoder, the optimization occurs on Riemannian manifolds, leading to faster convergence. We use a complex-valued neural network for phase shifts and an Euler-inspired update for the precoder network. Our approach outperforms existing neural network-based algorithms, offering higher weighted sum rates, lower power consumption, and significantly faster convergence. Specifically, it converges faster by nearly 100 epochs, with a 0.7 bps improvement in weighted sum rate and a 1.8 dB power gain when compared with existing work. Further, it outperforms the state-of-the-art alternating optimization algorithm by 0.86 bps with a 2.6 dB power gain.

LG | Sep 15, 2024
Learning Rate Optimization for Deep Neural Networks Using Lipschitz Bandits

Padma Priyanka, Sheetal Kalyani, Avhishek Chatterjee

The learning rate is a crucial parameter in training neural networks. A properly tuned learning rate leads to faster training and higher test accuracy. In this paper, we propose a Lipschitz bandit-driven approach for tuning the learning rate of neural networks. The proposed approach is compared with the popular HyperOpt technique used extensively for hyperparameter optimization and the recently developed bandit-based algorithm BLiE. The results for multiple neural network architectures indicate that our method finds a better learning rate using a) fewer evaluations and b) fewer epochs per evaluation, when compared to both HyperOpt and BLiE. Thus, the proposed approach enables more efficient training of neural networks, leading to lower training time and lower computational cost.

LG | Sep 11, 2024
Tuning-Free Online Robust Principal Component Analysis through Implicit Regularization

Lakshmi Jayalal, Gokularam Muthukrishnan, Sheetal Kalyani

The performance of the standard Online Robust Principal Component Analysis (OR-PCA) technique depends on the optimum tuning of the explicit regularizers, and this tuning is dataset sensitive. We aim to remove the dependency on these tuning parameters by using implicit regularization. We propose to use the implicit regularization effect of various modified gradient descents to make OR-PCA tuning free. Our method incorporates three different versions of modified gradient descent that separately but naturally encourage sparsity and low-rank structures in the data. The proposed method performs comparably to or better than the tuned OR-PCA for both simulated and real-world datasets. Being tuning-free makes OR-PCA more scalable to large datasets, since we do not require dataset-dependent parameter tuning.

LG | Jul 10, 2024
Randomness Helps Rigor: A Probabilistic Learning Rate Scheduler Bridging Theory and Deep Learning Practice

Dahlia Devapriya, Thulasi Tholeti, Janani Suresh et al.

Learning rate schedulers have shown great success in speeding up the convergence of learning algorithms in practice. However, their convergence to a minimum has not been proven theoretically. This difficulty mainly arises from the fact that, while traditional convergence analysis prescribes monotonically decreasing (or constant) learning rates, schedulers opt for rates that often increase and decrease through the training epochs. In this work, we aim to bridge the gap by proposing a probabilistic learning rate scheduler (PLRS) that does not conform to the monotonically decreasing condition, with provable convergence guarantees. To cement the relevance and utility of our work in modern-day applications, we show experimental results on deep neural network architectures such as ResNet, WRN, VGG, and DenseNet on CIFAR-10, CIFAR-100, and Tiny ImageNet datasets. We show that PLRS performs as well as or better than existing state-of-the-art learning rate schedulers in terms of convergence as well as accuracy. For example, while training ResNet-110 on the CIFAR-100 dataset, we outperform the state-of-the-art knee scheduler by $1.56\%$ in terms of classification accuracy. Furthermore, on the Tiny ImageNet dataset using the ResNet-50 architecture, we show a significantly more stable convergence than the cosine scheduler and a better classification accuracy than the existing schedulers.

CR | Jun 16, 2022
Introducing the Huber mechanism for differentially private low-rank matrix completion

R Adithya Gowtham, Gokularam M, Thulasi Tholeti et al.

Performing low-rank matrix completion with sensitive user data calls for privacy-preserving approaches. In this work, we propose a novel noise addition mechanism for preserving differential privacy where the noise distribution is inspired by Huber loss, a well-known loss function in robust statistics. The proposed Huber mechanism is evaluated against existing differential privacy mechanisms while solving the matrix completion problem using the Alternating Least Squares approach. We also propose using the Iteratively Re-Weighted Least Squares algorithm to complete low-rank matrices and study the performance of different noise mechanisms in both synthetic and real datasets. We prove that the proposed mechanism achieves ε-differential privacy similar to the Laplace mechanism. Furthermore, empirical results indicate that the Huber mechanism outperforms the Laplace and Gaussian mechanisms in some cases and is comparable otherwise.

LG | Jun 1, 2022
Rotate the ReLU to implicitly sparsify deep networks

Nancy Nayak, Sheetal Kalyani

In the era of Deep Neural Network based solutions for a variety of real-life tasks, having a compact and energy-efficient deployable model has become fairly important. Most of the existing deep architectures use Rectified Linear Unit (ReLU) activation. In this paper, we propose a novel idea of rotating the ReLU activation to give one more degree of freedom to the architecture. We show that this activation, wherein the rotation is learned via training, results in the elimination of those parameters/filters in the network which are not important for the task. In other words, rotated ReLU seems to be doing implicit sparsification. The slopes of the rotated ReLU activations act as coarse feature extractors and unnecessary features can be eliminated before retraining. Our studies indicate that features always choose to pass through fewer filters in architectures such as ResNet and its variants. Hence, by rotating the ReLU, the weights or the filters that are not necessary are automatically identified and can be dropped, thus giving rise to significant savings in memory and computation. Furthermore, in some cases, we also notice that along with the savings in memory and computation we also obtain improvement over the reported performance of the corresponding baseline work on popular datasets such as MNIST, CIFAR-10, CIFAR-100, and SVHN.

LG | Dec 5, 2025
BERTO: an Adaptive BERT-based Network Time Series Predictor with Operator Preferences in Natural Language

Nitin Priyadarshini Shankar, Vaibhav Singh, Sheetal Kalyani et al.

We introduce BERTO, a BERT-based framework for traffic prediction and energy optimization in cellular networks. Built on transformer architectures, BERTO delivers high prediction accuracy, while its Balancing Loss Function and prompt-based customization allow operators to adjust the trade-off between power savings and performance. Natural language prompts guide the model to manage underprediction and overprediction in accordance with the operator's intent. Experiments on real-world datasets show that BERTO improves upon existing models with a $4.13$\% reduction in MSE while introducing the feature of balancing competing objectives of power saving and performance through simple natural language inputs, operating over a flexible range of $1.4$ kW in power and up to $9\times$ variation in service quality, making it well suited for intelligent RAN deployments.

LG | Oct 14, 2025
Structured Sparsity and Weight-adaptive Pruning for Memory and Compute efficient Whisper models

Prasenjit K Mudi, Anshi Sachan, Dahlia Devapriya et al.

Whisper models have achieved remarkable progress in speech recognition; yet their large size remains a bottleneck for deployment on resource-constrained edge devices. This paper proposes a framework to design fine-tuned variants of Whisper which address the above problem. Structured sparsity is enforced via the Sparse Group LASSO penalty as a loss regularizer, to reduce the number of FLOating Point operations (FLOPs). Further, a weight statistics aware pruning algorithm is proposed. We also design our custom text normalizer for WER evaluation. On Common Voice 11.0 Hindi dataset, we obtain, without degrading WER, (a) 35.4% reduction in model parameters, 14.25% lower memory consumption and 18.5% fewer FLOPs on Whisper-small, and (b) 31% reduction in model parameters, 15.29% lower memory consumption and 16.95% fewer FLOPs on Whisper-medium; and, (c) substantially outperform the state-of-the-art Iterative Magnitude Pruning based method by pruning 18.7% more parameters along with a 12.31 reduction in WER.
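
For readers unfamiliar with the regularizer named above, the sketch below evaluates a Sparse Group LASSO penalty on weights that have been split into groups. The square-root-of-group-size weighting is a common convention and is an assumption here, as are the function name and the two coefficient names; the abstract does not give the exact form or grouping used for Whisper.

```python
import math


def sparse_group_lasso(groups, lam_l1, lam_group):
    """Penalty = lam_l1 * sum |w| + lam_group * sum_g sqrt(|g|) * ||w_g||_2.

    `groups` is a list of weight lists, e.g. one list per filter or head.
    """
    l1_term = sum(abs(w) for group in groups for w in group)
    group_term = sum(
        math.sqrt(len(group)) * math.sqrt(sum(w * w for w in group))
        for group in groups
    )
    return lam_l1 * l1_term + lam_group * group_term


# Two groups: one active, one already pruned to zero.
example_groups = [[3.0, 4.0], [0.0, 0.0]]
penalty = sparse_group_lasso(example_groups, lam_l1=1.0, lam_group=1.0)
```

The group term drives whole groups to zero, which is what allows structured removal and fewer FLOPs, while the L1 term encourages sparsity inside the groups that survive.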

SP | Oct 14, 2024
Online waveform selection for cognitive radar

Thulasi Tholeti, Avinash Rangarajan, Sheetal Kalyani

Designing a cognitive radar system capable of adapting its parameters is challenging, particularly when tasked with tracking a ballistic missile throughout its entire flight. In this work, we focus on proposing adaptive algorithms that select waveform parameters in an online fashion. Our novelty lies in formulating the learning problem using domain knowledge derived from the characteristics of ballistic trajectories. We propose three reinforcement learning algorithms: bandwidth scaling, Q-learning, and Q-learning lookahead. These algorithms dynamically choose the bandwidth for each transmission based on received feedback. Through experiments on synthetically generated ballistic trajectories, we demonstrate that our proposed algorithms achieve the dual objectives of minimizing range error and maintaining continuous tracking without losing the target.
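
As a point of reference for the Q-learning variants mentioned above, here is a minimal sketch of the standard tabular Q-learning update. The state names, the two bandwidth actions, and the reward value are hypothetical placeholders; the paper's state design, which draws on ballistic-trajectory domain knowledge, and its lookahead variant are not reproduced.

```python
def q_update(q, state, action, reward, next_state, alpha=0.1, gamma=0.9):
    """Apply one tabular Q-learning update in place and return the new value.

    Q(s, a) <- Q(s, a) + alpha * (r + gamma * max_a' Q(s', a') - Q(s, a))
    """
    best_next = max(q[next_state].values())
    td_error = reward + gamma * best_next - q[state][action]
    q[state][action] += alpha * td_error
    return q[state][action]


# Hypothetical table: states are coarse tracking phases, actions are bandwidths.
q_table = {
    "boost": {"narrow": 0.0, "wide": 0.0},
    "midcourse": {"narrow": 1.0, "wide": 0.5},
}
# Reward here is a made-up negative range error observed after transmitting.
updated = q_update(q_table, "boost", "wide", reward=-0.2, next_state="midcourse")
```

After each transmission the received feedback plays the role of the reward, and the bandwidth for the next transmission can be chosen from the updated table, for example greedily or with some exploration.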

LG | Oct 28, 2021
How to boost autoencoders?

Sai Krishna, Thulasi Tholeti, Sheetal Kalyani

Autoencoders are a category of neural networks with applications in numerous domains and hence, improvement of their performance is gaining substantial interest from the machine learning community. Ensemble methods, such as boosting, are often adopted to enhance the performance of regular neural networks. In this work, we discuss the challenges associated with boosting autoencoders and propose a framework to overcome them. The proposed method ensures that the advantages of boosting are realized when either output (encoded or reconstructed) is used. The usefulness of the boosted ensemble is demonstrated in two applications that widely employ autoencoders: anomaly detection and clustering.

IT | Oct 27, 2021
Binarized ResNet: Enabling Robust Automatic Modulation Classification at the resource-constrained Edge

Deepsayan Sadhukhan, Nitin Priyadarshini Shankar, Nancy Nayak et al.

Recently, deep neural networks (DNNs) have been used extensively for automatic modulation classification (AMC), and the results have been quite promising. However, DNNs have high memory and computation requirements making them impractical for edge networks where the devices are resource-constrained. They are also vulnerable to adversarial attacks, which is a significant security concern. This work proposes a rotated binary large ResNet (RBLResNet) for AMC that can be deployed at the edge network because of low memory and computational complexity. The performance gap between the RBLResNet and existing architectures with floating-point weights and activations can be closed by two proposed ensemble methods: (i) multilevel classification (MC), and (ii) bagging multiple RBLResNets while retaining low memory and computational power. The MC method achieves an accuracy of $93.39\%$ at $10$dB over all the $24$ modulation classes of the Deepsig dataset. This performance is comparable to state-of-the-art (SOTA) performances, with $4.75$ times lower memory and $1214$ times lower computation. Furthermore, RBLResNet also has high adversarial robustness compared to existing DNN models. The proposed MC method with RBLResNets has an adversarial accuracy of $87.25\%$ over a wide range of SNRs, surpassing the robustness of all existing SOTA methods to the best of our knowledge. Properties such as low memory, low computation, and the highest adversarial robustness make it a better choice for robust AMC in low-power edge devices.

SP | Oct 15, 2021
BayesAoA: A Bayesian method for Computation Efficient Angle of Arrival Estimation

Akshay Sharma, Nancy Nayak, Sheetal Kalyani

Angle of Arrival (AoA) estimation is of great interest in modern communication systems. Traditional maximum likelihood-based iterative algorithms are sensitive to initialization and cannot be used online. We propose a Bayesian method to find the AoA that is insensitive to initialization. The proposed method is less complex and needs fewer computing resources than traditional deep learning-based methods. It converges faster than brute-force methods. Further, a Hedge-type solution is proposed that helps to deploy the method online to handle situations where the channel noise and antenna configuration in the receiver change over time. The proposed method achieves $92\%$ accuracy in a channel of noise variance $10^{-6}$ with $19.3\%$ of the brute-force method's computation.
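
The Hedge algorithm referred to above maintains a weight per candidate (an "expert") and shrinks each weight exponentially in the loss that candidate just incurred. The sketch below shows that generic update only; how the paper maps experts to AoA estimators or configurations is not stated in the abstract, so the two-expert example is purely illustrative.

```python
import math


def hedge_update(weights, losses, eta=0.5):
    """One Hedge step: multiply each weight by exp(-eta * loss), renormalise."""
    scaled = [w * math.exp(-eta * loss) for w, loss in zip(weights, losses)]
    total = sum(scaled)
    return [s / total for s in scaled]


# Two hypothetical experts; the second one incurred a unit loss this round.
weights = hedge_update([0.5, 0.5], losses=[0.0, 1.0], eta=math.log(2))
```

With this learning rate the penalised expert's weight is halved before renormalisation, so the weights move from an even split to two thirds against one third. Repeating the update lets the mixture track whichever expert performs best as conditions drift.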

LG | May 14, 2021
Deep learned SVT: Unrolling singular value thresholding to obtain better MSE

Siva Shanmugam, Sheetal Kalyani

The affine rank minimization problem is a generalized version of the low-rank matrix completion problem, in which linear combinations of the entries of a low-rank matrix are observed and the matrix is estimated from these measurements. We propose a trainable deep neural network, which we call Learned SVT (LSVT), that unrolls a popular iterative algorithm, the singular value thresholding (SVT) algorithm, to perform this generalized matrix completion. We show that our proposed LSVT with a fixed number of layers (say T) reconstructs the matrix with lower mean squared error (MSE) than SVT run for the same number T of iterations, and that our method is much more robust to the parameters which need to be carefully chosen in the SVT algorithm.

LG | Jan 18, 2021
On the Differentially Private Nature of Perturbed Gradient Descent

Thulasi Tholeti, Sheetal Kalyani

We consider the problem of empirical risk minimization given a database, using the gradient descent algorithm. We note that the function to be optimized may be non-convex, with saddle points which impede the convergence of the algorithm. A perturbed gradient descent algorithm is typically employed to escape these saddle points. We show that this algorithm, which perturbs the gradient, inherently preserves the privacy of the data. We then employ the differential privacy framework to quantify the privacy thus achieved. We also analyze the change in privacy with varying parameters such as problem dimension and the distance between the databases.
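
To illustrate why perturbing the gradient helps at saddle points, the sketch below runs gradient descent with Gaussian noise added to each gradient on a toy function whose origin is a saddle. The toy function, step size, noise level, and function names are all assumptions made for illustration; the paper's algorithm and its privacy analysis are not reproduced here.

```python
import random


def perturbed_gd(grad, x0, lr=0.1, noise_std=0.01, steps=500, seed=0):
    """Gradient descent where Gaussian noise is added to every gradient entry."""
    rng = random.Random(seed)
    x = list(x0)
    for _ in range(steps):
        g = grad(x)
        x = [xi - lr * (gi + rng.gauss(0.0, noise_std)) for xi, gi in zip(x, g)]
    return x


def toy_grad(point):
    """Gradient of f(x, y) = x^2 + y^4/4 - y^2/2.

    The origin is a saddle point; the minima are at (0, 1) and (0, -1).
    """
    x, y = point
    return [2.0 * x, y ** 3 - y]


stuck = perturbed_gd(toy_grad, [0.0, 0.0], noise_std=0.0)   # plain GD
escaped = perturbed_gd(toy_grad, [0.0, 0.0], noise_std=0.01)
```

Started exactly at the saddle, plain gradient descent never moves because the gradient is zero, whereas the noisy version is pushed off the saddle and settles near one of the two minima. The same injected noise is what the abstract relates to differential privacy.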

LG | Jun 13, 2020
Understanding Learning Dynamics of Binary Neural Networks via Information Bottleneck

Vishnu Raj, Nancy Nayak, Sheetal Kalyani

Compact neural networks are essential for affordable and power-efficient deep learning solutions. Binary Neural Networks (BNNs) take compactification to the extreme by constraining both weights and activations to two levels, $\{+1, -1\}$. However, training BNNs is not easy due to the discontinuity in activation functions, and the training dynamics of BNNs are not well understood. In this paper, we present an information-theoretic perspective of BNN training. We analyze BNNs through the Information Bottleneck principle and observe that the training dynamics of BNNs are considerably different from those of Deep Neural Networks (DNNs). While DNNs have separate empirical risk minimization and representation compression phases, our numerical experiments show that in BNNs, both these phases are simultaneous. Since BNNs have less expressive capacity, they tend to find efficient hidden representations concurrently with label fitting. Experiments on multiple datasets support these observations, and we see consistent behavior across different activation functions in BNNs.

LG | Mar 22, 2020
Tune smarter not harder: A principled approach to tuning learning rates for shallow nets

Thulasi Tholeti, Sheetal Kalyani

Effective hyper-parameter tuning is essential to guarantee the performance that neural networks have come to be known for. In this work, a principled approach to choosing the learning rate is proposed for shallow feedforward neural networks. We associate the learning rate with the gradient Lipschitz constant of the objective to be minimized while training. An upper bound on this constant is derived and a search algorithm, which always results in non-divergent traces, is proposed to exploit the derived bound. It is shown through simulations that the proposed search method significantly outperforms existing tuning methods such as Tree Parzen Estimators (TPE). The proposed method is applied to three different existing applications: a) channel estimation in OFDM systems, b) prediction of currency exchange rates, and c) offset estimation in OFDM receivers, and it is shown to pick better learning rates than the existing methods using the same or less compute power.

ITMar 20, 2020
Green DetNet: Computation and Memory efficient DetNet using Smart Compression and Training

Nancy Nayak, Thulasi Tholeti, Muralikrishnan Srinivasan et al.

This paper introduces an incremental training framework for compressing popular Deep Neural Network (DNN) based unfolded multiple-input-multiple-output (MIMO) detection algorithms like DetNet. The idea of incremental training is explored to select the optimal depth while training. To reduce the computation requirements or the number of FLoating point OPerations (FLOPs) and enforce sparsity in weights, the concept of structured regularization is explored using group LASSO and sparse group LASSO. Our methods lead to an astounding $98.9\%$ reduction in memory requirement and $81.63\%$ reduction in FLOPs when compared with DetNet without compromising on BER performance.

SPJan 25, 2020
Deep Reinforcement Learning based Blind mmWave MIMO Beam Alignment

Vishnu Raj, Nancy Nayak, Sheetal Kalyani

Directional beamforming is a crucial component for realizing robust wireless communication systems using millimeter wave (mmWave) technology. Beam alignment using brute-force search of the space introduces time overhead, while location-aided blind beam alignment adds hardware requirements to the system. In this paper, we introduce a method for blind beam alignment based on the RF fingerprints of user equipment obtained by the base stations. The proposed system performs blind beam alignment in a cellular environment with multiple base stations and multiple mobile users using deep reinforcement learning. We present a novel neural network architecture that can handle a mix of both continuous and discrete actions and use policy gradient methods to train the model. Our results show that the proposed method can achieve a data rate of up to four times that of the traditional method without any overheads.

MLDec 18, 2019
Generalized Residual Ratio Thresholding

Sreejith Kallummil, Sheetal Kalyani

Simultaneous orthogonal matching pursuit (SOMP) and block OMP (BOMP) are two widely used techniques for sparse support recovery in multiple measurement vector (MMV) and block sparse (BS) models respectively. For optimal performance, both SOMP and BOMP require \textit{a priori} knowledge of signal sparsity or noise variance. However, sparsity and noise variance are unavailable in most practical applications. This letter presents a novel technique called generalized residual ratio thresholding (GRRT) for operating SOMP and BOMP without \textit{a priori} knowledge of signal sparsity and noise variance, and derives finite sample and finite signal to noise ratio (SNR) guarantees for exact support recovery. Numerical simulations indicate that GRRT performs similarly to BOMP and SOMP with \textit{a priori} knowledge of signal and noise statistics.

ITOct 13, 2019
Beyond 5G: Leveraging Cell Free TDD Massive MIMO using Cascaded Deep learning

Navaneet Athreya, Vishnu Raj, Sheetal Kalyani

This paper deals with the calibration of Time Division Duplexing (TDD) reciprocity in an Orthogonal Frequency Division Multiplexing (OFDM) based Cell Free Massive MIMO system where the responses of the Radio Frequency (RF) chains render the end-to-end channel non-reciprocal, even though the physical wireless channel is reciprocal. We further address the non-availability of the uplink channel estimates at locations other than pilot subcarriers and propose a single-shot solution to estimate the downlink channel at all subcarriers from the uplink channel at selected pilot subcarriers. We propose a cascade of two Deep Neural Networks (DNNs) to achieve this objective. The proposed method is easily scalable and removes the need for relative reciprocity calibration based on the cooperation of antennas, which usually introduces dependency in Cell Free Massive MIMO systems.

MLSep 10, 2019
Subspace clustering without knowing the number of clusters: A parameter free approach

Vishnu Menon, Gokularam M, Sheetal Kalyani

Subspace clustering, the task of clustering high dimensional data when the data points come from a union of subspaces, is one of the fundamental tasks in unsupervised machine learning. Most of the existing algorithms for this task require prior knowledge of the number of clusters along with a few additional parameters which need to be set or tuned \textit{a priori} according to the type of data to be clustered. In this work, a parameter free method for subspace clustering is proposed, where the data points are clustered on the basis of the difference in statistical distribution of the angles subtended by the data points within a subspace and those by points belonging to different subspaces. Given an initial fine clustering, the proposed algorithm merges the clusters until a final clustering is obtained. This, unlike many existing methods, does not require the number of clusters \textit{a priori}. Also, the proposed algorithm does not involve the use of an unknown parameter or tuning for one. A parameter free method for producing a fine initial clustering is also discussed, making the whole process of subspace clustering parameter free. A comparison of the proposed algorithm's performance with that of existing state-of-the-art techniques on synthetic and real data sets shows the significance of the proposed method.

OCMay 28, 2019
Concavifiability and convergence: necessary and sufficient conditions for gradient descent analysis

Thulasi Tholeti, Sheetal Kalyani

Convergence of the gradient descent algorithm has been attracting renewed interest due to its utility in deep learning applications. Even as multiple variants of gradient descent were proposed, the assumption that the gradient of the objective is Lipschitz continuous remained an integral part of the analysis until recently. In this work, we look at convergence analysis by focusing on a property that we term concavifiability, instead of Lipschitz continuity of gradients. We show that concavifiability is a necessary and sufficient condition to satisfy the upper quadratic approximation, which is key in proving that the objective function decreases after every gradient descent update. We also show that any gradient Lipschitz function satisfies concavifiability. A constant known as the concavifier, analogous to the gradient Lipschitz constant, is derived, which is indicative of the optimal step size. As an application, we demonstrate the utility of finding the concavifier in the convergence of gradient descent through an example inspired by neural networks. We derive bounds on the concavifier to obtain a fixed step size for a single hidden layer ReLU network.
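
For orientation, the upper quadratic approximation mentioned above takes the following well-known form in the classical gradient-Lipschitz setting. The symbols $L$ (gradient Lipschitz constant), $\eta$ (step size) and $x_k$ (iterate) are standard notation assumed here for illustration; they are not taken from the abstract, and the paper's concavifier is only described as playing a role analogous to $L$.

```latex
% Upper quadratic approximation for an L-gradient-Lipschitz objective f
f(y) \le f(x) + \nabla f(x)^{\top}(y - x) + \frac{L}{2}\,\lVert y - x \rVert^{2}

% Substituting the gradient descent update x_{k+1} = x_k - \eta \nabla f(x_k)
f(x_{k+1}) \le f(x_k) - \eta\left(1 - \frac{L\eta}{2}\right)\lVert \nabla f(x_k) \rVert^{2}
```

The second inequality shows a guaranteed decrease for any $0 < \eta < 2/L$, and the bound is tightest at $\eta = 1/L$, which is why such a constant is indicative of a good fixed step size.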

ITApr 18, 2019
Design of Communication Systems using Deep Learning: A Variational Inference Perspective

Vishnu Raj, Sheetal Kalyani

Recent research in the design of end-to-end communication systems using deep learning has produced models which can outperform traditional communication schemes. Most of these architectures leverage autoencoders to design the encoder at the transmitter and the decoder at the receiver, and train them jointly by modeling transmit symbols as latent codes from the encoder. However, in communication systems, the receiver has to work with noise-corrupted versions of transmit symbols. Traditional autoencoders are not designed to work with latent codes corrupted with noise. In this work, we provide a framework to design end-to-end communication systems which accounts for the existence of noise-corrupted transmit symbols. The proposed method uses a deep neural architecture. An objective function for optimizing these models is derived based on the concepts of variational inference. Further, domain knowledge such as channel type can be systematically integrated into the objective. Through numerical simulation, the proposed method is shown to consistently produce models with better packing density, and to achieve it faster, in multiple popular channel models as compared to previous works leveraging deep learning models.

SPNov 17, 2018
High SNR Consistent Compressive Sensing Without Signal and Noise Statistics

Sreejith Kallummil, Sheetal Kalyani

Recovering the support of sparse vectors in underdetermined linear regression models, \textit{aka} compressive sensing, is important in many signal processing applications. High SNR consistency (HSC), i.e., the ability of a support recovery technique to correctly identify the support with increasing signal to noise ratio (SNR), is an increasingly popular criterion to qualify the high SNR optimality of support recovery techniques. The HSC results available in literature for support recovery techniques applicable to underdetermined linear regression models, like least absolute shrinkage and selection operator (LASSO), orthogonal matching pursuit (OMP) etc., assume \textit{a priori} knowledge of noise variance or signal sparsity. However, both these parameters are unavailable in most practical applications. Further, it is extremely difficult to estimate noise variance or signal sparsity in underdetermined regression models. This limits the utility of existing HSC results. In this article, we propose two techniques, \textit{viz.}, residual ratio minimization (RRM) and residual ratio thresholding with adaptation (RRTA), to operate the OMP algorithm without \textit{a priori} knowledge of noise variance and signal sparsity, and establish their HSC analytically and numerically. To the best of our knowledge, these are the first and only noise statistics oblivious algorithms to report HSC in underdetermined regression models.

MLSep 19, 2018
Noise Statistics Oblivious GARD For Robust Regression With Sparse Outliers

Sreejith Kallummil, Sheetal Kalyani

Linear regression models contaminated by Gaussian noise (inlier) and possibly unbounded sparse outliers are common in many signal processing applications. Sparse recovery inspired robust regression (SRIRR) techniques are shown to deliver high quality estimation performance in such regression models. Unfortunately, most SRIRR techniques assume \textit{a priori} knowledge of noise statistics like inlier noise variance or outlier statistics like number of outliers. Both inlier and outlier noise statistics are rarely known \textit{a priori}, and this limits the efficient operation of many SRIRR algorithms. This article proposes a novel noise statistics oblivious algorithm called residual ratio thresholding GARD (RRT-GARD) for robust regression in the presence of sparse outliers. RRT-GARD is developed by modifying the recently proposed noise statistics dependent greedy algorithm for robust de-noising (GARD). Both finite sample and asymptotic analytical results indicate that RRT-GARD performs similarly to GARD with \textit{a priori} knowledge of noise statistics. Numerical simulations on real and synthetic data sets also point to the highly competitive performance of RRT-GARD.

MLSep 11, 2018
Structured and Unstructured Outlier Identification for Robust PCA: A Non iterative, Parameter free Algorithm

Vishnu Menon, Sheetal Kalyani

Robust PCA, the problem of PCA in the presence of outliers, has been extensively investigated in the last few years. Here we focus on Robust PCA in the outlier model where each column of the data matrix is either an inlier or an outlier. Most of the existing methods for this model assume knowledge of either the dimension of the lower dimensional subspace or the fraction of outliers in the system. However, in many applications knowledge of these parameters is not available. Motivated by this, we propose a parameter free outlier identification method for robust PCA which a) does not require knowledge of the outlier fraction, b) does not require knowledge of the dimension of the underlying subspace, c) is computationally simple and fast, and d) can handle structured and unstructured outliers. Further, analytical guarantees are derived for outlier identification, and the performance of the algorithm is compared with existing state-of-the-art methods on both real and synthetic data for various outlier structures.

MLJun 2, 2018
Signal and Noise Statistics Oblivious Orthogonal Matching Pursuit

Sreejith Kallummil, Sheetal Kalyani

Orthogonal matching pursuit (OMP) is a widely used algorithm for recovering sparse high dimensional vectors in linear regression models. The optimal performance of OMP requires \textit{a priori} knowledge of either the sparsity of the regression vector or noise statistics. Both these statistics are rarely known \textit{a priori} and are very difficult to estimate. In this paper, we present a novel technique called residual ratio thresholding (RRT) to operate OMP without any \textit{a priori} knowledge of sparsity and noise statistics and establish finite sample and large sample support recovery guarantees for the same. Both analytical results and numerical simulations on real and synthetic data sets indicate that RRT has a performance comparable to OMP with \textit{a priori} knowledge of sparsity and noise statistics.

ITApr 30, 2018
A Centralized Multi-stage Non-parametric Learning Algorithm for Opportunistic Spectrum Access

Thulasi Tholeti, Vishnu Raj, Sheetal Kalyani

Owing to the ever-increasing demand for wireless spectrum, Cognitive Radio (CR) was introduced as a technique to attain high spectral efficiency. As the number of secondary users (SUs) connecting to the cognitive radio network is on the rise, there is an imminent need for centralized algorithms that provide high throughput and energy efficiency for the SUs while ensuring minimum interference to the licensed users. In this work, we propose a multi-stage algorithm that 1) effectively assigns the available channels to the SUs, 2) employs a non-parametric learning framework to estimate the primary traffic distribution to minimize sensing, and 3) uses an adaptive framework to ensure that the collision rate with the primary user is below the specified threshold. We provide comprehensive empirical validation of the method against other approaches.

MLApr 13, 2018
Fast, Parameter free Outlier Identification for Robust PCA

Vishnu Menon, Sheetal Kalyani

Robust PCA, the problem of PCA in the presence of outliers, has been extensively investigated in the last few years. Here we focus on Robust PCA in the column sparse outlier model. The existing methods for the column sparse outlier model assume knowledge of either the dimension of the lower dimensional subspace or the fraction of outliers in the system. However, in many applications knowledge of these parameters is not available. Motivated by this, we propose a parameter free outlier identification method for robust PCA which a) does not require knowledge of the outlier fraction, b) does not require knowledge of the dimension of the underlying subspace, and c) is computationally simple and fast. Further, analytical guarantees are derived for outlier identification, and the performance of the algorithm is compared with existing state-of-the-art methods.

LGAug 5, 2017
An aggregating strategy for shifting experts in discrete sequence prediction

Vishnu Raj, Sheetal Kalyani

We study how to adapt a predictor to a non-stationary environment with advice from multiple experts. We study the problem under complete feedback, when the best expert changes over time, from a decision-theoretic point of view. The proposed algorithm is based on the popular exponential weighting method with exponential discounting. We provide theoretical results bounding regret under the exponential discounting setting. An upper bound on regret is derived for the finite time horizon problem. Numerical verification on different real-life datasets is provided to show the utility of the proposed algorithm.
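
As a rough illustration of exponential weighting with exponential discounting under complete feedback, the sketch below aggregates expert predictions for a discrete sequence. It is a generic textbook-style aggregator and not necessarily the update rule used in the paper; the function name `discounted_exp_weights`, the 0-1 loss, and the parameters `eta` and `gamma` are all assumptions made for this example.

```python
import math


def discounted_exp_weights(expert_preds, outcomes, eta=1.0, gamma=0.9):
    """Aggregate expert predictions using exponentially discounted losses.

    expert_preds[t][i] is expert i's predicted symbol at round t and
    outcomes[t] is the true symbol. eta (learning rate) and gamma
    (discount factor) are illustrative values, not taken from the paper.
    """
    n_experts = len(expert_preds[0])
    disc_loss = [0.0] * n_experts  # discounted cumulative 0-1 loss per expert
    predictions = []
    for preds, y in zip(expert_preds, outcomes):
        # Experts with a smaller discounted loss get exponentially more weight.
        weights = [math.exp(-eta * loss) for loss in disc_loss]
        # Weighted vote over the symbols proposed by the experts.
        votes = {}
        for w, p in zip(weights, preds):
            votes[p] = votes.get(p, 0.0) + w
        predictions.append(max(votes, key=votes.get))
        # Complete feedback: every expert's loss is observed each round,
        # and older losses decay by gamma so a new best expert can take over.
        disc_loss = [gamma * loss + (0.0 if p == y else 1.0)
                     for loss, p in zip(disc_loss, preds)]
    return predictions
```

With two constant experts (one always predicting 0, one always predicting 1) and an outcome sequence that switches from 0 to 1 halfway through, the aggregator follows the first expert initially and moves to the second a few rounds after the switch, because the first expert's fresh errors outweigh the second expert's discounted older ones.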

MLAug 3, 2017
Reinforcement learning techniques for Outer Loop Link Adaptation in 4G/5G systems

Saishankar Katri Pulliyakode, Sheetal Kalyani

Wireless systems perform rate adaptation to transmit at the highest possible instantaneous rates. Rate adaptation has become increasingly granular over generations of wireless systems. The base station uses SINR and packet decode feedback called acknowledgement/no acknowledgement (ACK/NACK) to perform rate adaptation. SINR is used for rate anchoring, called inner loop adaptation, and ACK/NACK is used for fine offset adjustments, called Outer Loop Link Adaptation (OLLA). We cast OLLA as a reinforcement learning problem of the class of Multi-Armed Bandits (MAB) where the different offset values are the arms of the bandit. In OLLA, as the offset values increase, the probability of packet error also increases, and every user equipment (UE) has a desired Block Error Rate (BLER) to meet certain Quality of Service (QoS) requirements. For this MAB we propose a binary search based algorithm which achieves a Probably Approximately Correct (PAC) solution, making use of bounds from large deviation theory and confidence bounds. In addition, we discuss how a Thompson sampling or UCB based method will not help us meet the target objectives. Finally, simulation results on an LTE system simulator are provided to demonstrate the efficacy of the proposed algorithm.

ITJul 31, 2017
Spectrum Access In Cognitive Radio Using A Two Stage Reinforcement Learning Approach

Vishnu Raj, Irene Dias, Thulasi Tholeti et al.

With the advent of the 5th generation of wireless standards and an increasing demand for higher throughput, methods to improve the spectral efficiency of wireless systems have become very important. In the context of cognitive radio, a substantial increase in throughput is possible if the secondary user can make smart decisions regarding which channel to sense and when or how often to sense. Here, we propose an algorithm to not only select a channel for data transmission but also to predict how long the channel will remain unoccupied so that the time spent on channel sensing can be minimized. Our algorithm learns in two stages: a reinforcement learning approach for channel selection and a Bayesian approach to determine the optimal duration for which sensing can be skipped. Comparisons with other learning methods are provided through extensive simulations. We show that the number of sensing operations is minimized with a negligible increase in primary interference; this implies that less energy is spent by the secondary user in sensing and that higher throughput is achieved by saving on sensing.

MLJul 31, 2017
Taming Non-stationary Bandits: A Bayesian Approach

Vishnu Raj, Sheetal Kalyani

We consider the multi-armed bandit problem in non-stationary environments. Based on the Bayesian method, we propose a variant of Thompson Sampling which can be used in both rested and restless bandit scenarios. Applying discounting to the parameters of the prior distribution, we describe a way to systematically reduce the effect of past observations. Further, we derive the exact expression for the probability of picking sub-optimal arms. By increasing the exploitative value of Bayes' samples, we also provide an optimistic version of the algorithm. Extensive empirical analysis is conducted under various scenarios to validate the utility of the proposed algorithms. A comparison study with various state-of-the-art algorithms is also included.
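
The idea of discounting the parameters of the prior so that past observations fade can be sketched for Bernoulli rewards as follows. This is a minimal illustration under stated assumptions, not the paper's exact algorithm: the function name `discounted_ts_step`, the Beta(1, 1) base prior, the `pull` callback, and the discount factor `gamma` are all hypothetical choices.

```python
import random


def discounted_ts_step(alpha, beta, pull, gamma=0.95, rng=random):
    """Run one round of Thompson Sampling with discounted Beta parameters.

    alpha and beta are per-arm lists of discounted success and failure
    counts (modified in place). pull(arm) must return a 0/1 reward.
    gamma is an assumed discount factor in (0, 1].
    """
    # Draw one sample per arm; the +1 terms encode a Beta(1, 1) base prior
    # and keep both Beta parameters strictly positive.
    samples = [rng.betavariate(a + 1.0, b + 1.0) for a, b in zip(alpha, beta)]
    arm = max(range(len(samples)), key=samples.__getitem__)
    reward = pull(arm)
    # Discount every arm's counts so that old observations lose influence,
    # then record the new observation for the arm that was played.
    for i in range(len(alpha)):
        alpha[i] *= gamma
        beta[i] *= gamma
    alpha[arm] += reward
    beta[arm] += 1 - reward
    return arm, reward
```

Setting `gamma=1.0` recovers ordinary Thompson Sampling for Bernoulli arms; smaller values shorten the effective memory, which is what lets the sampler track a changing best arm.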

MLJul 27, 2017
Signal and Noise Statistics Oblivious Sparse Reconstruction using OMP/OLS

Sreejith Kallummil, Sheetal Kalyani

Orthogonal matching pursuit (OMP) and orthogonal least squares (OLS) are widely used for sparse signal reconstruction in under-determined linear regression problems. The performance of these compressed sensing (CS) algorithms depends crucially on \textit{a priori} knowledge of either the sparsity of the signal ($k_0$) or the noise variance ($\sigma^2$). Both $k_0$ and $\sigma^2$ are unknown in general and extremely difficult to estimate in under-determined models. This limits the application of OMP and OLS in many practical situations. In this article, we develop two computationally efficient frameworks, namely TF-IGP and RRT-IGP, for using OMP and OLS even when $k_0$ and $\sigma^2$ are unavailable. Both TF-IGP and RRT-IGP are analytically shown to accomplish successful sparse recovery under the same set of restricted isometry conditions on the design matrix required for OMP/OLS with \textit{a priori} knowledge of $k_0$ and $\sigma^2$. Numerical simulations also indicate a highly competitive performance of TF-IGP and RRT-IGP in comparison to OMP/OLS with \textit{a priori} knowledge of $k_0$ and $\sigma^2$.

MLMar 15, 2017
Tuning Free Orthogonal Matching Pursuit

Sreejith Kallummil, Sheetal Kalyani

Orthogonal matching pursuit (OMP) is a widely used compressive sensing (CS) algorithm for recovering sparse signals in noisy linear regression models. The performance of OMP depends on its stopping criteria (SC). The SC for OMP discussed in literature typically assume knowledge of either the sparsity of the signal to be estimated, $k_0$, or the noise variance, $\sigma^2$, both of which are unavailable in many practical applications. In this article we develop a modified version of OMP called tuning free OMP, or TF-OMP, which does not require an SC. TF-OMP is proved to accomplish successful sparse recovery under the usual assumptions on restricted isometry constants (RIC) and mutual coherence of the design matrix. TF-OMP is numerically shown to deliver a highly competitive performance in comparison with OMP having \textit{a priori} knowledge of $k_0$ or $\sigma^2$. Greedy algorithm for robust de-noising (GARD) is an OMP-like algorithm proposed for efficient estimation in classical overdetermined linear regression models corrupted by sparse outliers. However, GARD requires knowledge of the inlier noise variance, which is difficult to estimate. We also produce a tuning free algorithm (TF-GARD) for efficient estimation in the presence of sparse outliers by extending the operating principle of TF-OMP to GARD. TF-GARD is numerically shown to achieve a performance comparable to that of the existing implementation of GARD.

MLMar 10, 2017
High SNR Consistent Compressive Sensing

Sreejith Kallummil, Sheetal Kalyani

High signal to noise ratio (SNR) consistency of model selection criteria in linear regression models has attracted a lot of attention recently. However, most of the existing literature on high SNR consistency deals with model order selection. Further, the limited literature available on the high SNR consistency of subset selection procedures (SSPs) is applicable to linear regression with full rank measurement matrices only. Hence, the performance of SSPs used in underdetermined linear models (a.k.a. compressive sensing (CS) algorithms) at high SNR is largely unknown. This paper fills this gap by deriving necessary and sufficient conditions for the high SNR consistency of popular CS algorithms like $l_0$-minimization, basis pursuit de-noising or LASSO, orthogonal matching pursuit and Dantzig selector. Necessary conditions analytically establish the high SNR inconsistency of CS algorithms when used with the tuning parameters discussed in literature. Novel tuning parameters with SNR adaptations are developed using the sufficient conditions, and the choice of SNR adaptations is discussed analytically using convergence rate analysis. CS algorithms with the proposed tuning parameters are numerically shown to be high SNR consistent and to outperform existing tuning parameters in the moderate to high SNR regime.

APMar 6, 2014
Rate Prediction and Selection in LTE systems using Modified Source Encoding Techniques

K. P. Saishankar, Sheetal Kalyani, K. Narendran

In current wireless systems, the base station (eNodeB) tries to serve its user equipment (UE) at the highest possible rate that the UE can reliably decode. The eNodeB obtains this rate information as quantized feedback from the UE at time n and uses it for rate selection until the next feedback is received at time n + δ. The feedback received at n can become outdated before n + δ because of a) Doppler fading and b) a change in the set of active interferers for a UE. Therefore, rate prediction becomes essential. Since the rates belong to a discrete set, we propose a discrete sequence prediction approach wherein frequency trees for the discrete sequences are built using source encoding algorithms like Prediction by Partial Match (PPM). Finding the optimal depth of the frequency tree used for prediction is cast as a model order selection problem. The rate sequence complexity is analysed to provide an upper bound on model order. Information-theoretic criteria are then used to solve the model order problem. Finally, two prediction algorithms are proposed using PPM with optimal model order, and system level simulations demonstrate the improvement in packet loss and throughput due to these algorithms.