LG May 8, 2022
Hamiltonian Monte Carlo Particle Swarm Optimizer
Omatharv Bharat Vaidya, Rithvik Terence DSouza, Snehanshu Saha et al.
We introduce the Hamiltonian Monte Carlo Particle Swarm Optimizer (HMC-PSO), an optimization algorithm that reaps the benefits of both Exponentially Averaged Momentum PSO and HMC sampling. The coupling of the position and velocity of each particle with Hamiltonian dynamics in the simulation allows for extensive freedom for exploration and exploitation of the search space. It also provides an excellent technique to explore highly non-convex functions while ensuring efficient sampling. We extend the method to approximate error gradients in closed form for Deep Neural Network (DNN) settings. We discuss possible methods of coupling and compare its performance to that of state-of-the-art optimizers on the Golomb's Ruler problem and Classification tasks.
LG Nov 3, 2023
DeliverAI: Reinforcement Learning Based Distributed Path-Sharing Network for Food Deliveries
Ashman Mehra, Snehanshu Saha, Vaskar Raychoudhury et al.
Delivery of items from the producer to the consumer has experienced significant growth over the past decade and has been greatly fueled by the recent pandemic. Amazon Fresh, Shopify, UberEats, InstaCart, and DoorDash are rapidly growing and share the same business model of consumer items or food delivery. Existing food delivery methods are sub-optimal because each delivery is individually optimized to go directly from the producer to the consumer via the shortest time path. We observe a significant scope for reducing the costs associated with completing deliveries under the current model. We model our food delivery problem as a multi-objective optimization, where consumer satisfaction and delivery costs both need to be optimized. Taking inspiration from the success of ride-sharing in the taxi industry, we propose DeliverAI, a reinforcement learning-based path-sharing algorithm. Unlike previous attempts at path-sharing, DeliverAI can provide real-time, time-efficient decision-making using a reinforcement learning-enabled agent system. Our novel agent interaction scheme leverages path-sharing among deliveries to reduce the total distance traveled while keeping the delivery completion time in check. We generate and test our methodology rigorously on a simulation setup using real data from the city of Chicago. Our results show that DeliverAI can reduce the delivery fleet size by 12%, the distance traveled by 13%, and achieve 50% higher fleet utilization compared to the baselines.
LG Feb 17, 2023
Quantile LSTM: A Robust LSTM for Anomaly Detection In Time Series Data
Snehanshu Saha, Jyotirmoy Sarkar, Soma Dhavala et al.
Anomalies refer to the departure of systems and devices from their normal behaviour in standard operating conditions. An anomaly in an industrial device can indicate an upcoming failure, often in the temporal direction. In this paper, we make two contributions: 1) we estimate conditional quantiles and consider three different ways to define anomalies based on the estimated quantiles; 2) we use a new learnable activation function in the popular Long Short Term Memory (LSTM) architecture to model temporal long-range dependency. In particular, we propose the Parametric Elliot Function (PEF) as an activation function (AF) inside the LSTM, which saturates later than sigmoid and tanh. The proposed algorithms are compared with other well-known anomaly detection algorithms, such as Isolation Forest (iForest), Elliptic Envelope, and Autoencoder, and with modern Deep Learning models such as the Deep Autoencoding Gaussian Mixture Model (DAGMM) and Generative Adversarial Networks (GAN). The algorithms are evaluated in terms of various performance metrics, such as Precision and Recall. The algorithms have been tested on multiple industrial time-series datasets such as Yahoo, AWS, GE, and machine sensors. We have found that the LSTM-based quantile algorithms are very effective and outperform the existing algorithms in identifying anomalies.
IM Sep 9, 2022
Investigation of a Machine learning methodology for the SKA pulsar search pipeline
Shashank Sanjay Bhat, Thiagaraj Prabu, Ben Stappers et al.
The SKA pulsar search pipeline will be used for real time detection of pulsars. Modern radio telescopes such as SKA will be generating petabytes of data in their full scale of operation. Hence experience-based and data-driven algorithms become indispensable for applications such as candidate detection. Here we describe our findings from testing a state of the art object detection algorithm called Mask R-CNN to detect candidate signatures in the SKA pulsar search pipeline. We have trained the Mask R-CNN model to detect candidate images. A custom annotation tool was developed to mark the regions of interest in large datasets efficiently. We have successfully demonstrated this algorithm by detecting candidate signatures on a simulation dataset. The paper presents details of this work with a highlight on the future prospects.
MA Mar 24
Multi-Agent Training-free Urban Food Delivery System using Resilient UMST Network
Md Nahid Hasan, Vishwam Tiwari, Aditya Challa et al.
Delivery systems have become a core part of urban life, supporting the demand for food, medicine, and other goods. Yet traditional logistics networks remain fragile, often struggling to adapt to road closures, accidents, and shifting demand. Online Food Delivery (OFD) platforms now represent a cornerstone of urban logistics, with the global market projected to grow to over 500 billion USD by 2030. Designing delivery networks that are efficient and resilient remains a major challenge: fully connected graphs provide flexibility but are computationally infeasible at scale, while single Minimum Spanning Trees (MSTs) are efficient but easily disrupted. We propose the Union of Minimum Spanning Trees (UMST) approach to construct delivery networks that are sparse yet robust. UMST generates multiple MSTs through randomized edge perturbations and unites them, producing graphs with far fewer edges than fully connected networks while maintaining multiple alternative routes between delivery hotspots. Across multiple U.S. cities, UMST achieves 20--40$\times$ fewer edges than fully connected graphs while enabling substantial order bundling with 75--83% participation rates. Compared to learning-based baselines including MADDPG and Graph Neural Networks, UMST delivers competitive performance (88-96% success rates, 44-53% distance savings) without requiring training, achieving 30$\times$ faster execution while maintaining interpretable routing structures. Its combination of structural efficiency and operational flexibility offers a scalable and resilient foundation for urban delivery networks.
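The UMST construction described above (perturb edge weights at random, compute an MST each time, take the union) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the multiplicative noise model, the parameter names `n_trees` and `noise`, and the toy `points` are all assumptions.

```python
import math
import random


def kruskal_mst(n, edges):
    """Return the MST of a connected graph as a set of (u, v) pairs with u < v.

    n: number of nodes, labelled 0..n-1; edges: list of (weight, u, v).
    """
    parent = list(range(n))

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    tree = set()
    for _, u, v in sorted(edges):
        root_u, root_v = find(u), find(v)
        if root_u != root_v:  # edge joins two components, so keep it
            parent[root_u] = root_v
            tree.add((min(u, v), max(u, v)))
    return tree


def union_of_msts(n, edges, n_trees=10, noise=0.2, seed=0):
    """Union the MSTs of n_trees randomly perturbed copies of the graph.

    Assumption: each weight is scaled by a factor drawn from [1-noise, 1+noise].
    """
    rng = random.Random(seed)
    union = set()
    for _ in range(n_trees):
        perturbed = [(w * (1.0 + rng.uniform(-noise, noise)), u, v)
                     for w, u, v in edges]
        union |= kruskal_mst(n, perturbed)
    return union


# Hypothetical hotspots on a plane; the complete graph is the dense baseline.
points = [(0, 0), (1, 0), (2, 0), (0, 1), (1, 1), (2, 1)]
full_graph = [(math.dist(points[i], points[j]), i, j)
              for i in range(len(points)) for j in range(i + 1, len(points))]
sparse = union_of_msts(len(points), full_graph)
print(len(sparse), "of", len(full_graph), "edges kept")
```

With `noise=0.0` the union collapses to a single MST; larger noise tends to add alternative routes at the cost of more edges.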
LG Aug 19, 2023
To prune or not to prune: A chaos-causality approach to principled pruning of dense neural networks
Rajan Sahu, Shivam Chadha, Nithin Nagaraj et al.
Reducing the size of a neural network (pruning) by removing weights without impacting its performance is an important problem for resource-constrained devices. In the past, pruning was typically accomplished by ranking or penalizing weights based on criteria like magnitude and removing low-ranked weights before retraining the remaining ones. Pruning strategies may also involve removing neurons from the network in order to achieve the desired reduction in network size. We formulate pruning as an optimization problem with the objective of minimizing misclassifications by selecting specific weights. To accomplish this, we introduce the concept of chaos in learning (Lyapunov exponents) via weight updates and exploit causality to identify the causal weights responsible for misclassification. Such a pruned network maintains the original performance and retains feature explainability.
LG Jun 19, 2022
LogGENE: A smooth alternative to check loss for Deep Healthcare Inference Tasks
Aryaman Jeendgar, Tanmay Devale, Soma S Dhavala et al.
Mining large datasets and obtaining calibrated predictions from them is of immediate relevance and utility in reliable deep learning. In our work, we develop methods for deep neural network based inference on such datasets, for example gene expression data. However, unlike typical deep learning methods, our inferential technique, while achieving state-of-the-art performance in terms of accuracy, can also provide explanations and report uncertainty estimates. We adopt the Quantile Regression framework to predict full conditional quantiles for a given set of housekeeping gene expressions. Conditional quantiles, in addition to being useful in providing rich interpretations of the predictions, are also robust to measurement noise. Our technique is particularly consequential in High-throughput Genomics, an area which is ushering in a new era in personalized health care and targeted drug design and delivery. However, the check loss, used in quantile regression to drive the estimation process, is not differentiable. We propose log-cosh as a smooth alternative to the check loss. We apply our methods to the GEO microarray dataset. We also extend the method to the binary classification setting. Furthermore, we investigate other consequences of the smoothness of the loss, such as faster convergence. We further apply the classification framework to other healthcare inference tasks such as heart disease, breast cancer, and diabetes. As a test of the generalization ability of our framework, other non-healthcare data sets for regression and classification tasks are also evaluated.
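To make the contrast between the check loss and a log-cosh surrogate concrete, here is a small sketch. It relies on the identity check(u) = 0.5*|u| + (tau - 0.5)*u and swaps |u| for log(cosh(s*u))/s. The exact form used in the paper is not given in the abstract, so this particular surrogate and the `sharpness` parameter are assumptions.

```python
import math


def check_loss(u, tau):
    """Quantile (check/pinball) loss for residual u = y - prediction."""
    return max(tau * u, (tau - 1.0) * u)


def log_cosh(x):
    """Numerically stable log(cosh(x))."""
    ax = abs(x)
    return ax + math.log1p(math.exp(-2.0 * ax)) - math.log(2.0)


def smooth_check_loss(u, tau, sharpness=10.0):
    """Assumed smooth surrogate: |u| in the check-loss identity is replaced
    by log(cosh(sharpness * u)) / sharpness, which is differentiable at 0."""
    return 0.5 * log_cosh(sharpness * u) / sharpness + (tau - 0.5) * u


for residual in (-2.0, -0.1, 0.0, 0.1, 2.0):
    print(residual, check_loss(residual, 0.9), smooth_check_loss(residual, 0.9))
```

Under this construction the surrogate never differs from the check loss by more than 0.5*log(2)/sharpness, so a larger `sharpness` tracks the kink more closely while a smaller one gives a smoother gradient.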
LG Apr 25, 2023
QuantProb: Generalizing Probabilities along with Predictions for a Pre-trained Classifier
Aditya Challa, Snehanshu Saha, Soma Dhavala
Quantification of uncertainty in predictions is a challenging problem. In classification settings, although deep learning based models generalize well, class probabilities often lack reliability. Calibration errors are used to quantify uncertainty, and several methods exist to minimize calibration error. We argue that, given the choice between a minimum calibration error on the original distribution that increases across distortions and a (possibly slightly higher) calibration error that is constant across distortions, we prefer the latter. We hypothesize that deep networks are unreliable because, the way neural networks are currently trained, the probabilities do not generalize across small distortions. We observe that quantile based approaches can potentially solve this problem. We propose an innovative approach to decouple the construction of quantile representations from the loss function, allowing us to compute quantile based probabilities without disturbing the original network. We achieve this by establishing a novel duality property between quantiles and probabilities, and an ability to obtain quantile probabilities from any pre-trained classifier. While post-hoc calibration techniques successfully minimize calibration errors, they do not preserve robustness to distortions. We show that quantile probabilities (QuantProb), obtained from quantile representations, preserve the calibration errors across distortions, since quantile probabilities generalize better than the naive Softmax probabilities.
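The abstract does not say which calibration error is measured. A common choice is the expected calibration error (ECE), sketched below with equal-width confidence bins; treating ECE as the metric, and the bin count of 10, are assumptions made for illustration.

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """Weighted average gap between accuracy and mean confidence per bin.

    confidences: predicted probability of the chosen class, each in [0, 1].
    correct: booleans, True where the prediction matched the label.
    """
    bins = [[] for _ in range(n_bins)]
    for conf, hit in zip(confidences, correct):
        index = min(int(conf * n_bins), n_bins - 1)  # conf == 1.0 goes in the last bin
        bins[index].append((conf, hit))

    total = len(confidences)
    ece = 0.0
    for members in bins:
        if not members:
            continue
        avg_conf = sum(c for c, _ in members) / len(members)
        accuracy = sum(1 for _, h in members if h) / len(members)
        ece += len(members) / total * abs(accuracy - avg_conf)
    return ece


# Overconfident toy classifier: always certain, right half of the time.
print(expected_calibration_error([1.0] * 4, [True, True, False, False]))
```

Comparing this number on clean and distorted inputs is one way to see whether a calibration error stays constant across distortions, which is the property the abstract argues for.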
LG Apr 7, 2023
Correcting Model Misspecification via Generative Adversarial Networks
Pronoma Banerjee, Manasi V Gude, Rajvi J Sampat et al.
Machine learning models are often misspecified in the likelihood, which leads to a lack of robustness in the predictions. In this paper, we introduce a framework for correcting likelihood misspecifications in several paradigm agnostic noisy prior models and test the model's ability to remove the misspecification. The "ABC-GAN" framework introduced is a novel generative modeling paradigm, which combines Generative Adversarial Networks (GANs) and Approximate Bayesian Computation (ABC). This new paradigm assists the existing GANs by incorporating any subjective knowledge available about the modeling process via ABC, as a regularizer, resulting in a partially interpretable model that operates well under low data regimes. At the same time, unlike any Bayesian analysis, the explicit knowledge need not be perfect, since the generator in the GAN can be made arbitrarily complex. ABC-GAN eliminates the need for summary statistics and distance metrics as the discriminator implicitly learns them and enables simultaneous specification of multiple generative models. The model misspecification is simulated in our experiments by introducing noise of various biases and variances. The correction term is learnt via the ABC-GAN, with skip connections, referred to as skipGAN. The strength of the skip connection indicates the amount of correction needed or how misspecified the prior model is. Based on a simple experimental setup, we show that the ABC-GAN models not only correct the misspecification of the prior, but also perform as well as or better than the respective priors under noisier conditions. In this proposal, we show that ABC-GANs get the best of both worlds.
BM Nov 2, 2023
A novel RNA pseudouridine site prediction model using Utility Kernel and data-driven parameters
Sourabh Patil, Archana Mathur, Raviprasad Aduri et al.
RNA protein Interactions (RPIs) play an important role in biological systems. Recently, we have enumerated the RPIs at the residue level and have elucidated the minimum structural unit (MSU) in these interactions to be a stretch of five residues (nucleotides/amino acids). Pseudouridine is the most frequent modification in RNA. The conversion of uridine to pseudouridine involves interactions between pseudouridine synthase and RNA. The existing models to predict the pseudouridine sites in a given RNA sequence mainly depend on user-defined features such as mono and dinucleotide composition/propensities of RNA sequences. Predicting pseudouridine sites is a non-linear classification problem with limited data points. Deep Learning models are efficient discriminators when the data set size is reasonably large and fail when there is a paucity of data ($<1000$ samples). To mitigate this problem, we propose a Support Vector Machine (SVM) kernel based on utility theory from Economics, using data-driven parameters (i.e. MSU) as features. For this purpose, we have used position-specific tri/quad/pentanucleotide composition/propensity (PSPC/PSPP) besides nucleotide and dinucleotide composition as features. SVMs are known to work well in small data regimes, and kernels in SVM are designed to classify non-linear data. The proposed model outperforms the existing state-of-the-art models significantly (10%-15% on average).
LG Jan 16
Matching High-Dimensional Geometric Quantiles for Test-Time Adaptation of Transformers and Convolutional Networks Alike
Sravan Danda, Aditya Challa, Shlok Mehendale et al.
Test-time adaptation (TTA) refers to adapting a classifier for the test data when the probability distribution of the test data slightly differs from that of the training data of the model. To the best of our knowledge, most of the existing TTA approaches modify the weights of the classifier relying heavily on the architecture. It is unclear as to how these approaches are extendable to generic architectures. In this article, we propose an architecture-agnostic approach to TTA by adding an adapter network pre-processing the input images suitable to the classifier. This adapter is trained using the proposed quantile loss. Unlike existing approaches, we correct for the distribution shift by matching high-dimensional geometric quantiles. We prove theoretically that under suitable conditions minimizing quantile loss can learn the optimal adapter. We validate our approach on CIFAR10-C, CIFAR100-C and TinyImageNet-C by training both classic convolutional and transformer networks on CIFAR10, CIFAR100 and TinyImageNet datasets.
LG Dec 4, 2024
A Granger-Causal Perspective on Gradient Descent with Application to Pruning
Aditya Shah, Aditya Challa, Sravan Danda et al.
Stochastic Gradient Descent (SGD) is the main approach to optimizing neural networks. Several generalization properties of deep networks, such as convergence to flatter minima, are believed to arise from SGD. This article explores the causality aspect of gradient descent. Specifically, we show that the gradient descent procedure has an implicit Granger-causal relationship between the reduction in loss and a change in parameters. By suitable modifications, we make this causal relationship explicit. A causal approach to gradient descent has many significant applications which allow greater control. In this article, we illustrate the significance of the causal approach using the application of pruning. The causal approach to pruning has several interesting properties: (i) We observe a phase shift as the percentage of pruned parameters increases. Such a phase shift is indicative of an optimal pruning strategy. (ii) After pruning, we see that the minima become "flatter", explaining the increase in accuracy after pruning weights.
LG Feb 11, 2024
Benchmarking Anomaly Detection Algorithms: Deep Learning and Beyond
Shanay Mehta, Shlok Mehendale, Nicole Fernandes et al.
Detection of anomalous situations for complex mission-critical systems holds paramount importance when their service continuity needs to be ensured. A major challenge in detecting anomalies from the operational data arises due to the imbalanced class distribution problem, since the anomalies are supposed to be rare events. This paper evaluates a diverse array of Machine Learning (ML)-based anomaly detection algorithms through a comprehensive benchmark study. The paper contributes significantly by conducting an unbiased comparison of various anomaly detection algorithms, spanning classical ML, including various tree-based approaches, Deep Learning (DL), and outlier detection methods. The inclusion of 104 publicly available datasets enhances the diversity of the study, allowing a more realistic evaluation of algorithm performance and emphasizing the importance of adaptability to real-world scenarios. The paper evaluates the general notion of DL as a universal solution, showing that, while powerful, it is not always the best fit for every scenario. The findings reveal that recently proposed tree-based evolutionary algorithms match DL methods and sometimes outperform them in many instances of univariate data where the size of the data is small and the number of anomalies is less than 10%. Specifically, tree-based approaches successfully detect singleton anomalies in datasets where DL falls short. To the best of the authors' knowledge, such a study on a large number of state-of-the-art algorithms using diverse data sets, with the objective of guiding researchers and practitioners in making informed algorithmic choices, has not been attempted earlier.
MA Oct 15, 2025
Altruistic Ride Sharing: A Community-Driven Approach to Short-Distance Mobility
Divyanshu Singh, Ashman Mehra, Snehanshu Saha et al.
Urban mobility faces persistent challenges of congestion and fuel consumption, specifically when people choose a private, point-to-point commute option. Profit-driven ride-sharing platforms prioritize revenue over fairness and sustainability. This paper introduces Altruistic Ride-Sharing (ARS), a decentralized, peer-to-peer mobility framework where participants alternate between driver and rider roles based on altruism points rather than monetary incentives. The system integrates multi-agent reinforcement learning (MADDPG) for dynamic ride-matching, game-theoretic equilibrium guarantees for fairness, and a population model to sustain long-term balance. Using real-world New York City taxi data, we demonstrate that ARS reduces travel distance and emissions, increases vehicle utilization, and promotes equitable participation compared to both no-sharing and optimization-based baselines. These results establish ARS as a scalable, community-driven alternative to conventional ride-sharing, aligning individual behavior with collective urban sustainability goals.
LG Feb 25, 2025
A Radon-Nikodým Perspective on Anomaly Detection: Theory and Implications
Shlok Mehendale, Aditya Challa, Rahul Yedida et al.
Which principle underpins the design of an effective anomaly detection loss function? The answer lies in the Radon-Nikodým theorem, a fundamental result in measure theory. The key insight from this article is that multiplying the vanilla loss function by the Radon-Nikodým derivative improves the performance across the board. We refer to this as RN-Loss. We prove this using the setting of PAC (Probably Approximately Correct) learnability. Depending on the context, the Radon-Nikodým derivative takes different forms. In the simplest case of supervised anomaly detection, the Radon-Nikodým derivative takes the form of a simple weighted loss. In the case of unsupervised anomaly detection (with distributional assumptions), the Radon-Nikodým derivative takes the form of the popular cluster-based local outlier factor. We evaluate our algorithm on 96 datasets, including univariate and multivariate data from diverse domains, including healthcare, cybersecurity, and finance. We show that RN-Derivative algorithms outperform state-of-the-art methods on 68% of multivariate datasets (based on F1 scores) and also achieve peak F1-scores on 72% of time series (univariate) datasets.
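For the supervised case, where the abstract says the derivative reduces to a simple weighted loss, a rough sketch is a class-weighted binary cross-entropy. The inverse-frequency weights below are a hypothetical stand-in for the Radon-Nikodým derivative; the paper's actual weighting is not specified in the abstract. The helper assumes both classes appear in `labels`.

```python
import math


def inverse_frequency_weights(labels):
    """Hypothetical weighting: each class gets total / (2 * class_count).

    Assumes labels are 0/1 and that both classes are present.
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    return len(labels) / (2.0 * n_pos), len(labels) / (2.0 * n_neg)


def weighted_bce(labels, probs, w_pos=1.0, w_neg=1.0, eps=1e-12):
    """Mean binary cross-entropy with a per-class multiplicative weight."""
    total = 0.0
    for y, p in zip(labels, probs):
        p = min(max(p, eps), 1.0 - eps)  # keep log() finite
        if y == 1:
            total += -w_pos * math.log(p)
        else:
            total += -w_neg * math.log(1.0 - p)
    return total / len(labels)


labels = [1, 0, 0, 0]            # one anomaly among four samples
missed = [0.2, 0.1, 0.1, 0.1]    # model is confident the anomaly is normal
w_pos, w_neg = inverse_frequency_weights(labels)
print(weighted_bce(labels, missed), weighted_bce(labels, missed, w_pos, w_neg))
```

The weighted loss penalizes the missed anomaly more heavily than the plain loss does, which is the practical effect of re-weighting rare events.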
LG May 19, 2024
Quantile Activation: Correcting a Failure Mode of ML Models
Aditya Challa, Sravan Danda, Laurent Najman et al.
Standard ML models fail to infer the context distribution and suitably adapt. For instance, the learning fails when the underlying distribution is actually a mixture of distributions with contradictory labels. Learning also fails if there is a shift between train and test distributions. Standard neural network architectures like MLPs or CNNs are not equipped to handle this. In this article, we propose a simple activation function, quantile activation (QAct), that addresses this problem without significantly increasing computational costs. The core idea is to "adapt" the outputs of each neuron to its context distribution. The proposed quantile activation (QAct) outputs the relative quantile position of neuron activations within their context distribution, diverging from the direct numerical outputs common in traditional networks. A specific case of the above failure mode is when there is an inherent distribution shift, i.e the test distribution differs slightly from the train distribution. We validate the proposed activation function under covariate shifts, using datasets designed to test robustness against distortions. Our results demonstrate significantly better generalization across distortions compared to conventional classifiers and other adaptive methods, across various architectures. Although this paper presents a proof of concept, we find that this approach unexpectedly outperforms DINOv2 (small), despite DINOv2 being trained with a much larger network and dataset.
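The core idea of outputting a neuron's relative quantile position within its context can be illustrated with a plain empirical rank. This is a deliberately simplified sketch: it treats one batch as the context and uses a mid-rank empirical CDF, and it is not differentiable, so it should not be read as the QAct layer from the paper.

```python
from bisect import bisect_left, bisect_right


def quantile_activation(pre_activations):
    """Map each value to its mid-rank quantile within the given context.

    pre_activations: one neuron's outputs over a batch (the "context").
    """
    ordered = sorted(pre_activations)
    n = len(ordered)
    out = []
    for x in pre_activations:
        below = bisect_left(ordered, x)         # values strictly smaller
        at_or_below = bisect_right(ordered, x)  # values smaller or equal
        out.append((below + at_or_below) / (2.0 * n))
    return out


batch = [3.0, 1.0, 2.0]
shifted = [10.0 * x + 5.0 for x in batch]  # a monotone distortion of the context
print(quantile_activation(batch))
print(quantile_activation(shifted))
```

Both calls print the same values, because a rank-based output ignores any order-preserving shift or rescaling of the context distribution.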
LG Feb 7, 2024
Strong convexity-guided hyper-parameter optimization for flatter losses
Rahul Yedida, Snehanshu Saha
We propose a novel white-box approach to hyper-parameter optimization. Motivated by recent work establishing a relationship between flat minima and generalization, we first establish a relationship between the strong convexity of the loss and its flatness. Based on this, we seek to find hyper-parameter configurations that improve flatness by minimizing the strong convexity of the loss. By using the structure of the underlying neural network, we derive closed-form equations to approximate the strong convexity parameter, and attempt to find hyper-parameters that minimize it in a randomized fashion. Through experiments on 14 classification datasets, we show that our method achieves strong performance at a fraction of the runtime.
EP Sep 6, 2021
Postulating Exoplanetary Habitability via a Novel Anomaly Detection Method
Jyotirmoy Sarkar, Kartik Bhatia, Snehanshu Saha et al.
A profound shift in the study of cosmology came with the discovery of thousands of exoplanets and the possibility of the existence of billions of them in our Galaxy. The biggest goal in these searches is whether there are other life-harbouring planets. However, the question which of these detected planets are habitable, potentially-habitable, or maybe even inhabited, is still not answered. Some potentially habitable exoplanets have been hypothesized, but since Earth is the only known habitable planet, measures of habitability are necessarily determined with Earth as the reference. Several recent works introduced new habitability metrics based on optimization methods. Classification of potentially habitable exoplanets using supervised learning is another emerging area of study. However, both modeling and supervised learning approaches suffer from drawbacks. We propose an anomaly detection method, the Multi-Stage Memetic Algorithm (MSMA), to detect anomalies and extend it to an unsupervised clustering algorithm MSMVMCA to use it to detect potentially habitable exoplanets as anomalies. The algorithm is based on the postulate that Earth is an anomaly, with the possibility of existence of few other anomalies among thousands of data points. We describe an MSMA-based clustering approach with a novel distance function to detect habitable candidates as anomalies (including Earth). The results are cross-matched with the habitable exoplanet catalog (PHL-HEC) of the Planetary Habitability Laboratory (PHL) with both optimistic and conservative lists of potentially habitable exoplanets.
LG Apr 10, 2021
A Swarm Variant for the Schrödinger Solver
Urvil Nileshbhai Jivani, Omatharv Bharat Vaidya, Anwesh Bhattacharya et al.
This paper introduces application of the Exponentially Averaged Momentum Particle Swarm Optimization (EM-PSO) as a derivative-free optimizer for Neural Networks. It adopts PSO's major advantages such as search space exploration and higher robustness to local minima compared to gradient-descent optimizers such as Adam. Neural network based solvers endowed with gradient optimization are now being used to approximate solutions to Differential Equations. Here, we demonstrate the novelty of EM-PSO in approximating gradients and leveraging the property in solving the Schrödinger equation, for the Particle-in-a-Box problem. We also provide the optimal set of hyper-parameters supported by mathematical proofs, suited for our algorithm.
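A derivative-free swarm update with exponentially averaged momentum might look like the sketch below. The update rule (momentum as a running average of past velocities, plus the usual personal-best and global-best attraction terms) is an assumption based on the name of the method, and the hyper-parameter values are illustrative rather than the optimal set the abstract refers to.

```python
import random


def em_momentum(momentum, velocity, beta):
    """Exponentially averaged momentum: blend the running average with the
    latest velocity."""
    return beta * momentum + (1.0 - beta) * velocity


def em_pso(objective, dim, n_particles=20, iters=200,
           beta=0.9, c1=0.8, c2=0.8, seed=0):
    """Minimize objective without gradients; returns (best_pos, best_val,
    best value in the initial swarm)."""
    rng = random.Random(seed)
    pos = [[rng.uniform(-5.0, 5.0) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    mom = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_val = [objective(p) for p in pos]
    g = min(range(n_particles), key=pbest_val.__getitem__)
    gbest, gbest_val = pbest[g][:], pbest_val[g]
    initial_best = gbest_val

    for _ in range(iters):
        for i in range(n_particles):
            for d in range(dim):
                # Assumed EM-PSO step: momentum replaces the inertia term.
                mom[i][d] = em_momentum(mom[i][d], vel[i][d], beta)
                vel[i][d] = (mom[i][d]
                             + c1 * rng.random() * (pbest[i][d] - pos[i][d])
                             + c2 * rng.random() * (gbest[d] - pos[i][d]))
                pos[i][d] += vel[i][d]
            value = objective(pos[i])
            if value < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], value
                if value < gbest_val:
                    gbest, gbest_val = pos[i][:], value
    return gbest, gbest_val, initial_best


def sphere(x):
    """Toy objective with its minimum at the origin."""
    return sum(v * v for v in x)


best_pos, best_val, start_val = em_pso(sphere, dim=3)
print(best_val, "<=", start_val)
```

Only objective evaluations are used, which is what makes this kind of optimizer applicable when gradients are unavailable or unreliable.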
NE Apr 10, 2021
Fairly Constricted Multi-Objective Particle Swarm Optimization
Anwesh Bhattacharya, Snehanshu Saha, Nithin Nagaraj
It has been well documented that the use of exponentially-averaged momentum (EM) in particle swarm optimization (PSO) is advantageous over the vanilla PSO algorithm. In the single-objective setting, it leads to faster convergence and avoidance of local minima. Naturally, one would expect that the same advantages of EM carry over to the multi-objective setting. Hence, we extend the state of the art Multi-objective optimization (MOO) solver, SMPSO, by incorporating EM in it. As a consequence, we develop the mathematical formalism of constriction fairness which is at the core of extended SMPSO algorithm. The proposed solver matches the performance of SMPSO across the ZDT, DTLZ and WFG problem suites and even outperforms it in certain instances.
LG Feb 9, 2021
Estimation and Applications of Quantiles in Deep Binary Classification
Anuj Tambwekar, Anirudh Maiya, Soma Dhavala et al.
Quantile regression, based on check loss, is a widely used inferential paradigm in Econometrics and Statistics. The conditional quantiles provide a robust alternative to classical conditional means, and also allow uncertainty quantification of the predictions, while making very few distributional assumptions. We consider the analogue of check loss in the binary classification setting. We assume that the conditional quantiles are smooth functions that can be learnt by Deep Neural Networks (DNNs). Subsequently, we compute the Lipschitz constant of the proposed loss, and also show that its curvature is bounded, under some regularity conditions. Consequently, recent results on the error rates and DNN architecture complexity become directly applicable. We quantify the uncertainty of the class probabilities in terms of prediction intervals, and develop individualized confidence scores that can be used to decide whether a prediction is reliable or not at scoring time. By aggregating the confidence scores at the dataset level, we provide two additional metrics, model confidence and retention rate, to complement the widely used classifier summaries. The robustness of the proposed non-parametric binary quantile classification framework is also studied, and we demonstrate how to obtain several univariate summary statistics of the conditional distributions, in particular conditional means, using smoothed conditional quantiles, allowing the use of explanation techniques like Shapley to explain the mean predictions. Finally, we demonstrate an efficient training regime for this loss based on Stochastic Gradient Descent with Lipschitz Adaptive Learning Rates (LALR).
GA Nov 23, 2020
Automated Detection of Double Nuclei Galaxies using GOTHIC and the Discovery of a Large Sample of Dual AGN
Anwesh Bhattacharya, Nehal C. P., Mousumi Das et al.
We present a novel algorithm to detect double nuclei galaxies (DNG) called GOTHIC (Graph BOosted iterated HIll Climbing) - that detects whether a given image of a galaxy has two or more closely separated nuclei. Our aim is to detect samples of dual or multiple active galactic nuclei (AGN) in galaxies. Although galaxy mergers are common, the detection of dual AGN is rare. Their detection is very important as they help us understand the formation of supermassive black hole (SMBH) binaries, SMBH growth and AGN feedback effects in multiple nuclei systems. There is thus a need for an algorithm to do a systematic survey of existing imaging data for the discovery of DNGs and dual AGN. We have tested GOTHIC on a known sample of DNGs and subsequently applied it to a sample of a million SDSS DR16 galaxies lying in the redshift range of 0 to 0.75 approximately, and have available spectroscopic data. We have detected 159 dual AGN in this sample, of which 2 are triple AGN systems. Our results show that dual AGN are not common, and triple AGN even rarer. The color (u-r) magnitude plots of the DNGs indicate that star formation is quenched as the nuclei come closer and as the AGN fraction increases. The quenching is especially prominent for dual/triple AGN galaxies that lie in the extreme end of the red sequence.
NE May 19, 2020
AdaSwarm: Augmenting Gradient-Based optimizers in Deep Learning with Swarm Intelligence
Rohan Mohapatra, Snehanshu Saha, Carlos A. Coello Coello et al.
This paper introduces AdaSwarm, a novel gradient-free optimizer which has similar or even better performance than the Adam optimizer adopted in neural networks. To support AdaSwarm, a novel Exponentially weighted Momentum Particle Swarm Optimizer (EMPSO) is proposed. The ability of AdaSwarm to tackle optimization problems is attributed to its capability to perform good gradient approximations. We show that the gradient of any function, differentiable or not, can be approximated by using the parameters of EMPSO. This is a novel technique to simulate GD which lies at the boundary between numerical methods and swarm intelligence. Mathematical proofs of the gradient approximation produced are also provided. AdaSwarm competes closely with several state-of-the-art (SOTA) optimizers. We also show that AdaSwarm is able to handle a variety of loss functions during backpropagation, including the mean absolute error (MAE).
LG May 19, 2020
LALR: Theoretical and Experimental validation of Lipschitz Adaptive Learning Rate in Regression and Neural Networks
Snehanshu Saha, Tejas Prashanth, Suraj Aralihalli et al.
We propose a theoretical framework for an adaptive learning rate policy for the Mean Absolute Error loss function and Quantile loss function and evaluate its effectiveness for regression tasks. The framework is based on the theory of Lipschitz continuity, specifically utilizing the relationship between learning rate and Lipschitz constant of the loss function. Based on experimentation, we have found that the adaptive learning rate policy enables up to 20x faster convergence compared to a constant learning rate policy.
LG May 18, 2020
Parsimonious Computing: A Minority Training Regime for Effective Prediction in Large Microarray Expression Data Sets
Shailesh Sridhar, Snehanshu Saha, Azhar Shaikh et al.
Rigorous mathematical investigation of learning rates used in back-propagation in shallow neural networks has become a necessity. This is because experimental evidence needs to be endorsed by a theoretical background. Such theory may be helpful in reducing the volume of experimental effort to accomplish desired results. We leverage the Lipschitz continuity of the Mean Square Error to compute the learning rate in shallow neural networks. We claim that our approach reduces tuning efforts, especially when a significant corpus of data has to be handled. We achieve a remarkable saving in computational cost while surpassing the prediction accuracy reported in the literature. The learning rate proposed here is the inverse of the Lipschitz constant. The work results in a novel method for carrying out gene expression inference on large microarray data sets with a shallow architecture constrained by limited computing resources. A combination of random sub-sampling of the dataset, an adaptive learning rate inspired by the Lipschitz constant, and a new activation function, A-ReLU, helped accomplish the results reported in the paper.
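As a rough illustration of the "inverse of the Lipschitz constant" idea, the sketch below bounds the gradient norm of a mean-square-error loss for a linear model and returns the reciprocal as a learning rate. It is not the authors' derivation: the linear model, the weight bound `w_bound`, the use of the Frobenius norm as an upper bound, and the function names are all assumptions made for this example.

```python
import math

def lipschitz_lr_mse(X, y, w_bound):
    """Return 1/K, where K bounds ||grad f(w)|| for all ||w|| <= w_bound.

    Assumed loss: f(w) = (1/2m) * ||Xw - y||^2, so
    grad f(w) = X^T (Xw - y) / m and, by the triangle inequality,
    ||grad f(w)|| <= (||X^T X|| * ||w|| + ||X^T y||) / m.
    The Frobenius norm of X^T X is used; it upper-bounds the spectral norm.
    """
    m, n = len(X), len(X[0])
    xtx = [[sum(X[r][i] * X[r][j] for r in range(m)) for j in range(n)]
           for i in range(n)]
    xty = [sum(X[r][i] * y[r] for r in range(m)) for i in range(n)]
    xtx_norm = math.sqrt(sum(v * v for row in xtx for v in row))
    xty_norm = math.sqrt(sum(v * v for v in xty))
    K = (xtx_norm * w_bound + xty_norm) / m
    return 1.0 / K

def mse_gradient(X, y, w):
    """Gradient of the assumed loss f(w) = (1/2m) * ||Xw - y||^2."""
    m, n = len(X), len(X[0])
    residual = [sum(X[r][j] * w[j] for j in range(n)) - y[r] for r in range(m)]
    return [sum(X[r][i] * residual[r] for r in range(m)) / m for i in range(n)]

# Toy data (hypothetical values, for illustration only).
X = [[1.0, 0.0], [0.0, 2.0], [1.0, 1.0]]
y = [1.0, 2.0, 3.0]
lr = lipschitz_lr_mse(X, y, w_bound=1.0)
```

The point of the design is that `lr` is computed once from the data rather than tuned by search; a tighter bound (for example the spectral norm) would give a larger step.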
LG Oct 6, 2019
ChaosNet: A Chaos based Artificial Neural Network Architecture for Classification
Harikrishnan Nellippallil Balakrishnan, Aditi Kathpalia, Snehanshu Saha et al.
Inspired by chaotic firing of neurons in the brain, we propose ChaosNet, a novel chaos-based artificial neural network architecture for classification tasks. ChaosNet is built using layers of neurons, each of which is a 1D chaotic map known as the Generalized Luroth Series (GLS), which has been shown in earlier works to possess very useful properties for compression, cryptography and for computing XOR and other logical operations. In this work, we design a novel learning algorithm on ChaosNet that exploits the topological transitivity property of the chaotic GLS neurons. The proposed learning algorithm gives consistently good accuracy in a number of classification tasks on well-known publicly available datasets with very limited training samples. Even with 7 or fewer training samples per class (which accounts for less than 0.05% of the total available data), ChaosNet yields accuracies in the range 73.89% to 98.33%. We demonstrate the robustness of ChaosNet to additive parameter noise and also provide an example implementation of a 2-layer ChaosNet for enhancing classification accuracy. We envisage the development of several other novel learning algorithms on ChaosNet in the near future.
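The abstract does not define the GLS map itself. As a hedged sketch, the code below iterates a skew-tent map on [0, 1], which is one common form of a 1D GLS map; the skew parameter `b = 0.47`, the function names and the starting value are assumptions chosen for illustration, not values taken from the paper.

```python
def gls_map(x, b=0.47):
    """One step of a skew-tent map on [0, 1] (assumed form of a GLS neuron).

    x / b             for 0 <= x < b
    (1 - x) / (1 - b) for b <= x <= 1
    """
    if not 0.0 < b < 1.0:
        raise ValueError("b must lie strictly between 0 and 1")
    if not 0.0 <= x <= 1.0:
        raise ValueError("x must lie in [0, 1]")
    return x / b if x < b else (1.0 - x) / (1.0 - b)

def trajectory(x0, steps, b=0.47):
    """Return the initial value followed by `steps` iterates of the map."""
    xs = [x0]
    for _ in range(steps):
        xs.append(gls_map(xs[-1], b))
    return xs

# A short chaotic trajectory starting from a hypothetical initial value.
orbit = trajectory(0.3, 50)
```

Topological transitivity, the property the abstract refers to, means informally that such a trajectory eventually passes close to any target region of the interval; the sketch only generates the trajectory and does not implement the learning algorithm.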
IM Jun 1, 2019
Evolution of Novel Activation Functions in Neural Network Training with Applications to Classification of Exoplanets
Snehanshu Saha, Nithin Nagaraj, Archana Mathur et al.
We present an analytical exploration of novel activation functions that arise from the integration of several ideas, leading to their implementation and subsequent use in habitability classification of exoplanets. Neural networks, although a powerful engine in supervised methods, often require expensive tuning efforts for optimized performance. Habitability classes are hard to discriminate, especially when attributes used as hard markers of separation are removed from the data set. The solution is approached by investigating the analytical properties of the proposed activation functions. The theory of ordinary differential equations and fixed points is exploited to justify the "lack of tuning efforts" needed to achieve optimal performance compared to traditional activation functions. Additionally, the relationship between the proposed activation functions and the more popular ones is established through extensive analytical and empirical evidence. Finally, the activation functions have been implemented in a plain vanilla feed-forward neural network to classify exoplanets.
LG Feb 20, 2019
LipschitzLR: Using theoretically computed adaptive learning rates for fast convergence
Rahul Yedida, Snehanshu Saha, Tejas Prashanth
Optimizing deep neural networks is largely thought to be an empirical process, requiring manual tuning of several hyper-parameters, such as learning rate, weight decay, and dropout rate. Arguably, the learning rate is the most important of these to tune, and this has gained more attention in recent works. In this paper, we propose a novel method to compute the learning rate for training deep neural networks with stochastic gradient descent. We first derive a theoretical framework to compute learning rates dynamically based on the Lipschitz constant of the loss function. We then extend this framework to other commonly used optimization algorithms, such as gradient descent with momentum and Adam. We run an extensive set of experiments that demonstrate the efficacy of our approach on popular architectures and datasets, and show that commonly used learning rates are an order of magnitude smaller than the ideal value.
LG Jun 6, 2018
SBAF: A New Activation Function for Artificial Neural Net based Habitability Classification
Snehanshu Saha, Archana Mathur, Kakoli Bora et al.
We explore the efficacy of using a novel activation function in Artificial Neural Networks (ANN) to characterize exoplanets into different classes. We call this the Saha-Bora Activation Function (SBAF), as the motivation is derived from a long-standing understanding of using advanced calculus in modeling the habitability score of exoplanets. The function is demonstrated to possess nice analytical properties and doesn't seem to suffer from local oscillation problems. The manuscript presents the analytical properties of the activation function and the architecture implemented on the function. Keywords: Astroinformatics, Machine Learning, Exoplanets, ANN, Activation Function.
IM Apr 13, 2018
Machine Learning in Astronomy: A Case Study in Quasar-Star Classification
Mohammed Viquar, Suryoday Basak, Ariruna Dasgupta et al.
We present the results of various automated classification methods, based on machine learning (ML), of objects from data releases 6 and 7 (DR6 and DR7) of the Sloan Digital Sky Survey (SDSS), primarily distinguishing stars from quasars. We carefully scrutinize the approaches available in the literature and highlight the pitfalls in those approaches based on the nature of the data used for the study. The aim is to investigate the appropriateness of the application of certain ML methods. The manuscript argues convincingly in favor of the efficacy of asymmetric AdaBoost to classify photometric data. The paper presents a critical review of existing studies and puts forward an application of asymmetric AdaBoost as an offspring of that exercise.
LG Apr 29, 2016
Predicting the direction of stock market prices using random forest
Luckyson Khaidem, Snehanshu Saha, Sudeepa Roy Dey
Predicting trends in stock market prices has been an area of interest for researchers for many years due to its complex and dynamic nature. Intrinsic volatility in stock markets across the globe makes the task of prediction challenging. Forecasting and diffusion modeling, although effective, cannot be the panacea for the diverse range of problems encountered in prediction, short-term or otherwise. Market risk, strongly correlated with forecasting errors, needs to be minimized to ensure minimal risk in investment. The authors propose to minimize forecasting error by treating the forecasting problem as a classification problem, for which a popular suite of algorithms exists in machine learning. In this paper, we propose a novel way to minimize the risk of investment in the stock market by predicting the returns of a stock using a class of powerful machine learning algorithms known as ensemble learning. Technical indicators such as the Relative Strength Index (RSI), stochastic oscillator, etc., are used as inputs to train our model. The learning model used is an ensemble of multiple decision trees. The algorithm is shown to outperform existing algorithms found in the literature. Out of Bag (OOB) error estimates have been found to be encouraging. Key Words: Random Forest Classifier, stock price forecasting, Exponential smoothing, feature extraction, OOB error and convergence.
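To make the feature side of this concrete, here is a minimal sketch of one of the indicators named above, the Relative Strength Index. It uses plain averages of gains and losses over the last `period` price changes, which is an assumption for this example: the paper may use a smoothed variant, and the 14-step default and the function name are likewise illustrative.

```python
def rsi(prices, period=14):
    """Relative Strength Index over the last `period` price changes.

    RSI = 100 - 100 / (1 + RS), with RS = average gain / average loss.
    Simple (unsmoothed) averages are used here; smoothed variants exist.
    """
    if len(prices) < period + 1:
        raise ValueError("need at least period + 1 prices")
    changes = [b - a for a, b in zip(prices, prices[1:])]
    window = changes[-period:]
    avg_gain = sum(c for c in window if c > 0) / period
    avg_loss = sum(-c for c in window if c < 0) / period
    if avg_loss == 0:
        # No losses in the window; by the convention used here, RSI is 100.
        return 100.0
    rs = avg_gain / avg_loss
    return 100.0 - 100.0 / (1.0 + rs)

# Hypothetical price series, for illustration only.
rising = [float(p) for p in range(1, 17)]
falling = rising[::-1]
```

In a classification setup like the one described, a value such as `rsi(prices)` would be one column of the feature vector, with the label being the direction of the price move over some later horizon.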
CE Apr 29, 2015
ASTROMLSKIT: A New Statistical Machine Learning Toolkit: A Platform for Data Analytics in Astronomy
Snehanshu Saha, Surbhi Agrawal, Manikandan. R et al.
Astroinformatics is a new impact area in the world of astronomy, occasionally called the final frontier, where astrophysicists, statisticians and computer scientists work together to tackle various data-intensive astronomical problems. Exponential growth in data volume and increased complexity of the data add difficult questions to the existing challenges. Classical problems in astronomy are compounded by the accumulation of an astronomical volume of complex data, rendering the task of classification and interpretation incredibly laborious. The presence of noise in the data makes analysis and interpretation even more arduous. Machine learning algorithms and data analytic techniques provide the right platform for the challenges posed by these problems. A diverse range of open problems, such as star-galaxy separation, detection and classification of exoplanets, and classification of supernovae, is discussed. The focus of the paper is the applicability and efficacy of various machine learning algorithms, such as K Nearest Neighbor (KNN), random forest (RF), decision tree (DT), Support Vector Machine (SVM), Naïve Bayes and Linear Discriminant Analysis (LDA), in the analysis and inference of decision theoretic problems in astronomy. The machine learning algorithms, integrated into ASTROMLSKIT, a toolkit developed in the course of the work, have been used to analyze HabCat data and supernovae data. Accuracy has been found to be appreciably good.
CR Nov 26, 2013
A Randomized Generic Lucas Seed Algorithm (RGLSA) with Tail Boosting for Threat Modeling in Virtual Machines
Snehanshu Saha, Bidisha Goswami, Alexander Ngenzi et al.
The paper is about a self-propagating and self-replicating model of malicious seeds.