Swagatam Das

LG
h-index23
59papers
725citations
Novelty49%
AI Score55

59 Papers

NEAug 25, 2023
Reinforcement Learning-assisted Evolutionary Algorithm: A Survey and Research Opportunities

Yanjie Song, Yutong Wu, Yangyang Guo et al.

Evolutionary algorithms (EA), a class of stochastic search methods based on the principles of natural evolution, have received widespread acclaim for their exceptional performance in various real-world optimization problems. While researchers worldwide have proposed a wide variety of EAs, certain limitations remain, such as slow convergence speed and poor generalization capabilities. Consequently, numerous scholars actively explore improvements to algorithmic structures, operators, search patterns, etc., to enhance their optimization performance. Reinforcement learning (RL) integrated as a component in the EA framework has demonstrated superior performance in recent years. This paper presents a comprehensive survey on integrating reinforcement learning into the evolutionary algorithm, referred to as reinforcement learning-assisted evolutionary algorithm (RL-EA). We begin with the conceptual outlines of reinforcement learning and the evolutionary algorithm. We then provide a taxonomy of RL-EA. Subsequently, we discuss the RL-EA integration method, the RL-assisted strategy adopted by RL-EA, and its applications according to the existing literature. The RL-assisted procedure is divided according to the implemented functions including solution generation, learnable objective function, algorithm/operator/sub-population selection, parameter adaptation, and other strategies. Additionally, different attribute settings of RL in RL-EA are discussed. In the applications of RL-EA section, we also demonstrate the excellent performance of RL-EA on several benchmarks and a range of public datasets to facilitate a quick comparative study. Finally, we analyze potential directions for future research.

CVJun 5, 2022
GridShift: A Faster Mode-seeking Algorithm for Image Segmentation and Object Tracking

Abhishek Kumar, Oladayo S. Ajani, Swagatam Das et al.

In machine learning and computer vision, mean shift (MS) qualifies as one of the most popular mode-seeking algorithms used for clustering and image segmentation. It iteratively moves each data point to the weighted mean of its neighborhood data points. The computational cost required to find the neighbors of each data point is quadratic to the number of data points. Consequently, the vanilla MS appears to be very slow for large-scale datasets. To address this issue, we propose a mode-seeking algorithm called GridShift, with significant speedup and principally based on MS. To accelerate, GridShift employs a grid-based approach for neighbor search, which is linear in the number of data points. In addition, GridShift moves the active grid cells (grid cells associated with at least one data point) in place of data points towards the higher density, a step that provides more speedup. The runtime of GridShift is linear in the number of active grid cells and exponential in the number of features. Therefore, it is ideal for large-scale low-dimensional applications such as object tracking and image segmentation. Through extensive experiments, we showcase the superior performance of GridShift compared to other MS-based as well as state-of-the-art algorithms in terms of accuracy and runtime on benchmark datasets for image segmentation. Finally, we provide a new object-tracking algorithm based on GridShift and show promising results for object tracking compared to CamShift and meanshift++.

SPSep 16, 2024Code
Self-Tuning Spectral Clustering for Speaker Diarization

Nikhil Raghav, Avisek Gupta, Md Sahidullah et al.

Spectral clustering has proven effective in grouping speech representations for speaker diarization tasks, although post-processing the affinity matrix remains difficult due to the need for careful tuning before constructing the Laplacian. In this study, we present a novel pruning algorithm to create a sparse affinity matrix called spectral clustering on p-neighborhood retained affinity matrix (SC-pNA). Our method improves on node-specific fixed neighbor selection by allowing a variable number of neighbors, eliminating the need for external tuning data as the pruning parameters are derived directly from the affinity matrix. SC-pNA does so by identifying two clusters in every row of the initial affinity matrix, and retains only the top p % similarity scores from the cluster containing larger similarities. Spectral clustering is performed subsequently, with the number of clusters determined as the maximum eigengap. Experimental results on the challenging DIHARD-III dataset highlight the superiority of SC-pNA, which is also computationally more efficient than existing auto-tuning approaches. Our implementations are available at https://github.com/nikhilraghav29/SC-pNA.

LGMay 8, 2022
Hamiltonian Monte Carlo Particle Swarm Optimizer

Omatharv Bharat Vaidya, Rithvik Terence DSouza, Snehanshu Saha et al.

We introduce the Hamiltonian Monte Carlo Particle Swarm Optimizer (HMC-PSO), an optimization algorithm that reaps the benefits of both Exponentially Averaged Momentum PSO and HMC sampling. The coupling of the position and velocity of each particle with Hamiltonian dynamics in the simulation allows for extensive freedom for exploration and exploitation of the search space. It also provides an excellent technique to explore highly non-convex functions while ensuring efficient sampling. We extend the method to approximate error gradients in closed form for Deep Neural Network (DNN) settings. We discuss possible methods of coupling and compare its performance to that of state-of-the-art optimizers on the Golomb's Ruler problem and Classification tasks.

CVSep 1, 2022
Hybrid Gromov-Wasserstein Embedding for Capsule Learning

Pourya Shamsolmoali, Masoumeh Zareapoor, Swagatam Das et al.

Capsule networks (CapsNets) aim to parse images into a hierarchy of objects, parts, and their relations using a two-step process involving part-whole transformation and hierarchical component routing. However, this hierarchical relationship modeling is computationally expensive, which has limited the wider use of CapsNet despite its potential advantages. The current state of CapsNet models primarily focuses on comparing their performance with capsule baselines, falling short of achieving the same level of proficiency as deep CNN variants in intricate tasks. To address this limitation, we present an efficient approach for learning capsules that surpasses canonical baseline models and even demonstrates superior performance compared to high-performing convolution models. Our contribution can be outlined in two aspects: firstly, we introduce a group of subcapsules onto which an input vector is projected. Subsequently, we present the Hybrid Gromov-Wasserstein framework, which initially quantifies the dissimilarity between the input and the components modeled by the subcapsules, followed by determining their alignment degree through optimal transport. This innovative mechanism capitalizes on new insights into defining alignment between the input and subcapsules, based on the similarity of their respective component distributions. This approach enhances CapsNets' capacity to learn from intricate, high-dimensional data while retaining their interpretability and hierarchical structure. Our proposed model offers two distinct advantages: (i) its lightweight nature facilitates the application of capsules to more intricate vision tasks, including object detection; (ii) it outperforms baseline approaches in these demanding tasks.

CLDec 8, 2025Code
HalluShift++: Bridging Language and Vision through Internal Representation Shifts for Hierarchical Hallucinations in MLLMs

Sujoy Nath, Arkaprabha Basu, Sharanya Dasgupta et al.

Multimodal Large Language Models (MLLMs) have demonstrated remarkable capabilities in vision-language understanding tasks. While these models often produce linguistically coherent output, they often suffer from hallucinations, generating descriptions that are factually inconsistent with the visual content, potentially leading to adverse consequences. Therefore, the assessment of hallucinations in MLLM has become increasingly crucial in the model development process. Contemporary methodologies predominantly depend on external LLM evaluators, which are themselves susceptible to hallucinations and may present challenges in terms of domain adaptation. In this study, we propose the hypothesis that hallucination manifests as measurable irregularities within the internal layer dynamics of MLLMs, not merely due to distributional shifts but also in the context of layer-wise analysis of specific assumptions. By incorporating such modifications, \textsc{\textsc{HalluShift++}} broadens the efficacy of hallucination detection from text-based large language models (LLMs) to encompass multimodal scenarios. Our codebase is available at https://github.com/C0mRD/HalluShift_Plus.

LGApr 7, 2022
Interval Bound Interpolation for Few-shot Learning with Few Tasks

Shounak Datta, Sankha Subhra Mullick, Anish Chakrabarty et al.

Few-shot learning aims to transfer the knowledge acquired from training on a diverse set of tasks to unseen tasks from the same task distribution with a limited amount of labeled data. The underlying requirement for effective few-shot generalization is to learn a good representation of the task manifold. This becomes more difficult when only a limited number of tasks are available for training. In such a few-task few-shot setting, it is beneficial to explicitly preserve the local neighborhoods from the task manifold and exploit this to generate artificial tasks for training. To this end, we introduce the notion of interval bounds from the provably robust training literature to few-shot learning. The interval bounds are used to characterize neighborhoods around the training tasks. These neighborhoods can then be preserved by minimizing the distance between a task and its respective bounds. We then use a novel strategy to artificially form new tasks for training by interpolating between the available tasks and their respective interval bounds. We apply our framework to both model-agnostic meta-learning as well as prototype-based metric-learning paradigms. The efficacy of our proposed approach is evident from the improved performance on several datasets from diverse domains compared to current methods.

ASSep 16, 2024Code
TCG CREST System Description for the Second DISPLACE Challenge

Nikhil Raghav, Subhajit Saha, Md Sahidullah et al.

In this report, we describe the speaker diarization (SD) and language diarization (LD) systems developed by our team for the Second DISPLACE Challenge, 2024. Our contributions were dedicated to Track 1 for SD and Track 2 for LD in multilingual and multi-speaker scenarios. We investigated different speech enhancement techniques, voice activity detection (VAD) techniques, unsupervised domain categorization, and neural embedding extraction architectures. We also exploited the fusion of various embedding extraction models. We implemented our system with the open-source SpeechBrain toolkit. Our final submissions use spectral clustering for both the speaker and language diarization. We achieve about $7\%$ relative improvement over the challenge baseline in Track 1. We did not obtain improvement over the challenge baseline in Track 2.

SDMar 21, 2024Code
Exploring Green AI for Audio Deepfake Detection

Subhajit Saha, Md Sahidullah, Swagatam Das

The state-of-the-art audio deepfake detectors leveraging deep neural networks exhibit impressive recognition performance. Nonetheless, this advantage is accompanied by a significant carbon footprint. This is mainly due to the use of high-performance computing with accelerators and high training time. Studies show that average deep NLP model produces around 626k lbs of CO\textsubscript{2} which is equivalent to five times of average US car emission at its lifetime. This is certainly a massive threat to the environment. To tackle this challenge, this study presents a novel framework for audio deepfake detection that can be seamlessly trained using standard CPU resources. Our proposed framework utilizes off-the-shelve self-supervised learning (SSL) based models which are pre-trained and available in public repositories. In contrast to existing methods that fine-tune SSL models and employ additional deep neural networks for downstream tasks, we exploit classical machine learning algorithms such as logistic regression and shallow neural networks using the SSL embeddings extracted using the pre-trained model. Our approach shows competitive results compared to the commonly used high-carbon footprint approaches. In experiments with the ASVspoof 2019 LA dataset, we achieve a 0.90\% equal error rate (EER) with less than 1k trainable model parameters. To encourage further research in this direction and support reproducible results, the Python code will be made publicly accessible following acceptance. Github: https://github.com/sahasubhajit/Speech-Spoofing-

MLNov 12, 2025
Convex Clustering Redefined: Robust Learning with the Median of Means Estimator

Sourav De, Koustav Chowdhury, Bibhabasu Mandal et al.

Clustering approaches that utilize convex loss functions have recently attracted growing interest in the formation of compact data clusters. Although classical methods like k-means and its wide family of variants are still widely used, all of them require the number of clusters k to be supplied as input, and many are notably sensitive to initialization. Convex clustering provides a more stable alternative by formulating the clustering task as a convex optimization problem, ensuring a unique global solution. However, it faces challenges in handling high-dimensional data, especially in the presence of noise and outliers. Additionally, strong fusion regularization, controlled by the tuning parameter, can hinder effective cluster formation within a convex clustering framework. To overcome these challenges, we introduce a robust approach that integrates convex clustering with the Median of Means (MoM) estimator, thus developing an outlier-resistant and efficient clustering framework that does not necessitate prior knowledge of the number of clusters. By leveraging the robustness of MoM alongside the stability of convex clustering, our method enhances both performance and efficiency, especially on large-scale datasets. Theoretical analysis demonstrates weak consistency under specific conditions, while experiments on synthetic and real-world datasets validate the method's superior performance compared to existing approaches.

MLNov 7, 2025
A New Framework for Convex Clustering in Kernel Spaces: Finite Sample Bounds, Consistency and Performance Insights

Shubhayan Pan, Saptarshi Chakraborty, Debolina Paul et al.

Convex clustering is a well-regarded clustering method, resembling the similar centroid-based approach of Lloyd's $k$-means, without requiring a predefined cluster count. It starts with each data point as its centroid and iteratively merges them. Despite its advantages, this method can fail when dealing with data exhibiting linearly non-separable or non-convex structures. To mitigate the limitations, we propose a kernelized extension of the convex clustering method. This approach projects the data points into a Reproducing Kernel Hilbert Space (RKHS) using a feature map, enabling convex clustering in this transformed space. This kernelization not only allows for better handling of complex data distributions but also produces an embedding in a finite-dimensional vector space. We provide a comprehensive theoretical underpinnings for our kernelized approach, proving algorithmic convergence and establishing finite sample bounds for our estimates. The effectiveness of our method is demonstrated through extensive experiments on both synthetic and real-world datasets, showing superior performance compared to state-of-the-art clustering techniques. This work marks a significant advancement in the field, offering an effective solution for clustering in non-linear and non-convex data scenarios.

MLNov 26, 2023
Dirichlet Process-based Robust Clustering using the Median-of-Means Estimator

Supratik Basu, Jyotishka Ray Choudhury, Debolina Paul et al.

Clustering stands as one of the most prominent challenges in unsupervised machine learning. Among centroid-based methods, the classic $k$-means algorithm, based on Lloyd's heuristic, is widely used. Nonetheless, it is a well-known fact that $k$-means and its variants face several challenges, including heavy reliance on initial cluster centroids, susceptibility to converging into local minima of the objective function, and sensitivity to outliers and noise in the data. When data contains noise or outliers, the Median-of-Means (MoM) estimator offers a robust alternative for stabilizing centroid-based methods. On a different note, another limitation in many commonly used clustering methods is the need to specify the number of clusters beforehand. Model-based approaches, such as Bayesian nonparametric models, address this issue by incorporating infinite mixture models, which eliminate the requirement for predefined cluster counts. Motivated by these facts, in this article, we propose an efficient and automatic clustering technique by integrating the strengths of model-based and centroid-based methodologies. Our method mitigates the effect of noise on the quality of clustering; while at the same time, estimates the number of clusters. Statistical guarantees on an upper bound of clustering error, and rigorous assessment through simulated and real datasets, suggest the advantages of our proposed method over existing state-of-the-art clustering algorithms.

CLApr 13, 2025Code
HalluShift: Measuring Distribution Shifts towards Hallucination Detection in LLMs

Sharanya Dasgupta, Sujoy Nath, Arkaprabha Basu et al.

Large Language Models (LLMs) have recently garnered widespread attention due to their adeptness at generating innovative responses to the given prompts across a multitude of domains. However, LLMs often suffer from the inherent limitation of hallucinations and generate incorrect information while maintaining well-structured and coherent responses. In this work, we hypothesize that hallucinations stem from the internal dynamics of LLMs. Our observations indicate that, during passage generation, LLMs tend to deviate from factual accuracy in subtle parts of responses, eventually shifting toward misinformation. This phenomenon bears a resemblance to human cognition, where individuals may hallucinate while maintaining logical coherence, embedding uncertainty within minor segments of their speech. To investigate this further, we introduce an innovative approach, HalluShift, designed to analyze the distribution shifts in the internal state space and token probabilities of the LLM-generated responses. Our method attains superior performance compared to existing baselines across various benchmark datasets. Our codebase is available at https://github.com/sharanya-dasgupta001/hallushift.

CLJan 7Code
ARREST: Adversarial Resilient Regulation Enhancing Safety and Truth in Large Language Models

Sharanya Dasgupta, Arkaprabha Basu, Sujoy Nath et al.

Human cognition, driven by complex neurochemical processes, oscillates between imagination and reality and learns to self-correct whenever such subtle drifts lead to hallucinations or unsafe associations. In recent years, LLMs have demonstrated remarkable performance in a wide range of tasks. However, they still lack human cognition to balance factuality and safety. Bearing the resemblance, we argue that both factual and safety failures in LLMs arise from a representational misalignment in their latent activation space, rather than addressing those as entirely separate alignment issues. We hypothesize that an external network, trained to understand the fluctuations, can selectively intervene in the model to regulate falsehood into truthfulness and unsafe output into safe output without fine-tuning the model parameters themselves. Reflecting the hypothesis, we propose ARREST (Adversarial Resilient Regulation Enhancing Safety and Truth), a unified framework that identifies and corrects drifted features, engaging both soft and hard refusals in addition to factual corrections. Our empirical results show that ARREST not only regulates misalignment but is also more versatile compared to the RLHF-aligned models in generating soft refusals due to adversarial training. We make our codebase available at https://github.com/sharanya-dasgupta001/ARREST.

LGJun 16, 2025Code
Assessing the Limits of In-Context Learning beyond Functions using Partially Ordered Relation

Debanjan Dutta, Faizanuddin Ansari, Swagatam Das

Generating rational and generally accurate responses to tasks, often accompanied by example demonstrations, highlights Large Language Model's (LLM's) remarkable In-Context Learning (ICL) capabilities without requiring updates to the model's parameter space. Despite having an ongoing exploration focused on the inference from a document-level concept, its behavior in learning well-defined functions or relations in context needs a careful investigation. In this article, we present the performance of ICL on partially ordered relation by introducing the notion of inductively increasing complexity in prompts. In most cases, the saturated performance of the chosen metric indicates that while ICL offers some benefits, its effectiveness remains constrained as we increase the complexity in the prompts even in presence of sufficient demonstrative examples. The behavior is evident from our empirical findings and has further been theoretically justified in term of its implicit optimization process. The code is available \href{https://anonymous.4open.science/r/ICLonPartiallyOrderSet}{here}.

QUANT-PHMar 25
A Longitudinal Analysis of the CEC Single-Objective Competitions (2010-2024) and Implications for Variational Quantum Optimization

Vojtěch Novák, Tomáš Bezděk, Ivan Zelinka et al.

This paper provides a historical analysis of the IEEE CEC Single Objective Optimization competition results (2010-2024). We analyze how benchmark functions shaped winning algorithms, identifying the 2014 introduction of dense rotation matrices as a key performance filter. This design choice introduced parameter non-separability, reduced effectiveness of coordinate-dependent methods (PSO, GA), and established the dominance of Differential Evolution variants capable of preserving the rotational invariance of their difference vectors, specifically L-SHADE. Post-2020 analysis reveals a shift towards high complexity hybrid optimizers that combine different mechanisms (e.g., Eigenvector Crossover, Societal Sharing, Reinforcement Learning) to maximize ranking stability. We conclude by identifying structural similarities between these modern benchmarks and Variational Quantum Algorithm landscapes, suggesting that evolved CEC solvers possess the specific adaptive capabilities required for quantum control.

LGSep 14, 2024
Consistent Spectral Clustering in Hyperbolic Spaces

Sagar Ghosh, Swagatam Das

Clustering, as an unsupervised technique, plays a pivotal role in various data analysis applications. Among clustering algorithms, Spectral Clustering on Euclidean Spaces has been extensively studied. However, with the rapid evolution of data complexity, Euclidean Space is proving to be inefficient for representing and learning algorithms. Although Deep Neural Networks on hyperbolic spaces have gained recent traction, clustering algorithms or non-deep machine learning models on non-Euclidean Spaces remain underexplored. In this paper, we propose a spectral clustering algorithm on Hyperbolic Spaces to address this gap. Hyperbolic Spaces offer advantages in representing complex data structures like hierarchical and tree-like structures, which cannot be embedded efficiently in Euclidean Spaces. Our proposed algorithm replaces the Euclidean Similarity Matrix with an appropriate Hyperbolic Similarity Matrix, demonstrating improved efficiency compared to clustering in Euclidean Spaces. Our contributions include the development of the spectral clustering algorithm on Hyperbolic Spaces and the proof of its weak consistency. We show that our algorithm converges at least as fast as Spectral Clustering on Euclidean Spaces. To illustrate the efficacy of our approach, we present experimental results on the Wisconsin Breast Cancer Dataset, highlighting the superior performance of Hyperbolic Spectral Clustering over its Euclidean counterpart. This work opens up avenues for utilizing non-Euclidean Spaces in clustering algorithms, offering new perspectives for handling complex data structures and improving clustering efficiency.

IVMar 24, 2024
Enhancing MRI-Based Classification of Alzheimer's Disease with Explainable 3D Hybrid Compact Convolutional Transformers

Arindam Majee, Avisek Gupta, Sourav Raha et al.

Alzheimer's disease (AD), characterized by progressive cognitive decline and memory loss, presents a formidable global health challenge, underscoring the critical importance of early and precise diagnosis for timely interventions and enhanced patient outcomes. While MRI scans provide valuable insights into brain structures, traditional analysis methods often struggle to discern intricate 3D patterns crucial for AD identification. Addressing this challenge, we introduce an alternative end-to-end deep learning model, the 3D Hybrid Compact Convolutional Transformers 3D (HCCT). By synergistically combining convolutional neural networks (CNNs) and vision transformers (ViTs), the 3D HCCT adeptly captures both local features and long-range relationships within 3D MRI scans. Extensive evaluations on prominent AD benchmark dataset, ADNI, demonstrate the 3D HCCT's superior performance, surpassing state of the art CNN and transformer-based methods in classification accuracy. Its robust generalization capability and interpretability marks a significant stride in AD classification from 3D MRI scans, promising more accurate and reliable diagnoses for improved patient care and superior clinical outcomes.

LGDec 12, 2025
Hyperbolic Gaussian Blurring Mean Shift: A Statistical Mode-Seeking Framework for Clustering in Curved Spaces

Arghya Pratihar, Arnab Seal, Swagatam Das et al.

Clustering is a fundamental unsupervised learning task for uncovering patterns in data. While Gaussian Blurring Mean Shift (GBMS) has proven effective for identifying arbitrarily shaped clusters in Euclidean space, it struggles with datasets exhibiting hierarchical or tree-like structures. In this work, we introduce HypeGBMS, a novel extension of GBMS to hyperbolic space. Our method replaces Euclidean computations with hyperbolic distances and employs Möbius-weighted means to ensure that all updates remain consistent with the geometry of the space. HypeGBMS effectively captures latent hierarchies while retaining the density-seeking behavior of GBMS. We provide theoretical insights into convergence and computational complexity, along with empirical results that demonstrate improved clustering quality in hierarchical datasets. This work bridges classical mean-shift clustering and hyperbolic representation learning, offering a principled approach to density-based clustering in curved spaces. Extensive experimental evaluations on $11$ real-world datasets demonstrate that HypeGBMS significantly outperforms conventional mean-shift clustering methods in non-Euclidean settings, underscoring its robustness and effectiveness.

SDApr 27, 2025
Generative Adversarial Network based Voice Conversion: Techniques, Challenges, and Recent Advancements

Sandipan Dhar, Nanda Dulal Jana, Swagatam Das

Voice conversion (VC) stands as a crucial research area in speech synthesis, enabling the transformation of a speaker's vocal characteristics to resemble another while preserving the linguistic content. This technology has broad applications, including automated movie dubbing, speech-to-singing conversion, and assistive devices for pathological speech rehabilitation. With the increasing demand for high-quality and natural-sounding synthetic voices, researchers have developed a wide range of VC techniques. Among these, generative adversarial network (GAN)-based approaches have drawn considerable attention for their powerful feature-mapping capabilities and potential to produce highly realistic speech. Despite notable advancements, challenges such as ensuring training stability, maintaining linguistic consistency, and achieving perceptual naturalness continue to hinder progress in GAN-based VC systems. This systematic review presents a comprehensive analysis of the voice conversion landscape, highlighting key techniques, key challenges, and the transformative impact of GANs in the field. The survey categorizes existing methods, examines technical obstacles, and critically evaluates recent developments in GAN-based VC. By consolidating and synthesizing research findings scattered across the literature, this review provides a structured understanding of the strengths and limitations of different approaches. The significance of this survey lies in its ability to guide future research by identifying existing gaps, proposing potential directions, and offering insights for building more robust and efficient VC systems. Overall, this work serves as an essential resource for researchers, developers, and practitioners aiming to advance the state-of-the-art (SOTA) in voice conversion technology.

LGJun 19, 2025
A Free Probabilistic Framework for Analyzing the Transformer-based Language Models

Swagatam Das

We present a formal operator-theoretic framework for analyzing Transformer-based language models using free probability theory. By modeling token embeddings and attention mechanisms as self-adjoint operators in a tracial \( W^* \)-probability space, we reinterpret attention as non-commutative convolution and describe representation propagation via free additive convolution. This leads to a spectral dynamic system interpretation of deep Transformers. We derive entropy-based generalization bounds under freeness assumptions and provide insight into positional encoding, spectral evolution, and representational complexity. This work offers a principled, though theoretical, perspective on structural dynamics in large language models.

LGDec 11, 2023
HyPE-GT: where Graph Transformers meet Hyperbolic Positional Encodings

Kushal Bose, Swagatam Das

Graph Transformers (GTs) facilitate the comprehension of graph-structured data by calculating the self-attention of node pairs without considering node position information. To address this limitation, we introduce an innovative and efficient framework that introduces Positional Encodings (PEs) into the Transformer, generating a set of learnable positional encodings in the hyperbolic space, a non-Euclidean domain. This approach empowers us to explore diverse options for optimal selection of PEs for specific downstream tasks, leveraging hyperbolic neural networks or hyperbolic graph convolutional networks. Additionally, we repurpose these positional encodings to mitigate the impact of over-smoothing in deep Graph Neural Networks (GNNs). Comprehensive experiments on molecular benchmark datasets, co-author, and co-purchase networks substantiate the effectiveness of hyperbolic positional encodings in enhancing the performance of deep GNNs.

MLNov 15, 2024
On the Universal Statistical Consistency of Expansive Hyperbolic Deep Convolutional Neural Networks

Sagar Ghosh, Kushal Bose, Swagatam Das

The emergence of Deep Convolutional Neural Networks (DCNNs) has been a pervasive tool for accomplishing widespread applications in computer vision. Despite its potential capability to capture intricate patterns inside the data, the underlying embedding space remains Euclidean and primarily pursues contractive convolution. Several instances can serve as a precedent for the exacerbating performance of DCNNs. The recent advancement of neural networks in the hyperbolic spaces gained traction, incentivizing the development of convolutional deep neural networks in the hyperbolic space. In this work, we propose Hyperbolic DCNN based on the Poincaré Disc. The work predominantly revolves around analyzing the nature of expansive convolution in the context of the non-Euclidean domain. We further offer extensive theoretical insights pertaining to the universal consistency of the expansive convolution in the hyperbolic space. Several simulations were performed not only on the synthetic datasets but also on some real-world datasets. The experimental results reveal that the hyperbolic convolutional architecture outperforms the Euclidean ones by a commendable margin.

IVApr 9, 2024
Fortifying Fully Convolutional Generative Adversarial Networks for Image Super-Resolution Using Divergence Measures

Arkaprabha Basu, Kushal Bose, Sankha Subhra Mullick et al.

Super-Resolution (SR) is a time-hallowed image processing problem that aims to improve the quality of a Low-Resolution (LR) sample up to the standard of its High-Resolution (HR) counterpart. We aim to address this by introducing Super-Resolution Generator (SuRGe), a fully-convolutional Generative Adversarial Network (GAN)-based architecture for SR. We show that distinct convolutional features obtained at increasing depths of a GAN generator can be optimally combined by a set of learnable convex weights to improve the quality of generated SR samples. In the process, we employ the Jensen-Shannon and the Gromov-Wasserstein losses respectively between the SR-HR and LR-SR pairs of distributions to further aid the generator of SuRGe to better exploit the available information in an attempt to improve SR. Moreover, we train the discriminator of SuRGe with the Wasserstein loss with gradient penalty, to primarily prevent mode collapse. The proposed SuRGe, as an end-to-end GAN workflow tailor-made for super-resolution, offers improved performance while maintaining low inference time. The efficacy of SuRGe is substantiated by its superior performance compared to 18 state-of-the-art contenders on 10 benchmark datasets.

LGAug 26, 2025
Grounding the Ungrounded: A Spectral-Graph Framework for Quantifying Hallucinations in Multimodal LLMs

Supratik Sarkar, Swagatam Das

Hallucinations in LLMs--especially in multimodal settings--undermine reliability. We present a rigorous, information-geometric framework in diffusion dynamics that quantifies hallucination in MLLMs: model outputs are embedded spectrally on multimodal graph Laplacians, and gaps to a truth manifold define a semantic-distortion metric. We derive Courant--Fischer bounds on a temperature-dependent hallucination energy and use RKHS eigenmodes to obtain modality-aware, interpretable measures that track evolution over prompts and time. This reframes hallucination as measurable and bounded, providing a principled basis for evaluation and mitigation.

LGAug 3, 2025
MHARFedLLM: Multimodal Human Activity Recognition Using Federated Large Language Model

Asmit Bandyopadhyay, Rohit Basu, Tanmay Sen et al.

Human Activity Recognition (HAR) plays a vital role in applications such as fitness tracking, smart homes, and healthcare monitoring. Traditional HAR systems often rely on single modalities, such as motion sensors or cameras, limiting robustness and accuracy in real-world environments. This work presents FedTime-MAGNET, a novel multimodal federated learning framework that advances HAR by combining heterogeneous data sources: depth cameras, pressure mats, and accelerometers. At its core is the Multimodal Adaptive Graph Neural Expert Transformer (MAGNET), a fusion architecture that uses graph attention and a Mixture of Experts to generate unified, discriminative embeddings across modalities. To capture complex temporal dependencies, a lightweight T5 encoder only architecture is customized and adapted within this framework. Extensive experiments show that FedTime-MAGNET significantly improves HAR performance, achieving a centralized F1 Score of 0.934 and a strong federated F1 Score of 0.881. These results demonstrate the effectiveness of combining multimodal fusion, time series LLMs, and federated learning for building accurate and robust HAR systems.

SDApr 18, 2025
Collective Learning Mechanism based Optimal Transport Generative Adversarial Network for Non-parallel Voice Conversion

Sandipan Dhar, Md. Tousin Akhter, Nanda Dulal Jana et al.

After demonstrating significant success in image synthesis, Generative Adversarial Network (GAN) models have likewise made significant progress in the field of speech synthesis, leveraging their capacity to adapt the precise distribution of target data through adversarial learning processes. Notably, in the realm of State-Of-The-Art (SOTA) GAN-based Voice Conversion (VC) models, there exists a substantial disparity in naturalness between real and GAN-generated speech samples. Furthermore, while many GAN models currently operate on a single generator discriminator learning approach, optimizing target data distribution is more effectively achievable through a single generator multi-discriminator learning scheme. Hence, this study introduces a novel GAN model named Collective Learning Mechanism-based Optimal Transport GAN (CLOT-GAN) model, incorporating multiple discriminators, including the Deep Convolutional Neural Network (DCNN) model, Vision Transformer (ViT), and conformer. The objective of integrating various discriminators lies in their ability to comprehend the formant distribution of mel-spectrograms, facilitated by a collective learning mechanism. Simultaneously, the inclusion of Optimal Transport (OT) loss aims to precisely bridge the gap between the source and target data distribution, employing the principles of OT theory. The experimental validation on VCC 2018, VCTK, and CMU-Arctic datasets confirms that the CLOT-GAN-VC model outperforms existing VC models in objective and subjective assessments.

PROct 26, 2025
A Free Probabilistic Framework for Denoising Diffusion Models: Entropy, Transport, and Reverse Processes

Swagatam Das

This paper develops a rigorous probabilistic framework that extends denoising diffusion models to the setting of noncommutative random variables. Building on Voiculescu's theory of free entropy and free Fisher information, we formulate diffusion and reverse processes governed by operator-valued stochastic dynamics whose spectral measures evolve by additive convolution. Using tools from free stochastic analysis -- including a Malliavin calculus and a Clark--Ocone representation -- we derive the reverse-time stochastic differential equation driven by the conjugate variable, the analogue of the classical score function. The resulting dynamics admit a gradient-flow structure in the noncommutative Wasserstein space, establishing an information-geometric link between entropy production, transport, and deconvolution. We further construct a variational scheme analogous to the Jordan--Kinderlehrer--Otto (JKO) formulation and prove convergence toward the semicircular equilibrium. The framework provides functional inequalities (free logarithmic Sobolev, Talagrand, and HWI) that quantify entropy dissipation and Wasserstein contraction. These results unify diffusion-based generative modeling with the geometry of operator-valued information, offering a mathematical foundation for generative learning on structured and high-dimensional data.

LGOct 10, 2025
Rebalancing with Calibrated Sub-classes (RCS): A Statistical Fusion-based Framework for Robust Imbalanced Classification across Modalities

Priyobrata Mondal, Faizanuddin Ansari, Swagatam Das

Class imbalance, where certain classes have insufficient data, poses a critical challenge for robust classification, often biasing models toward majority classes. Distribution calibration offers a promising avenue to address this by estimating more accurate class distributions. In this work, we propose Rebalancing with Calibrated Sub-classes (RCS) - a novel distribution calibration framework for robust imbalanced classification. RCS aims to fuse statistical information from the majority and intermediate class distributions via a weighted mixture of Gaussian components to estimate minority class parameters more accurately. An encoder-decoder network is trained to preserve structural relationships in imbalanced datasets and prevent feature disentanglement. Post-training, encoder-extracted feature vectors are leveraged to generate synthetic samples guided by the calibrated distributions. This fusion-based calibration effectively mitigates overgeneralization by incorporating neighborhood distribution information rather than relying solely on majority-class statistics. Extensive experiments on diverse image, text, and tabular datasets demonstrate that RCS consistently outperforms several baseline and state-of-the-art methods, highlighting its effectiveness and broad applicability in addressing real-world imbalanced classification challenges.

LGSep 17, 2025
APFEx: Adaptive Pareto Front Explorer for Intersectional Fairness

Priyobrata Mondal, Faizanuddin Ansari, Swagatam Das

Ensuring fairness in machine learning models is critical, especially when biases compound across intersecting protected attributes like race, gender, and age. While existing methods address fairness for single attributes, they fail to capture the nuanced, multiplicative biases faced by intersectional subgroups. We introduce Adaptive Pareto Front Explorer (APFEx), the first framework to explicitly model intersectional fairness as a joint optimization problem over the Cartesian product of sensitive attributes. APFEx combines three key innovations- (1) an adaptive multi-objective optimizer that dynamically switches between Pareto cone projection, gradient weighting, and exploration strategies to navigate fairness-accuracy trade-offs, (2) differentiable intersectional fairness metrics enabling gradient-based optimization of non-smooth subgroup disparities, and (3) theoretical guarantees of convergence to Pareto-optimal solutions. Experiments on four real-world datasets demonstrate APFEx's superiority, reducing fairness violations while maintaining competitive accuracy. Our work bridges a critical gap in fair ML, providing a scalable, model-agnostic solution for intersectional fairness.

LGSep 16, 2025
Learning from Heterophilic Graphs: A Spectral Theory Perspective on the Impact of Self-Loops and Parallel Edges

Kushal Bose, Swagatam Das

Graph heterophily poses a formidable challenge to the performance of Message-passing Graph Neural Networks (MP-GNNs). The familiar low-pass filters like Graph Convolutional Networks (GCNs) face performance degradation, which can be attributed to the blending of the messages from dissimilar neighboring nodes. The performance of the low-pass filters on heterophilic graphs still requires an in-depth analysis. In this context, we update the heterophilic graphs by adding a number of self-loops and parallel edges. We observe that eigenvalues of the graph Laplacian decrease and increase respectively by increasing the number of self-loops and parallel edges. We conduct several studies regarding the performance of GCN on various benchmark heterophilic networks by adding either self-loops or parallel edges. The studies reveal that the GCN exhibited either increasing or decreasing performance trends on adding self-loops and parallel edges. In light of the studies, we established connections between the graph spectra and the performance trends of the low-pass filters on the heterophilic graphs. The graph spectra characterize the essential intrinsic properties of the input graph like the presence of connected components, sparsity, average degree, cluster structures, etc. Our work is adept at seamlessly evaluating graph spectrum and properties by observing the performance trends of the low-pass filters without pursuing the costly eigenvalue decomposition. The theoretical foundations are also discussed to validate the impact of adding self-loops and parallel edges on the graph spectrum.

LGSep 8, 2025
Asynchronous Message Passing for Addressing Oversquashing in Graph Neural Networks

Kushal Bose, Swagatam Das

Graph Neural Networks (GNNs) suffer from Oversquashing, which occurs when tasks require long-range interactions. The problem arises from the presence of bottlenecks that limit the propagation of messages among distant nodes. Recently, graph rewiring methods modify edge connectivity and are expected to perform well on long-range tasks. Yet, graph rewiring compromises the inductive bias, incurring significant information loss in solving the downstream task. Furthermore, increasing channel capacity may overcome information bottlenecks but enhance the parameter complexity of the model. To alleviate these shortcomings, we propose an efficient model-agnostic framework that asynchronously updates node features, unlike traditional synchronous message passing GNNs. Our framework creates node batches in every layer based on the node centrality values. The features of the nodes belonging to these batches will only get updated. Asynchronous message updates process information sequentially across layers, avoiding simultaneous compression into fixed-capacity channels. We also theoretically establish that our proposed framework maintains higher feature sensitivity bounds compared to standard synchronous approaches. Our framework is applied to six standard graph datasets and two long-range datasets to perform graph classification and achieves impressive performances with a $5\%$ and $4\%$ improvements on REDDIT-BINARY and Peptides-struct, respectively.

MLJul 17, 2025
Relation-Aware Slicing in Cross-Domain Alignment

Dhruv Sarkar, Aprameyo Chakrabartty, Anish Chakrabarty et al.

The Sliced Gromov-Wasserstein (SGW) distance, aiming to relieve the computational cost of solving a non-convex quadratic program that is the Gromov-Wasserstein distance, utilizes projecting directions sampled uniformly from unit hyperspheres. This slicing mechanism incurs unnecessary computational costs due to uninformative directions, which also affects the representative power of the distance. However, finding a more appropriate distribution over the projecting directions (slicing distribution) is often an optimization problem in itself that comes with its own computational cost. In addition, with more intricate distributions, the sampling itself may be expensive. As a remedy, we propose an optimization-free slicing distribution that provides fast sampling for the Monte Carlo approximation. We do so by introducing the Relation-Aware Projecting Direction (RAPD), effectively capturing the pairwise association of each of two pairs of random vectors, each following their ambient law. This enables us to derive the Relation-Aware Slicing Distribution (RASD), a location-scale law corresponding to sampled RAPDs. Finally, we introduce the RASGW distance and its variants, e.g., IWRASGW (Importance Weighted RASGW), which overcome the shortcomings experienced by SGW. We theoretically analyze its properties and substantiate its empirical prowess using extensive experiments on various alignment tasks.

LGJun 23, 2025
On the Existence of Universal Simulators of Attention

Debanjan Dutta, Faizanuddin Ansari, Anish Chakrabarty et al.

Prior work on the learnability of transformers has established its capacity to approximate specific algorithmic patterns through training under restrictive architectural assumptions. Fundamentally, these arguments remain data-driven and therefore can only provide a probabilistic guarantee. Expressivity, on the contrary, has theoretically been explored to address the problems \emph{computable} by such architecture. These results proved the Turing-completeness of transformers, investigated bounds focused on circuit complexity, and formal logic. Being at the crossroad between learnability and expressivity, the question remains: \emph{can transformer architectures exactly simulate an arbitrary attention mechanism, or in particular, the underlying operations?} In this study, we investigate the transformer encoder's ability to simulate a vanilla attention mechanism. By constructing a universal simulator $\mathcal{U}$ composed of transformer encoders, we present algorithmic solutions to identically replicate attention outputs and the underlying elementary matrix and activation operations via RASP, a formal framework for transformer computation. Our proofs, for the first time, show the existence of an algorithmically achievable data-agnostic solution, previously known to be approximated only by learning.

LGMay 30, 2025
Transformers Are Universally Consistent

Sagar Ghosh, Kushal Bose, Swagatam Das

Despite their central role in the success of foundational models and large-scale language modeling, the theoretical foundations governing the operation of Transformers remain only partially understood. Contemporary research has largely focused on their representational capacity for language comprehension and their prowess in in-context learning, frequently under idealized assumptions such as linearized attention mechanisms. Initially conceived to model sequence-to-sequence transformations, a fundamental and unresolved question is whether Transformers can robustly perform functional regression over sequences of input tokens. This question assumes heightened importance given the inherently non-Euclidean geometry underlying real-world data distributions. In this work, we establish that Transformers equipped with softmax-based nonlinear attention are uniformly consistent when tasked with executing Ordinary Least Squares (OLS) regression, provided both the inputs and outputs are embedded in hyperbolic space. We derive deterministic upper bounds on the empirical error which, in the asymptotic regime, decay at a provable rate of $\mathcal{O}(t^{-1/2d})$, where $t$ denotes the number of input tokens and $d$ the embedding dimensionality. Notably, our analysis subsumes the Euclidean setting as a special case, recovering analogous convergence guarantees parameterized by the intrinsic dimensionality of the data manifold. These theoretical insights are corroborated through empirical evaluations on real-world datasets involving both continuous and categorical response variables.

LGMay 7, 2025
Topology-Driven Clustering: Enhancing Performance with Betti Number Filtration

Arghya Pratihar, Kushal Bose, Swagatam Das

Clustering aims to form groups of similar data points in an unsupervised regime. Yet, clustering complex datasets containing critically intertwined shapes poses significant challenges. The prevailing clustering algorithms widely depend on evaluating similarity measures based on Euclidean metrics. Exploring topological characteristics to perform clustering of complex datasets inevitably presents a better scope. The topological clustering algorithms predominantly perceive the point set through the lens of Simplicial complexes and Persistent homology. Despite these approaches, the existing topological clustering algorithms cannot somehow fully exploit topological structures and show inconsistent performances on some highly complicated datasets. This work aims to mitigate the limitations by identifying topologically similar neighbors through the Vietoris-Rips complex and Betti number filtration. In addition, we introduce the concept of the Betti sequences to capture flexibly essential features from the topological structures. Our proposed algorithm is adept at clustering complex, intertwined shapes contained in the datasets. We carried out experiments on several synthetic and real-world datasets. Our algorithm demonstrated commendable performances across the datasets compared to some of the well-known topology-based clustering algorithms.

LGMay 7, 2025
Hyperbolic Fuzzy C-Means with Adaptive Weight-based Filtering for Efficient Clustering

Swagato Das, Arghya Pratihar, Swagatam Das

Clustering algorithms play a pivotal role in unsupervised learning by identifying and grouping similar objects based on shared characteristics. Although traditional clustering techniques, such as hard and fuzzy center-based clustering, have been widely used, they struggle with complex, high-dimensional, and non-Euclidean datasets. In particular, the fuzzy $C$-Means (FCM) algorithm, despite its efficiency and popularity, exhibits notable limitations in non-Euclidean spaces. Euclidean spaces assume linear separability and uniform distance scaling, limiting their effectiveness in capturing complex, hierarchical, or non-Euclidean structures in fuzzy clustering. To overcome these challenges, we introduce Filtration-based Hyperbolic Fuzzy C-Means (HypeFCM), a novel clustering algorithm tailored for better representation of data relationships in non-Euclidean spaces. HypeFCM integrates the principles of fuzzy clustering with hyperbolic geometry and employs a weight-based filtering mechanism to improve performance. The algorithm initializes weights using a Dirichlet distribution and iteratively refines cluster centroids and membership assignments based on a hyperbolic metric in the Poincaré Disc model. Extensive experimental evaluations on $6$ synthetic and $12$ real-world datasets demonstrate that HypeFCM significantly outperforms conventional fuzzy clustering methods in non-Euclidean settings, underscoring its robustness and effectiveness.

MLDec 20, 2024
On Robust Cross Domain Alignment

Anish Chakrabarty, Arkaprabha Basu, Swagatam Das

The Gromov-Wasserstein (GW) distance is an effective measure of alignment between distributions supported on distinct ambient spaces. Calculating essentially the mutual departure from isometry, it has found vast usage in domain translation and network analysis. It has long been shown to be vulnerable to contamination in the underlying measures. All efforts to introduce robustness in GW have been inspired by similar techniques in optimal transport (OT), which predominantly advocate partial mass transport or unbalancing. In contrast, the cross-domain alignment problem being fundamentally different from OT, demands specific solutions to tackle diverse applications and contamination regimes. Deriving from robust statistics, we discuss three contextually novel techniques to robustify GW and its variants. For each method, we explore metric properties and robustness guarantees along with their co-dependencies and individual relations with the GW distance. For a comprehensive view, we empirically validate their superior resilience to contamination under real machine learning tasks against state-of-the-art methods.

LGMar 31, 2024
Utilizing Maximum Mean Discrepancy Barycenter for Propagating the Uncertainty of Value Functions in Reinforcement Learning

Srinjoy Roy, Swagatam Das

Accounting for the uncertainty of value functions boosts exploration in Reinforcement Learning (RL). Our work introduces Maximum Mean Discrepancy Q-Learning (MMD-QL) to improve Wasserstein Q-Learning (WQL) for uncertainty propagation during Temporal Difference (TD) updates. MMD-QL uses the MMD barycenter for this purpose, as MMD provides a tighter estimate of closeness between probability measures than the Wasserstein distance. Firstly, we establish that MMD-QL is Probably Approximately Correct in MDP (PAC-MDP) under the average loss metric. Concerning the accumulated rewards, experiments on tabular environments show that MMD-QL outperforms WQL and other algorithms. Secondly, we incorporate deep networks into MMD-QL to create MMD Q-Network (MMD-QN). Making reasonable assumptions, we analyze the convergence rates of MMD-QN using function approximation. Empirical results on challenging Atari games demonstrate that MMD-QN performs well compared to benchmark deep RL algorithms, highlighting its effectiveness in handling large state-action spaces.

MLDec 11, 2023
Concurrent Density Estimation with Wasserstein Autoencoders: Some Statistical Insights

Anish Chakrabarty, Arkaprabha Basu, Swagatam Das

Variational Autoencoders (VAEs) have been a pioneering force in the realm of deep generative models. Amongst its legions of progenies, Wasserstein Autoencoders (WAEs) stand out in particular due to the dual offering of heightened generative quality and a strong theoretical backbone. WAEs consist of an encoding and a decoding network forming a bottleneck with the prime objective of generating new samples resembling the ones it was catered to. In the process, they aim to achieve a target latent representation of the encoded data. Our work is an attempt to offer a theoretical understanding of the machinery behind WAEs. From a statistical viewpoint, we pose the problem as concurrent density estimation tasks based on neural network-induced transformations. This allows us to establish deterministic upper bounds on the realized errors WAEs commit. We also analyze the propagation of these stochastic errors in the presence of adversaries. As a result, both the large sample properties of the reconstructed distribution and the resilience of WAE models are explored.

MLJan 6, 2022
Robust Linear Predictions: Analyses of Uniform Concentration, Fast Rates and Model Misspecification

Saptarshi Chakraborty, Debolina Paul, Swagatam Das

The problem of linear predictions has been extensively studied for the past century under pretty generalized frameworks. Recent advances in the robust statistics literature allow us to analyze robust versions of classical linear models through the prism of Median of Means (MoM). Combining these approaches in a piecemeal way might lead to ad-hoc procedures, and the restricted theoretical conclusions that underpin each individual contribution may no longer be valid. To meet these challenges coherently, in this study, we offer a unified robust framework that includes a broad variety of linear prediction problems on a Hilbert space, coupled with a generic class of loss functions. Notably, we do not require any assumptions on the distribution of the outlying data points ($\mathcal{O}$) nor the compactness of the support of the inlying ones ($\mathcal{I}$). Under mild conditions on the dual norm, we show that for misspecification level $ε$, these estimators achieve an error rate of $O(\max\left\{|\mathcal{O}|^{1/2}n^{-1/2}, |\mathcal{I}|^{1/2}n^{-1} \right\}+ε)$, matching the best-known rates in literature. This rate is slightly slower than the classical rates of $O(n^{-1/2})$, indicating that we need to pay a price in terms of error rates to obtain robust estimates. Additionally, we show that this rate can be improved to achieve so-called "fast rates" under additional assumptions.

MLOct 27, 2021
Uniform Concentration Bounds toward a Unified Framework for Robust Clustering

Debolina Paul, Saptarshi Chakraborty, Swagatam Das et al.

Recent advances in center-based clustering continue to improve upon the drawbacks of Lloyd's celebrated $k$-means algorithm over $60$ years after its introduction. Various methods seek to address poor local minima, sensitivity to outliers, and data that are not well-suited to Euclidean measures of fit, but many are supported largely empirically. Moreover, combining such approaches in a piecemeal manner can result in ad hoc methods, and the limited theoretical results supporting each individual contribution may no longer hold. Toward addressing these issues in a principled way, this paper proposes a cohesive robust framework for center-based clustering under a general class of dissimilarity measures. In particular, we present a rigorous theoretical treatment within a Median-of-Means (MoM) estimation framework, showing that it subsumes several popular $k$-means variants. In addition to unifying existing methods, we derive uniform concentration bounds that complete their analyses, and bridge these results to the MoM framework via Dudley's chaining arguments. Importantly, we neither require any assumptions on the distribution of the outlying observations nor on the relative number of observations $n$ to features $p$. We establish strong consistency and an error rate of $O(n^{-1/2})$ under mild conditions, surpassing the best-known results in the literature. The methods are empirically validated thoroughly on real and synthetic datasets.

MLOct 8, 2021
Statistical Regeneration Guarantees of the Wasserstein Autoencoder with Latent Space Consistency

Anish Chakrabarty, Swagatam Das

The introduction of Variational Autoencoders (VAE) has been marked as a breakthrough in the history of representation learning models. Besides having several accolades of its own, VAE has successfully flagged off a series of inventions in the form of its immediate successors. Wasserstein Autoencoder (WAE), being an heir to that realm carries with it all of the goodness and heightened generative promises, matching even the generative adversarial networks (GANs). Needless to say, recent years have witnessed a remarkable resurgence in statistical analyses of the GANs. Similar examinations for Autoencoders, however, despite their diverse applicability and notable empirical performance, remain largely absent. To close this gap, in this paper, we investigate the statistical properties of WAE. Firstly, we provide statistical guarantees that WAE achieves the target distribution in the latent space, utilizing the Vapnik Chervonenkis (VC) theory. The main result, consequently ensures the regeneration of the input distribution, harnessing the potential offered by Optimal Transport of measures under the Wasserstein metric. This study, in turn, hints at the class of distributions WAE can reconstruct after suffering a compression in the form of a latent law.

SDApr 25, 2021
An Adaptive Learning based Generative Adversarial Network for One-To-One Voice Conversion

Sandipan Dhar, Nanda Dulal Jana, Swagatam Das

Voice Conversion (VC) emerged as a significant domain of research in the field of speech synthesis in recent years due to its emerging application in voice-assisting technology, automated movie dubbing, and speech-to-singing conversion to name a few. VC basically deals with the conversion of vocal style of one speaker to another speaker while keeping the linguistic contents unchanged. VC task is performed through a three-stage pipeline consisting of speech analysis, speech feature mapping, and speech reconstruction. Nowadays the Generative Adversarial Network (GAN) models are widely in use for speech feature mapping from source to target speaker. In this paper, we propose an adaptive learning-based GAN model called ALGAN-VC for an efficient one-to-one VC of speakers. Our ALGAN-VC framework consists of some approaches to improve the speech quality and voice similarity between source and target speakers. The model incorporates a Dense Residual Network (DRN) like architecture to the generator network for efficient speech feature learning, for source to target speech feature conversion. We also integrate an adaptive learning mechanism to compute the loss function for the proposed model. Moreover, we use a boosted learning rate approach to enhance the learning capability of the proposed model. The model is trained by using both forward and inverse mapping simultaneously for a one-to-one VC. The proposed model is tested on Voice Conversion Challenge (VCC) 2016, 2018, and 2020 datasets as well as on our self-prepared speech dataset, which has been recorded in Indian regional languages and in English. A subjective and objective evaluation of the generated speech samples indicated that the proposed model elegantly performed the voice conversion task by achieving high speaker similarity and adequate speech quality.

MLFeb 5, 2021
Robust Principal Component Analysis: A Median of Means Approach

Debolina Paul, Saptarshi Chakraborty, Swagatam Das

Principal Component Analysis (PCA) is a fundamental tool for data visualization, denoising, and dimensionality reduction. It is widely popular in Statistics, Machine Learning, Computer Vision, and related fields. However, PCA is well-known to fall prey to outliers and often fails to detect the true underlying low-dimensional structure within the dataset. Following the Median of Means (MoM) philosophy, recent supervised learning methods have shown great success in dealing with outlying observations without much compromise to their large sample theoretical properties. This paper proposes a PCA procedure based on the MoM principle. Called the \textbf{M}edian of \textbf{M}eans \textbf{P}rincipal \textbf{C}omponent \textbf{A}nalysis (MoMPCA), the proposed method is not only computationally appealing but also achieves optimal convergence rates under minimal assumptions. In particular, we explore the non-asymptotic error bounds of the obtained solution via the aid of the Rademacher complexities while granting absolutely no assumption on the outlying observations. The derived concentration results are not dependent on the dimension because the analysis is conducted in a separable Hilbert space, and the results only depend on the fourth moment of the underlying distribution in the corresponding norm. The proposal's efficacy is also thoroughly showcased through simulations and real data applications.

LGDec 20, 2020
Automated Clustering of High-dimensional Data with a Feature Weighted Mean Shift Algorithm

Saptarshi Chakraborty, Debolina Paul, Swagatam Das

Mean shift is a simple interactive procedure that gradually shifts data points towards the mode which denotes the highest density of data points in the region. Mean shift algorithms have been effectively used for data denoising, mode seeking, and finding the number of clusters in a dataset in an automated fashion. However, the merits of mean shift quickly fade away as the data dimensions increase and only a handful of features contain useful information about the cluster structure of the data. We propose a simple yet elegant feature-weighted variant of mean shift to efficiently learn the feature importance and thus, extending the merits of mean shift to high-dimensional data. The resulting algorithm not only outperforms the conventional mean shift clustering procedure but also preserves its computational simplicity. In addition, the proposed method comes with rigorous theoretical convergence guarantees and a convergence rate of at least a cubic order. The efficacy of our proposal is thoroughly assessed through experimental comparison against baseline and state-of-the-art clustering methods on synthetic as well as real-world datasets.

MLNov 12, 2020
Kernel k-Means, By All Means: Algorithms and Strong Consistency

Debolina Paul, Saptarshi Chakraborty, Swagatam Das et al.

Kernel $k$-means clustering is a powerful tool for unsupervised learning of non-linearly separable data. Since the earliest attempts, researchers have noted that such algorithms often become trapped by local minima arising from non-convexity of the underlying objective function. In this paper, we generalize recent results leveraging a general family of means to combat sub-optimal local solutions to the kernel and multi-kernel settings. Called Kernel Power $k$-Means, our algorithm makes use of majorization-minimization (MM) to better solve this non-convex problem. We show the method implicitly performs annealing in kernel feature space while retaining efficient, closed-form updates, and we rigorously characterize its convergence properties both from computational and statistical points of view. In particular, we characterize the large sample behavior of the proposed method by establishing strong consistency guarantees. Its merits are thoroughly validated on a suite of simulated datasets and real data benchmarks that feature non-linear and multi-view separation.

LGAug 26, 2020
Appropriateness of Performance Indices for Imbalanced Data Classification: An Analysis

Sankha Subhra Mullick, Shounak Datta, Sourish Gunesh Dhekane et al.

Indices quantifying the performance of classifiers under class-imbalance, often suffer from distortions depending on the constitution of the test set or the class-specific classification accuracy, creating difficulties in assessing the merit of the classifier. We identify two fundamental conditions that a performance index must satisfy to be respectively resilient to altering number of testing instances from each class and the number of classes in the test set. In light of these conditions, under the effect of class imbalance, we theoretically analyze four indices commonly used for evaluating binary classifiers and five popular indices for multi-class classifiers. For indices violating any of the conditions, we also suggest remedial modification and normalization. We further investigate the capability of the indices to retain information about the classification performance over all the classes, even when the classifier exhibits extreme performance on some classes. Simulation studies are performed on high dimensional deep representations of subset of the ImageNet dataset using four state-of-the-art classifiers tailored for handling class imbalance. Finally, based on our theoretical findings and empirical evidence, we recommend the appropriate indices that should be used to evaluate the performance of classifiers in presence of class-imbalance.

CRApr 24, 2020
A Black-box Adversarial Attack Strategy with Adjustable Sparsity and Generalizability for Deep Image Classifiers

Arka Ghosh, Sankha Subhra Mullick, Shounak Datta et al.

Constructing adversarial perturbations for deep neural networks is an important direction of research. Crafting image-dependent adversarial perturbations using white-box feedback has hitherto been the norm for such adversarial attacks. However, black-box attacks are much more practical for real-world applications. Universal perturbations applicable across multiple images are gaining popularity due to their innate generalizability. There have also been efforts to restrict the perturbations to a few pixels in the image. This helps to retain visual similarity with the original images making such attacks hard to detect. This paper marks an important step which combines all these directions of research. We propose the DEceit algorithm for constructing effective universal pixel-restricted perturbations using only black-box feedback from the target network. We conduct empirical investigations using the ImageNet validation set on the state-of-the-art deep neural classifiers by varying the number of pixels to be perturbed from a meagre 10 pixels to as high as all pixels in the image. We find that perturbing only about 10% of the pixels in an image using DEceit achieves a commendable and highly transferable Fooling Rate while retaining the visual quality. We further demonstrate that DEceit can be successfully applied to image dependent attacks as well. In both sets of experiments, we outperformed several state-of-the-art methods.

MLJan 10, 2020
Entropy Regularized Power k-Means Clustering

Saptarshi Chakraborty, Debolina Paul, Swagatam Das et al.

Despite its well-known shortcomings, $k$-means remains one of the most widely used approaches to data clustering. Current research continues to tackle its flaws while attempting to preserve its simplicity. Recently, the \textit{power $k$-means} algorithm was proposed to avoid trapping in local minima by annealing through a family of smoother surfaces. However, the approach lacks theoretical justification and fails in high dimensions when many features are irrelevant. This paper addresses these issues by introducing \textit{entropy regularization} to learn feature relevance while annealing. We prove consistency of the proposed approach and derive a scalable majorization-minimization algorithm that enjoys closed-form updates and convergence guarantees. In particular, our method retains the same computational complexity of $k$-means and power $k$-means, but yields significant improvements over both. Its merits are thoroughly assessed on a suite of real and synthetic data experiments.