Sandip Roy

CR
h-index: 7
17 papers
35 citations
Novelty: 41%
AI Score: 35

17 Papers

GA Sep 4, 2024
How DREAMS are made: Emulating Satellite Galaxy and Subhalo Populations with Diffusion Models and Point Clouds

Tri Nguyen, Francisco Villaescusa-Navarro, Siddharth Mishra-Sharma et al.

The connection between galaxies and their host dark matter (DM) halos is critical to our understanding of cosmology, galaxy formation, and DM physics. To maximize the return of upcoming cosmological surveys, we need an accurate way to model this complex relationship. Many techniques have been developed to model this connection, from Halo Occupation Distribution (HOD) to empirical and semi-analytic models to hydrodynamic simulations. Hydrodynamic simulations can incorporate more detailed astrophysical processes but are computationally expensive; HODs, on the other hand, are computationally cheap but have limited accuracy. In this work, we present NeHOD, a generative framework based on a variational diffusion model and a Transformer, for painting galaxies/subhalos on top of DM with the accuracy of hydrodynamic simulations but at a computational cost similar to HOD. By modeling galaxies/subhalos as point clouds, instead of binning or voxelization, we can resolve small spatial scales down to the resolution of the simulations. For each halo, NeHOD predicts the positions, velocities, masses, and concentrations of its central and satellite galaxies. We train NeHOD on the TNG-Warm DM suite of the DREAMS project, which consists of 1024 high-resolution zoom-in hydrodynamic simulations of Milky Way-mass halos with varying warm DM mass and astrophysical parameters. We show that our model captures the complex relationships between subhalo properties as a function of the simulation parameters, including the mass functions, stellar-halo mass relations, concentration-mass relations, and spatial clustering. Our method can be used for a large variety of downstream applications, from galaxy clustering to strong lensing studies.

OC Mar 28, 2018
On the Complexity and Approximability of Optimal Sensor Selection for Kalman Filtering

Lintao Ye, Sandip Roy, Shreyas Sundaram

Given a linear dynamical system, we consider the problem of selecting (at design-time) an optimal set of sensors (subject to certain budget constraints) to minimize the trace of the steady state error covariance matrix of the Kalman filter. Previous work has shown that this problem is NP-hard for certain classes of systems and sensor costs; in this paper, we show that the problem remains NP-hard even for the special case where the system is stable and all sensor costs are identical. Furthermore, we show the stronger result that there is no constant-factor (polynomial-time) approximation algorithm for this problem. This contrasts with other classes of sensor selection problems studied in the literature, which typically pursue constant-factor approximations by leveraging greedy algorithms and submodularity of the cost function. Here, we provide a specific example showing that greedy algorithms can perform arbitrarily poorly for the problem of design-time sensor selection for Kalman filtering.
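The design-time selection problem described in this abstract can be illustrated with a small numerical sketch. The code below is not the authors' construction: the system matrices, the identical sensor noise variance `r`, and the helper names (`steady_state_trace`, `cost`, `greedy_select`) are all assumptions made for illustration. It computes the trace of the steady-state a priori error covariance by iterating the Riccati recursion, then compares a greedy choice of sensors with an exhaustive search on a tiny stable system.

```python
import itertools
import numpy as np

def steady_state_trace(A, C, Q, R, iters=2000, tol=1e-12):
    """Trace of the steady-state a priori error covariance (Riccati iteration)."""
    P = np.eye(A.shape[0])
    for _ in range(iters):
        P_next = A @ P @ A.T + Q
        if C.shape[0] > 0:
            S = C @ P @ C.T + R                      # innovation covariance
            K = A @ P @ C.T @ np.linalg.inv(S)       # predictor gain
            P_next = P_next - K @ C @ P @ A.T
        if np.max(np.abs(P_next - P)) < tol:
            return float(np.trace(P_next))
        P = P_next
    return float(np.trace(P))

def cost(A, C_all, Q, r, subset):
    """Cost of a sensor subset; every sensor has the same noise variance r."""
    idx = np.array(sorted(subset), dtype=int)
    return steady_state_trace(A, C_all[idx, :], Q, r * np.eye(len(idx)))

def greedy_select(A, C_all, Q, r, budget):
    """Add, one at a time, the sensor that lowers the cost the most."""
    chosen = []
    for _ in range(budget):
        rest = [i for i in range(C_all.shape[0]) if i not in chosen]
        best = min(rest, key=lambda i: cost(A, C_all, Q, r, chosen + [i]))
        chosen.append(best)
    return chosen

# Hypothetical stable system with four candidate sensors (illustrative values).
A = np.array([[0.9, 0.1, 0.0], [0.0, 0.8, 0.1], [0.0, 0.0, 0.7]])
C_all = np.array([[1.0, 0, 0], [0, 1.0, 0], [0, 0, 1.0], [1.0, 1.0, 1.0]])
Q, r, budget = np.eye(3), 0.5, 2

greedy = greedy_select(A, C_all, Q, r, budget)
optimal = list(min(itertools.combinations(range(4), budget),
                   key=lambda s: cost(A, C_all, Q, r, s)))
print("greedy:", greedy, cost(A, C_all, Q, r, greedy))
print("optimal:", optimal, cost(A, C_all, Q, r, optimal))
```

On an instance this small the greedy and exhaustive answers may well coincide; the abstract's claim concerns specially constructed instances where the gap grows without bound, which this sketch does not reproduce.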

SY May 5, 2018
Modal Barriers to Controllability in Networks with Linearly-Coupled Homogeneous Subsystems

Mengran Xue, Sandip Roy

The controllability of networks comprising homogeneous multi-input multi-output linear subsystems with linear couplings among them is examined, from a modal perspective. The eigenvalues of the network model are classified into two groups: 1) network-invariant modes, which have very high multiplicity regardless of the network's topology; and 2) special-repeat modes, which are repeated for only particular network topologies and have bounded multiplicity. Characterizations of both types of modes are obtained, in part by drawing on decentralized-fixed-mode and generalized-eigenvalue concepts. We demonstrate that network-invariant modes necessarily prevent controllability unless a sufficient fraction of the subsystems are actuated, both in the network as a whole and in any weakly-connected partition. In contrast, the multiplicities of special-repeat modes have no influence on controllability. Our analysis highlights a distinction between built networks where subsystem interfaces may be unavoidable barriers to controllability, and multi-agent systems where protocols can be designed to ensure controllability.

SY Mar 21, 2019
Controllability-Gramian Submatrices for a Network Consensus Model

Sandip Roy, Mengran Xue

Principal submatrices of the controllability Gramian and their inverses are examined, for a network-consensus model with inputs at a subset of network nodes. Specifically, several properties of the Gramian submatrices and their inverses -- including dominant eigenvalues and eigenvectors, diagonal entries, and sign patterns -- are characterized by exploiting the special doubly-nonnegative structure of the matrices. In addition, majorizations for these properties are obtained in terms of cutsets in the network's graph, based on the diffusive form of the model. The asymptotic (long time horizon) structure of the controllability Gramian is also analyzed. The results on the Gramian are used to study metrics for target control of the network-consensus model.

CO Mar 20, 2023
Seven open problems in applied combinatorics

Sinan G. Aksoy, Ryan Bennink, Yuzhou Chen et al.

We present and discuss seven different open problems in applied combinatorics. The application areas relevant to this compilation include quantum computing, algorithmic differentiation, topological data analysis, iterative methods, hypergraph cut algorithms, and power systems.

SY Nov 6, 2018
Comments Regarding 'On the Identifiability of the Influence Model for Stochastic Spatiotemporal Spread Processes'

Sandip Roy

The identifiability analysis of a networked Markov chain model known as the influence model, as described in a recent contribution to arXiv, is examined. Two errors in the identifiability analysis -- one related to the unidentifiability of the partially-observed influence model, the second related to an omission of an additional recurrence criterion for identifiability -- are noted. In addition, some concerns about the formulation of the identifiability problem and the proposed estimation approach are noted.

SY Mar 8, 2021
Sign Patterns of Inverse Doubly-Nonnegative Matrices

Sandip Roy, Mengran Xue

The sign patterns of inverse doubly-nonnegative matrices are examined. A necessary and sufficient condition is developed for a sign matrix to correspond to an inverse doubly-nonnegative matrix. In addition, for a doubly-nonnegative matrix whose graph is a tree, the inverse is shown to have a unique sign pattern, which can be expressed in terms of a two-coloring of the graph.
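A single numerical instance can make the tree result concrete. The sketch below is an illustration under stated assumptions, not the paper's proof: it takes the path graph (a tree), builds one doubly-nonnegative matrix on it (tridiagonal, with 2 on the diagonal and 1 on the off-diagonals, a choice made here for convenience), and checks that the signs of the inverse follow an alternating two-coloring of the path. The rule used, positive for same-colored node pairs and negative for differently colored pairs, is what this instance exhibits; the abstract does not spell out the exact statement.

```python
import numpy as np

def path_dnn_matrix(n):
    """Entrywise nonnegative, positive definite matrix whose graph is a path."""
    return 2.0 * np.eye(n) + np.eye(n, k=1) + np.eye(n, k=-1)

def two_coloring_of_path(n):
    """Alternate colors +1 / -1 along the path."""
    return np.array([(-1) ** i for i in range(n)])

n = 6
M = path_dnn_matrix(n)

# Doubly nonnegative here means entrywise nonnegative and positive (semi)definite.
is_dnn = bool(np.all(M >= 0) and np.all(np.linalg.eigvalsh(M) > 0))

inv_signs = np.sign(np.linalg.inv(M)).astype(int)

# Predicted pattern: +1 where two nodes share a color, -1 where they differ.
colors = two_coloring_of_path(n)
predicted = np.outer(colors, colors)

print(is_dnn)
print(inv_signs)
print(np.array_equal(inv_signs, predicted))
```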

CR Nov 28, 2023
MIA-BAD: An Approach for Enhancing Membership Inference Attack and its Mitigation with Federated Learning

Soumya Banerjee, Sandip Roy, Sayyed Farid Ahamed et al.

The membership inference attack (MIA) is a popular paradigm for compromising the privacy of a machine learning (ML) model. MIA exploits the natural inclination of ML models to overfit to the training data. MIAs are trained to distinguish between training and testing prediction confidence to infer membership information. Federated Learning (FL) is a privacy-preserving ML paradigm that enables multiple clients to train a unified model without disclosing their private data. In this paper, we propose an enhanced Membership Inference Attack with the Batch-wise generated Attack Dataset (MIA-BAD), a modification to the MIA approach. We observe that the MIA is more accurate when the attack dataset is generated batch-wise. This reduces the size of the attack dataset while improving its quality. We show how training an ML model through FL has some distinct advantages and investigate how the threat introduced with the proposed MIA-BAD approach can be mitigated with FL approaches. Finally, we demonstrate the qualitative effects of the proposed MIA-BAD methodology by conducting extensive experiments with various target datasets, variable numbers of federated clients, and training batch sizes.

LG Jul 26, 2024
Accuracy-Privacy Trade-off in the Mitigation of Membership Inference Attack in Federated Learning

Sayyed Farid Ahamed, Soumya Banerjee, Sandip Roy et al.

Over the last few years, federated learning (FL) has emerged as a prominent method in machine learning, emphasizing privacy preservation by allowing multiple clients to collaboratively build a model while keeping their training data private. Despite this focus on privacy, FL models are susceptible to various attacks, including membership inference attacks (MIAs), posing a serious threat to data confidentiality. In a recent study, Rezaei et al. revealed the existence of an accuracy-privacy trade-off in deep ensembles and proposed a few fusion strategies to overcome it. In this paper, we aim to explore the relationship between deep ensembles and FL. Specifically, we investigate whether confidence-based metrics derived from deep ensembles apply to FL and whether there is a trade-off between accuracy and privacy in FL with respect to MIA. Empirical investigations illustrate a lack of a non-monotonic correlation between the number of clients and the accuracy-privacy trade-off. By experimenting with different numbers of federated clients, datasets, and confidence-metric-based fusion strategies, we identify and analytically justify the clear existence of the accuracy-privacy trade-off.

CR Mar 31
Cooperative Local Differential Privacy: Securing Time Series Data in Distributed Environments

Bikash Chandra Singh, Md Jakir Hossain, Rafael Diaz et al.

The rapid growth of smart devices such as phones, wearables, IoT sensors, and connected vehicles has led to an explosion of continuous time series data that offers valuable insights in healthcare, transportation, and more. However, this surge raises significant privacy concerns, as sensitive patterns can reveal personal details. While traditional differential privacy (DP) relies on trusted servers, local differential privacy (LDP) enables users to perturb their own data. Traditional LDP methods perturb time series data by adding user-specific noise, but they exhibit vulnerabilities. For instance, noise applied within fixed time windows can be canceled during aggregation (e.g., averaging), enabling adversaries to infer individual statistics over time, thereby eroding privacy guarantees. To address these issues, we introduce a Cooperative Local Differential Privacy (CLDP) mechanism that enhances privacy by distributing noise vectors across multiple users. In our approach, noise is collaboratively generated and assigned so that when all users' perturbed data is aggregated, the noise cancels out, preserving overall statistical properties while protecting individual privacy. This cooperative strategy not only counters vulnerabilities inherent in time-window-based methods but also scales effectively for large, real-time datasets, striking a better balance between data utility and privacy in multiuser environments.
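The cancellation property described in this abstract (noise that sums to zero across users, so the aggregate is unchanged) can be sketched in a few lines. The construction below is an assumption chosen for simplicity, not the paper's CLDP mechanism: it draws Laplace noise per user and time step and subtracts the across-user mean so that the noise sums to zero at every time step. The function name `zero_sum_noise` and all numeric values are hypothetical, and the sketch makes no claim about the resulting privacy guarantee.

```python
import numpy as np

def zero_sum_noise(num_users, num_steps, scale, rng):
    """Noise matrix (users x time steps) whose columns each sum to zero."""
    raw = rng.laplace(loc=0.0, scale=scale, size=(num_users, num_steps))
    # Centering across users forces the per-time-step total to zero.
    return raw - raw.mean(axis=0, keepdims=True)

rng = np.random.default_rng(0)

# Synthetic readings: 5 users, 8 time steps (illustrative values only).
data = rng.normal(loc=70.0, scale=5.0, size=(5, 8))
noise = zero_sum_noise(num_users=5, num_steps=8, scale=2.0, rng=rng)
perturbed = data + noise

# Each user's series is changed, but the per-time-step aggregate is preserved.
print(np.allclose(perturbed.sum(axis=0), data.sum(axis=0)))
print(np.abs(perturbed - data).max())
```

Centering requires some coordination among users (here, knowledge of the mean of the raw noise), which is one reason this is only a stand-in for whatever cooperative assignment the authors actually use.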

LG Dec 6, 2024
Privacy Drift: Evolving Privacy Concerns in Incremental Learning

Sayyed Farid Ahamed, Soumya Banerjee, Sandip Roy et al.

In the evolving landscape of machine learning (ML), Federated Learning (FL) presents a paradigm shift towards decentralized model training while preserving user data privacy. This paper introduces the concept of "privacy drift", an innovative framework that parallels the well-known phenomenon of concept drift. While concept drift addresses the variability in model accuracy over time due to changes in the data, privacy drift encapsulates the variation in the leakage of private information as models undergo incremental training. By defining and examining privacy drift, this study aims to unveil the nuanced relationship between the evolution of model performance and the integrity of data privacy. Through rigorous experimentation, we investigate the dynamics of privacy drift in FL systems, focusing on how model updates and data distribution shifts influence the susceptibility of models to privacy attacks, such as membership inference attacks (MIA). Our results highlight a complex interplay between model accuracy and privacy safeguards, revealing that enhancements in model performance can lead to increased privacy risks. We provide empirical evidence from experiments on customized datasets derived from CIFAR-100 (Canadian Institute for Advanced Research, 100 classes), showcasing the impact of data and concept drift on privacy. This work lays the groundwork for future research on privacy-aware machine learning, aiming to achieve a delicate balance between model accuracy and data privacy in decentralized environments.

CR May 25, 2025
Evaluating Query Efficiency and Accuracy of Transfer Learning-based Model Extraction Attack in Federated Learning

Sayyed Farid Ahamed, Sandip Roy, Soumya Banerjee et al.

Federated Learning (FL) is a collaborative learning framework designed to protect client data, yet it remains highly vulnerable to Intellectual Property (IP) threats. Model extraction (ME) attacks pose a significant risk to Machine Learning as a Service (MLaaS) platforms, enabling attackers to replicate confidential models by querying black-box (without internal insight) APIs. Despite FL's privacy-preserving goals, its distributed nature makes it particularly susceptible to such attacks. This paper examines the vulnerability of FL-based victim models to two types of model extraction attacks. For various federated clients built under the NVFlare platform, we implemented ME attacks across two deep learning architectures and three image datasets. We evaluate the proposed ME attack performance using various metrics, including accuracy, fidelity, and KL divergence. The experiments show that for different FL clients, the accuracy and fidelity of the extracted model are closely related to the size of the attack query set. Additionally, we explore a transfer learning based approach where pretrained models serve as the starting point for the extraction process. The results indicate that the accuracy and fidelity of the fine-tuned pretrained extraction models are notably higher, particularly with smaller query sets, highlighting potential advantages for attackers.

CR Mar 12, 2025
RESTRAIN: Reinforcement Learning-Based Secure Framework for Trigger-Action IoT Environment

Md Morshed Alam, Lokesh Chandra Das, Sandip Roy et al.

Internet of Things (IoT) platforms with trigger-action capability allow event conditions to trigger actions in IoT devices autonomously by creating a chain of interactions. Adversaries exploit this chain of interactions to maliciously inject fake event conditions into IoT hubs, triggering unauthorized actions on target IoT devices to implement remote injection attacks. Existing defense mechanisms focus mainly on the verification of event transactions using physical event fingerprints to enforce the security policies to block unsafe event transactions. These approaches are designed to provide offline defense against injection attacks. The state-of-the-art online defense mechanisms offer real-time defense, but their extensive reliance on the inference of attack impacts on the IoT network limits the generalization capability of these approaches. In this paper, we propose a platform-independent multi-agent online defense system, namely RESTRAIN, to counter remote injection attacks at runtime. RESTRAIN allows the defense agent to profile attack actions at runtime and leverages reinforcement learning to optimize a defense policy that complies with the security requirements of the IoT network. The experimental results show that the defense agent effectively takes real-time defense actions against complex and dynamic remote injection attacks and maximizes the security gain with minimal computational overhead.

CV Jul 2, 2021
Compressive Representations of Weather Scenes for Strategic Air Traffic Flow Management

Sandip Roy

Terse representation of high-dimensional weather scene data is explored, in support of strategic air traffic flow management objectives. Specifically, we consider whether aviation-relevant weather scenes are compressible, in the sense that each scene admits a possibly-different sparse representation in a basis of interest. Here, compression of weather scenes extracted from METAR data (including temperature, flight categories, and visibility profiles for the contiguous United States) is examined, for the graph-spectral basis. The scenes are found to be compressible, with 75-95% of the scene content captured using 0.5-4% of the basis vectors. Further, the dominant basis vectors for each scene are seen to identify time-varying spatial characteristics of the weather, and reconstruction from the compressed representation is demonstrated. Finally, potential uses of the compressive representations in strategic TFM design are briefly scoped.
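The notion of compressibility in a graph-spectral basis can be sketched as follows. Everything concrete here is assumed for illustration: a ring graph stands in for the station network, a synthetic smooth signal stands in for a METAR-derived weather scene, and the graph-spectral basis is taken to be the eigenvectors of the combinatorial Laplacian, which the abstract does not specify. The sketch keeps the `k` largest-magnitude spectral coefficients, reports the fraction of scene energy they capture, and reconstructs the scene from them.

```python
import numpy as np

def graph_spectral_basis(W):
    """Orthonormal eigenvectors of the combinatorial Laplacian L = D - W."""
    L = np.diag(W.sum(axis=1)) - W
    _, U = np.linalg.eigh(L)
    return U

def compress(signal, U, k):
    """Keep the k largest-magnitude coefficients; return (reconstruction, energy)."""
    coeffs = U.T @ signal
    keep = np.argsort(np.abs(coeffs))[coeffs.size - k:]
    sparse = np.zeros_like(coeffs)
    sparse[keep] = coeffs[keep]
    energy = float(np.sum(sparse ** 2) / np.sum(coeffs ** 2))
    return U @ sparse, energy

# Hypothetical 40-node ring graph standing in for a station network.
n = 40
W = np.zeros((n, n))
for i in range(n):
    W[i, (i + 1) % n] = W[(i + 1) % n, i] = 1.0
U = graph_spectral_basis(W)

# Synthetic smooth "scene" with a little noise (not real weather data).
rng = np.random.default_rng(1)
t = 2.0 * np.pi * np.arange(n) / n
signal = np.cos(t) + 0.5 * np.cos(2.0 * t) + 0.05 * rng.normal(size=n)

recon, energy = compress(signal, U, k=4)
print("fraction of energy captured by 4 of", n, "basis vectors:", energy)
print("max reconstruction error:", np.abs(recon - signal).max())
```

Selecting coefficients per scene, rather than fixing one subset of basis vectors for all scenes, matches the abstract's remark that each scene admits a possibly-different sparse representation.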

SY Mar 29, 2019
Averager-copier-voter models for hybrid opinion dynamics in complex networks

Mengran Xue, Sandip Roy

A hybrid model for opinion dynamics in complex multi-agent networks is introduced, wherein some continuous-valued agents average neighbors' opinions to update their own, while other discrete-valued agents use stochastic copying and voting protocols. A statistical and graph-theoretic analysis of the model is undertaken, and consensus is shown to be achieved whenever the network matrix is ergodic. Also, the time required for consensus is characterized, in terms of the network's graph and the distribution of agents of different types.

SY Oct 17, 2018
Cyber Threat Impact Analysis to Air Traffic Flows Through Dynamic Queue Networks

Ali Tamimi, Adam Hahn, Sandip Roy

Air traffic control increasingly depends on information and communication technology (ICT) to manage traffic flow through highly congested and increasingly interdependent airspace regions. While these systems are critical to ensuring the efficiency and safety of our airspace, they are also increasingly vulnerable to cyber threats that could potentially lead to a reduction in capacity and/or reorganization of traffic flows. In this paper, we model various cyber threats to air traffic control systems, and analyze how these attacks could impact the flow of aircraft through the airspace. To perform this analysis, we consider a model for wide-area air traffic based on a dynamic queuing network model. Then we introduce three different attacks (Route Denial of Service, Route Selection Tampering, and Sector Denial of Service) to the air traffic control system, and explore how these attacks manipulate the sector flows by evaluating the queue backlogs for each sector's outflows. We then explore graph-level vulnerability metrics to identify the sectors that are most vulnerable to various flow manipulations, and compare them to case-study simulations of the various attacks. The results suggest that Route Denial of Service attacks have a significant impact on the target sector and lead to the largest degradation to the overall air traffic flows. Furthermore, the impacts of Sector Denial of Service attacks are primarily confined to the target sector, while the Route Selection Tampering impacts are mostly confined to certain aircraft.

SY Oct 4, 2018
Comment on 'Detecting Topology Variations in Networks of Linear Dynamical Systems'

Sandip Roy, Mengran Xue

Conditions for the detectability of topology variations in dynamical networks are developed in a recent article in the IEEE Transactions on Control of Network Systems [1]. Here, an example is presented which illustrates an error in the network-theoretic conditions for detectability developed in [1].