Neeraj Kumar

LG
h-index: 13
43 papers
1,388 citations
Novelty: 40%
AI Score: 49

43 Papers

LG · Apr 22, 2022
Federated Learning Enables Big Data for Rare Cancer Boundary Detection

Sarthak Pati, Ujjwal Baid, Brandon Edwards et al.

Although machine learning (ML) has shown promise in numerous domains, there are concerns about its generalizability to out-of-sample data. This is currently addressed by centrally sharing ample, and importantly diverse, data from multiple sites. However, such centralization is challenging to scale (or even infeasible) due to various limitations. Federated ML (FL) provides an alternative for training accurate and generalizable ML models by sharing only numerical model updates. Here we present findings from the largest FL study to date, involving data from 71 healthcare institutions across 6 continents, to generate an automatic tumor boundary detector for the rare disease of glioblastoma, utilizing the largest dataset of such patients ever used in the literature (25,256 MRI scans from 6,314 patients). We demonstrate a 33% improvement over a publicly trained model in delineating the surgically targetable tumor, and a 23% improvement for the tumor's entire extent. We anticipate that our study will: 1) enable more studies in healthcare informed by large and diverse data, ensuring meaningful results for rare diseases and underrepresented populations, 2) facilitate further quantitative analyses of glioblastoma via performance optimization of our consensus model for eventual public release, and 3) demonstrate the effectiveness of FL at such scale and task complexity as a paradigm shift for multi-site collaborations, alleviating the need for data sharing.
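
The abstract notes that FL shares only numerical model updates, never the scans themselves. As a minimal illustrative sketch, assuming plain federated averaging (FedAvg) with sample-count weighting, which is a standard aggregation rule and not necessarily the one used in this study, a server could combine client parameters as follows. The function name `fedavg` and the flat-list parameter representation are hypothetical.

```python
from typing import List, Sequence


def fedavg(client_weights: Sequence[Sequence[float]],
           client_sizes: Sequence[int]) -> List[float]:
    """Average flat parameter vectors, weighting each client by its sample count.

    Only these numbers leave each site; the raw data stays local.
    """
    if not client_weights or len(client_weights) != len(client_sizes):
        raise ValueError("need one size per client and at least one client")
    total = sum(client_sizes)
    dim = len(client_weights[0])
    aggregated = [0.0] * dim
    for weights, size in zip(client_weights, client_sizes):
        if len(weights) != dim:
            raise ValueError("all clients must share the same parameter shape")
        for j, w in enumerate(weights):
            aggregated[j] += w * size / total
    return aggregated


if __name__ == "__main__":
    # Two hypothetical sites: site A holds 300 patients, site B holds 100.
    print(fedavg([[1.0, 2.0], [5.0, 6.0]], [300, 100]))  # [2.0, 3.0]
```

Weighting by sample count lets larger sites contribute proportionally more to the shared model; real deployments typically repeat this step over many rounds of local training.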

SP · Mar 16, 2022
EEG based Emotion Recognition: A Tutorial and Review

Xiang Li, Yazhou Zhang, Prayag Tiwari et al.

Emotion recognition through analysis of EEG signals is currently an essential topic in artificial intelligence and holds great potential in emotional health care, human-computer interaction, multimedia content recommendation, and other areas. Although several works have reviewed EEG-based emotion recognition, their content needs to be updated. In addition, those works are either fragmented in content or focus only on specific techniques adopted in this area, neglecting a holistic perspective of the entire technical route. Hence, in this paper, we write from the perspective of researchers taking their first steps on this topic. We review recent representative works in EEG-based emotion recognition research and provide a tutorial that guides researchers from the very beginning. The scientific basis of EEG-based emotion recognition at the psychological and physiological levels is introduced. Further, we categorize the reviewed works into different technical routes and illustrate their theoretical basis and research motivation, which will help readers better understand why those techniques are studied and employed. Finally, existing challenges and future investigations are discussed to help researchers choose potential future research directions.

BM · Sep 18, 2024 · Code
Assessing Reusability of Deep Learning-Based Monotherapy Drug Response Prediction Models Trained with Omics Data

Jamie C. Overbeek, Alexander Partin, Thomas S. Brettin et al.

Cancer drug response prediction (DRP) models present a promising approach towards precision oncology, tailoring treatments to individual patient profiles. While deep learning (DL) methods have shown great potential in this area, models that can be successfully translated into clinical practice and shed light on the molecular mechanisms underlying treatment response will likely emerge from collaborative research efforts. This highlights the need for reusable and adaptable models that can be improved and tested by the wider scientific community. In this study, we present a scoring system for assessing the reusability of DRP models and apply it to 17 peer-reviewed DL-based DRP models. As part of the IMPROVE (Innovative Methodologies and New Data for Predictive Oncology Model Evaluation) project, which aims to develop methods for the systematic evaluation and comparison of DL models across scientific domains, we analyzed these 17 DRP models focusing on three key categories: software environment, code modularity, and data availability and preprocessing. While not the primary focus, we also attempted to reproduce key performance metrics to verify model behavior and adaptability. Our assessment reveals both strengths and shortcomings in model reusability. To promote rigorous practices and open-source sharing, we offer recommendations for developing and sharing prediction models. Following these recommendations can address many of the issues identified in this study, improving model reusability without adding significant burdens on researchers. This work offers the first comprehensive assessment of reusability and reproducibility across diverse DRP models, providing insights into current model-sharing practices and promoting standards within the DRP and broader AI-enabled scientific research community.

LG · Jun 1, 2023
An Effective Meaningful Way to Evaluate Survival Models

Shi-ang Qi, Neeraj Kumar, Mahtab Farrokh et al.

One straightforward metric for evaluating a survival prediction model is based on the Mean Absolute Error (MAE): the average, over all subjects, of the absolute difference between the time predicted by the model and the true event time. Unfortunately, this is challenging to compute because, in practice, the test set includes (right-)censored individuals, meaning we do not know when a censored individual actually experienced the event. In this paper, we explore various metrics to estimate MAE for survival datasets that include (many) censored individuals. Moreover, we introduce a novel and effective approach for generating realistic semi-synthetic survival datasets to facilitate the evaluation of metrics. Our findings, based on the analysis of the semi-synthetic datasets, reveal that our proposed metric (MAE using pseudo-observations) ranks models accurately based on their performance and often closely matches the true MAE; in particular, it is better than several alternative methods.
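
To make the censoring problem concrete, here is a simplified sketch, assuming that jackknife pseudo-observations of the Kaplan-Meier restricted mean survival time (restricted to the largest observed time) stand in for the unknown event times of censored subjects: pseudo_i = n·θ − (n−1)·θ₋ᵢ, where θ is the estimate on all n subjects and θ₋ᵢ the estimate with subject i left out. This is a generic construction for illustration; the paper's MAE-PO metric may differ in details such as weighting, and all function names here are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def km_curve(times: Sequence[float], events: Sequence[int]) -> List[Tuple[float, float]]:
    """Kaplan-Meier estimate as (time, survival just after time) at each distinct time.

    events[i] is 1 if subject i had the event, 0 if right-censored.
    """
    pairs = sorted(zip(times, events))
    at_risk = len(pairs)
    surv = 1.0
    curve = []
    i = 0
    while i < len(pairs):
        t = pairs[i][0]
        deaths = removed = 0
        # Group every subject observed at this time (events and censorings).
        while i < len(pairs) and pairs[i][0] == t:
            deaths += pairs[i][1]
            removed += 1
            i += 1
        if deaths:
            surv *= 1.0 - deaths / at_risk
        curve.append((t, surv))
        at_risk -= removed
    return curve


def km_restricted_mean(times: Sequence[float], events: Sequence[int]) -> float:
    """Area under the KM curve from 0 to the largest observed time."""
    pieces = []
    prev_t, prev_s = 0.0, 1.0
    for t, s in km_curve(times, events):
        pieces.append(prev_s * (t - prev_t))
        prev_t, prev_s = t, s
    return math.fsum(pieces)


def pseudo_observations(times: Sequence[float], events: Sequence[int]) -> List[float]:
    """Jackknife pseudo-values: n * theta - (n - 1) * theta_without_i."""
    n = len(times)
    if n < 2:
        raise ValueError("need at least two subjects")
    full = km_restricted_mean(times, events)
    values = []
    for i in range(n):
        loo_t = list(times[:i]) + list(times[i + 1:])
        loo_e = list(events[:i]) + list(events[i + 1:])
        values.append(n * full - (n - 1) * km_restricted_mean(loo_t, loo_e))
    return values


def mae_pseudo_obs(preds: Sequence[float], times: Sequence[float],
                   events: Sequence[int]) -> float:
    """MAE where censored subjects are scored against their pseudo-observation."""
    targets = pseudo_observations(times, events)
    errors = [abs(p - (t if e else q))
              for p, t, e, q in zip(preds, times, events, targets)]
    return math.fsum(errors) / len(errors)


if __name__ == "__main__":
    times = [2.0, 3.0, 5.0, 8.0]
    events = [1, 0, 1, 1]  # the second subject is censored at t = 3
    print(pseudo_observations(times, events))
    print(mae_pseudo_obs([3.0, 4.0, 5.0, 6.0], times, events))
```

A useful sanity check of the construction: with no censoring, each pseudo-observation collapses to the subject's own observed time, so the metric reduces to the ordinary MAE.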

CV · Sep 21, 2022
An Overview of Violence Detection Techniques: Current Challenges and Future Directions

Nadia Mumtaz, Naveed Ejaz, Shabana Habib et al.

The Big Video Data generated in today's smart cities raises concerns about how it can be put to purposeful use. Surveillance cameras, among many other sources, are the most prominent contributors to these huge volumes of data, making their automated analysis a difficult task in terms of computation and precision. Violence Detection (VD), which falls broadly under the action and activity recognition domain, is used to analyze Big Video Data for anomalous actions caused by humans. The VD literature has traditionally been based on manually engineered features, though deep learning based standalone models have since been developed for real-time VD analysis. This paper focuses on an overview of deep sequence learning approaches along with strategies for localizing the detected violence. This overview also covers the earlier image processing and machine learning based VD literature and its possible advantages, such as efficiency compared with current complex models. Furthermore, the datasets are discussed to provide an analysis of the current models, explaining their pros and cons, with future directions in the VD domain derived from an in-depth analysis of previous methods.

63.8 · AI · May 27
Prompt Codebooks: Discrete Compositional Optimization for Language Model Instruction Refinement

Jyotirmoy Nath, Neeraj Kumar, Brejesh Lall

Automatic prompt optimization (APO) has driven significant gains in LLM-based agentic workflows. However, existing methods treat each task's prompt as a monolithic, instance-blind string optimized through global edits, producing brittle updates and preventing the reuse of learned sub-behaviors. We propose Prompt Codebooks (PCO), a novel compositional prompt optimization framework that recasts APO as discrete learning over a finite vocabulary of natural-language instincts, i.e., atomic, reusable instruction units. PCO organizes prompt-construction knowledge in a discrete codebook and routes each input to a small subset of entries via an LLM-based encoder; a generator composes them into a prompt for the frozen target model; a critic emits a structured verdict that decomposes by attribution into per-variable textual gradients, jointly training the encoder, generator, and codebook under a language-valued min-max objective. The resulting routing is per-instance: different inputs in the same task receive different instinct compositions, a regime structurally inexpressible under instance-blind methods. Across six benchmarks on Qwen3-8B and LLaMA-3.1-8B, PCO improves over zero-shot by up to +30.36 points, surpasses the strongest prior baseline (GEPA) by +3.34 on HotpotQA and +1.11 in aggregate, and reduces deployed prompt length by up to 14.1x versus MIPROv2 and 3.0x versus GEPA using only K=16 instincts.

LG · May 21, 2022
De novo design of protein target specific scaffold-based Inhibitors via Reinforcement Learning

Andrew D. McNaughton, Mridula S. Bontha, Carter R. Knutson et al.

Efficient design and discovery of target-driven molecules is a critical step in facilitating lead optimization in drug discovery. Current approaches to developing molecules for a target protein are intuition-driven, hampered by slow iterative design-test cycles due to computational challenges in utilizing 3D structural data, and ultimately limited by the expertise of the chemist, leading to bottlenecks in molecular design. In this contribution, we propose a novel framework, called 3D-MolGNN$_{RL}$, that couples reinforcement learning (RL) to a deep generative model based on 3D-Scaffold to generate candidates specific to a target protein, building them up atom by atom from a starting core scaffold. 3D-MolGNN$_{RL}$ provides an efficient way to optimize key features with a multi-objective reward function within a protein pocket, using parallel graph neural network models. The agent learns to build molecules in 3D space while optimizing the activity, binding affinity, potency, and synthetic accessibility of the candidates generated for infectious disease protein targets. Our approach can serve as an interpretable artificial intelligence (AI) tool for lead optimization with optimized activity, potency, and biophysical properties.

CL · May 2, 2024 · Code
CACTUS: Chemistry Agent Connecting Tool-Usage to Science

Andrew D. McNaughton, Gautham Ramalaxmi, Agustin Kruel et al.

Large language models (LLMs) have shown remarkable potential in various domains, but they often lack the ability to access and reason over domain-specific knowledge and tools. In this paper, we introduce CACTUS (Chemistry Agent Connecting Tool-Usage to Science), an LLM-based agent that integrates cheminformatics tools to enable advanced reasoning and problem-solving in chemistry and molecular discovery. We evaluate the performance of CACTUS using a diverse set of open-source LLMs, including Gemma-7b, Falcon-7b, MPT-7b, Llama2-7b, and Mistral-7b, on a benchmark of thousands of chemistry questions. Our results demonstrate that CACTUS significantly outperforms baseline LLMs, with the Gemma-7b and Mistral-7b models achieving the highest accuracy regardless of the prompting strategy used. Moreover, we explore the impact of domain-specific prompting and hardware configurations on model performance, highlighting the importance of prompt engineering and the potential for deploying smaller models on consumer-grade hardware without significant loss in accuracy. By combining the cognitive capabilities of open-source LLMs with domain-specific tools, CACTUS can assist researchers in tasks such as molecular property prediction, similarity searching, and drug-likeness assessment. Furthermore, CACTUS represents a significant milestone in the field of cheminformatics, offering an adaptable tool for researchers engaged in chemistry and molecular discovery. By integrating the strengths of open-source LLMs with domain-specific tools, CACTUS has the potential to accelerate scientific advancement and unlock new frontiers in the exploration of novel, effective, and safe therapeutic candidates, catalysts, and materials. In addition, CACTUS's ability to integrate with automated experimentation platforms and make data-driven decisions in real time opens up new possibilities for autonomous discovery.

CL · Dec 21, 2022
KL Regularized Normalization Framework for Low Resource Tasks

Neeraj Kumar, Ankur Narang, Brejesh Lall

Large pre-trained models, such as BERT, GPT, and Wav2Vec, have demonstrated great potential for learning representations that are transferable to a wide variety of downstream tasks. It is difficult to obtain a large quantity of supervised data due to the limited availability of resources and time. In light of this, a significant amount of research has been conducted on adapting large pre-trained models to diverse downstream tasks via fine-tuning, linear probing, or prompt tuning in low-resource settings. Normalization techniques are essential for accelerating training and improving the generalization of deep neural networks and have been successfully used in a wide variety of applications. Many normalization techniques have been proposed, but the success of normalization in low-resource downstream NLP and speech tasks is limited. One reason is the inability of the rescaling parameters of normalization to capture expressiveness. We propose Kullback-Leibler (KL) Regularized normalization (KL-Norm), which makes the normalized data well behaved and helps generalization: it reduces over-fitting, generalizes well on out-of-domain distributions, and removes irrelevant biases and features, with a negligible increase in model parameters and memory overhead. Detailed experimental evaluation on multiple low-resource NLP and speech tasks demonstrates the superior performance of KL-Norm compared with other popular normalization and regularization techniques.

SD · Oct 27, 2023
Style Description based Text-to-Speech with Conditional Prosodic Layer Normalization based Diffusion GAN

Neeraj Kumar, Ankur Narang, Brejesh Lall

In this paper, we present a Diffusion GAN based approach (Prosodic Diff-TTS) that takes a style description and content text as input and generates the corresponding high-fidelity speech in only 4 denoising steps. It leverages a novel conditional prosodic layer normalization to incorporate style embeddings into a generator architecture built from a multi-head attention based phoneme encoder and a mel spectrogram decoder. The style embedding is generated by fine-tuning a pretrained BERT model on auxiliary tasks such as pitch, speaking speed, emotion, and gender classification. We demonstrate the efficacy of our proposed architecture on the multi-speaker LibriTTS and PromptSpeech datasets, using multiple quantitative metrics that measure generation accuracy, as well as MOS.

CV · Aug 31, 2023
Distraction-free Embeddings for Robust VQA

Atharvan Dogra, Deeksha Varshney, Ashwin Kalyan et al.

The generation of effective latent representations and their subsequent refinement to incorporate precise information is an essential prerequisite for Vision-Language Understanding (VLU) tasks such as Video Question Answering (VQA). However, most existing methods for VLU focus on sparsely sampling or fine-graining the input information (e.g., sampling a sparse set of frames or text tokens), or on adding external knowledge. We present DRAX (Distraction Removal and Attended Cross-Alignment), a novel method that rids our cross-modal representations of distractors in the latent space. Rather than restricting the perception of input information from any modality, we use an attention-guided distraction removal method to increase focus on task-relevant information in latent embeddings. DRAX also ensures semantic alignment of embeddings during cross-modal fusion. We evaluate our approach on a challenging benchmark (the SUTD-TrafficQA dataset), testing the framework's abilities on feature and event queries, temporal relation understanding, forecasting, hypothesis, and causal analysis through extensive experiments.

LG · Dec 20, 2022
Dynamic Molecular Graph-based Implementation for Biophysical Properties Prediction

Carter Knutson, Gihan Panapitiya, Rohith Varikoti et al.

Graph Neural Networks (GNNs) have revolutionized molecular discovery by helping to understand patterns and identify unknown features that aid in predicting biophysical properties and protein-ligand interactions. However, current models typically rely on 2-dimensional molecular representations as input, and although the utilization of 2D/3D structural data has gained deserved traction in recent years, many of these models are still limited to static graph representations. We propose a novel approach based on the transformer model, utilizing GNNs to characterize dynamic features of protein-ligand interactions. Our message passing transformer is pre-trained on a set of molecular dynamics data generated from physics-based simulations to learn coordinate construction, and it makes binding probability and affinity predictions as a downstream task. In extensive testing against existing models, our MDA-PLI model outperformed the molecular interaction prediction models, with an RMSE of 1.2958. The geometric encodings enabled by our transformer architecture and the addition of time series data add a new dimensionality to this form of research.

BM · Dec 15, 2022
Scaffold-Based Multi-Objective Drug Candidate Optimization

Agustin Kruel, Andrew D. McNaughton, Neeraj Kumar

In therapeutic design, balancing various physicochemical properties is crucial for molecule development, similar to how Multiparameter Optimization (MPO) evaluates multiple variables to meet a primary goal. While many molecular features can now be predicted using in silico methods, aiding early drug development, the vast amount of data generated by high-throughput virtual screening challenges the practicality of traditional MPO approaches. To address this, we introduce a scaffold-focused, graph-based Markov chain Monte Carlo framework (ScaMARS) built to generate molecules with optimal properties. This framework is capable of self-training and handling a wider array of properties, sampling different chemical spaces according to the starting scaffold. Benchmark analysis on several properties shows that ScaMARS achieves a diversity score of 84.6% and a much higher success rate (99.5%) than conditional models. The integration of new features into MPO significantly enhances its adaptability and effectiveness in therapeutic design, facilitating the discovery of candidates that efficiently optimize multiple properties.

20.1 · CV · Mar 21
GOLDMARK: Governed Outcome-Linked Diagnostic Model Assessment Reference Kit

Chad Vanderbilt, Gabriele Campanella, Siddharth Singi et al.

Computational biomarkers (CBs) are histopathology-derived patterns extracted from hematoxylin-eosin (H&E) whole-slide images (WSIs) using artificial intelligence (AI) to predict therapeutic response or prognosis. Recently, slide-level multiple-instance learning (MIL) with pathology foundation models (PFMs) has become the standard baseline for CB development. While these methods have improved predictive performance, computational pathology lacks standardized intermediate data formats, provenance tracking, checkpointing conventions, and reproducible evaluation metrics required for clinical-grade deployment. We introduce GOLDMARK (https://artificialintelligencepathology.org), a standardized benchmarking framework built on a curated TCGA cohort with clinically actionable OncoKB level 1-3 biomarker labels. GOLDMARK releases structured intermediate representations, including tile coordinate maps, per-slide feature embeddings from canonical PFMs, quality-control metadata, predefined patient-level splits, trained slide-level models, and evaluation outputs. Models are trained on TCGA and evaluated on an independent MSKCC cohort with reciprocal testing. Across 33 tumor-biomarker tasks, mean AUROC was 0.689 (TCGA) and 0.630 (MSKCC). Restricting to the eight highest-performing tasks yielded mean AUROCs of 0.831 and 0.801, respectively. These tasks correspond to established morphologic-genomic associations (e.g., LGG IDH1, COAD MSI/BRAF, THCA BRAF/NRAS, BLCA FGFR3, UCEC PTEN) and showed the most stable cross-site performance. Differences between canonical encoders were modest relative to task-specific variability. GOLDMARK establishes a shared experimental substrate for computational pathology, enabling reproducible benchmarking and direct comparison of methods across datasets and models.
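
The headline numbers above are mean AUROCs across tumor-biomarker tasks. As a hedged sketch of how such a summary can be computed (the task names and scores below are made up, and this is not the GOLDMARK evaluation code), AUROC can be obtained from its Mann-Whitney interpretation: the probability that a randomly chosen positive slide scores higher than a randomly chosen negative one, with ties counted as one half.

```python
from statistics import mean
from typing import Dict, Sequence, Tuple


def auroc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Mann-Whitney estimate of AUROC; O(P*N) pairs, fine for a sketch."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a correct ranking
    return wins / (len(pos) * len(neg))


def mean_auroc(tasks: Dict[str, Tuple[Sequence[int], Sequence[float]]]) -> float:
    """Unweighted mean of per-task AUROCs."""
    return mean(auroc(y, s) for y, s in tasks.values())


if __name__ == "__main__":
    # Hypothetical slide-level predictions for two made-up tasks.
    tasks = {
        "task_a": ([1, 1, 0, 0], [0.9, 0.8, 0.3, 0.1]),  # perfectly separated
        "task_b": ([1, 0, 1, 0], [0.6, 0.6, 0.2, 0.4]),
    }
    print(mean_auroc(tasks))  # 0.6875
```

An unweighted mean treats every task equally regardless of cohort size, which is one reason a few hard tasks can pull the overall figure well below the best-performing subset.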

BM · May 7, 2021 · Code
Evening the Score: Targeting SARS-CoV-2 Protease Inhibition in Graph Generative Models for Therapeutic Candidates

Jenna Bilbrey, Logan Ward, Sutanay Choudhury et al.

We examine a pair of graph generative models for the therapeutic design of novel drug candidates targeting SARS-CoV-2 viral proteins. Due to a sense of urgency, we chose well-validated models with unique strengths: an autoencoder that generates molecules with structures similar to a dataset of drugs with anti-SARS activity, and a reinforcement learning algorithm that generates highly novel molecules. During generation, we explore optimization toward several design targets to balance druglikeness, synthetic accessibility, and anti-SARS activity based on IC50. This generative framework (https://github.com/exalearn/covid-drug-design) will accelerate drug discovery in future pandemics through the high-throughput generation of targeted therapeutic candidates.

LG · Nov 15, 2024
Electrical Load Forecasting in Smart Grid: A Personalized Federated Learning Approach

Ratun Rahman, Neeraj Kumar, Dinh C. Nguyen

Electric load forecasting is essential for power management and stability in smart grids. This is mainly achieved via advanced metering infrastructure, where smart meters (SMs) are used to record household energy consumption. Traditional machine learning (ML) methods are often employed for load forecasting but require data sharing, which raises data privacy concerns. Federated learning (FL) can address this issue by running distributed ML models at local SMs without data exchange. However, current FL-based approaches struggle to achieve efficient load forecasting due to imbalanced data distribution across heterogeneous SMs. This paper presents a novel personalized federated learning (PFL) method for load prediction under non-independent and identically distributed (non-IID) metering data settings. Specifically, we introduce meta-learning, using it to adjust the learning rates so as to maximize the gradient for each client in each global round. Clients with varying processing capacities, data sizes, and batch sizes can participate in global model aggregation and improve their local load forecasting via personalized learning. Simulation results show that our approach outperforms state-of-the-art ML and FL methods in load forecasting accuracy.

LG · Dec 10, 2024
Comparative Analysis of Deep Learning Approaches for Harmful Brain Activity Detection Using EEG

Shivraj Singh Bhatti, Aryan Yadav, Mitali Monga et al.

The classification of harmful brain activities, such as seizures and periodic discharges, plays a vital role in neurocritical care, enabling timely diagnosis and intervention. Electroencephalography (EEG) provides a non-invasive method for monitoring brain activity, but the manual interpretation of EEG signals is time-consuming and relies heavily on expert judgment. This study presents a comparative analysis of deep learning architectures, including Convolutional Neural Networks (CNNs), Vision Transformers (ViTs), and EEGNet, applied to the classification of harmful brain activities using both raw EEG data and time-frequency representations generated through the Continuous Wavelet Transform (CWT). We evaluate the performance of these models using multimodal data representations, including high-resolution spectrograms and waveform data, and introduce a multi-stage training strategy to improve model robustness. Our results show that training strategies, data preprocessing, and augmentation techniques are as critical to model success as the choice of architecture, with multi-stage TinyViT and EfficientNet demonstrating superior performance. The findings underscore the importance of robust training regimes in achieving accurate and efficient EEG classification, providing valuable insights for deploying AI models in clinical practice.

CV · Jun 5, 2025
Single GPU Task Adaptation of Pathology Foundation Models for Whole Slide Image Analysis

Neeraj Kumar, Swaraj Nanda, Siddharth Singi et al.

Pathology foundation models (PFMs) have emerged as powerful tools for analyzing whole slide images (WSIs). However, adapting these pretrained PFMs to specific clinical tasks presents considerable challenges, primarily because only weak (WSI-level) labels are available for gigapixel images, necessitating the multiple instance learning (MIL) paradigm for effective WSI analysis. This paper proposes a novel approach for single-GPU Task Adaptation of PFMs (TAPFM) that uses vision transformer (ViT) attention for MIL aggregation while optimizing both feature representations and attention weights. The proposed approach maintains separate computational graphs for the MIL aggregator and the PFM to create stable training dynamics that align with downstream task objectives during end-to-end adaptation. Evaluated on mutation prediction tasks for bladder cancer and lung adenocarcinoma across institutional and TCGA cohorts, TAPFM consistently outperforms conventional approaches, with H-Optimus-0 (TAPFM) outperforming the benchmarks. TAPFM also effectively handles multi-label classification of actionable mutations. Thus, TAPFM makes adaptation of powerful pre-trained PFMs practical on standard hardware for various clinical applications.
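
TAPFM is described as using ViT attention to aggregate tile-level features under MIL. The snippet below is only a generic illustration of attention-based MIL pooling, not TAPFM's aggregator: each tile embedding gets a score, a softmax turns the scores into weights, and the slide embedding is the weighted sum of tiles. The scores here come from a fixed, hypothetical scoring vector `w` rather than a trained ViT, and no gradients or separate computational graphs are modeled.

```python
import math
from typing import List, Sequence, Tuple


def softmax(xs: Sequence[float]) -> List[float]:
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attention_mil_pool(tiles: Sequence[Sequence[float]],
                       w: Sequence[float]) -> Tuple[List[float], List[float]]:
    """Pool tile embeddings into one slide embedding.

    tiles: one feature vector per tile (e.g. produced by a pathology foundation model).
    w: hypothetical scoring vector; the score of tile k is the dot product <w, tile_k>.
    Returns (slide_embedding, attention_weights).
    """
    scores = [sum(wi * xi for wi, xi in zip(w, tile)) for tile in tiles]
    attn = softmax(scores)
    dim = len(tiles[0])
    slide = [sum(a * tile[j] for a, tile in zip(attn, tiles)) for j in range(dim)]
    return slide, attn


if __name__ == "__main__":
    tiles = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    slide, attn = attention_mil_pool(tiles, w=[2.0, 0.0])
    print(attn)   # tiles with a large first feature receive more weight
    print(slide)
```

Because only the slide-level label is known, the attention weights are what let a model learn which tiles matter; in an end-to-end setup both the scorer and the tile encoder would be trained.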

IV · May 29, 2023
The ACROBAT 2022 Challenge: Automatic Registration Of Breast Cancer Tissue

Philippe Weitz, Masi Valkonen, Leslie Solorzano et al.

The alignment of tissue between histopathological whole-slide images (WSIs) is crucial for research and clinical applications. Advances in computing, deep learning, and the availability of large WSI datasets have revolutionised WSI analysis; as a result, the current state of the art in WSI registration is unclear. To address this, we conducted the ACROBAT challenge, based on the largest WSI registration dataset to date, including 4,212 WSIs from 1,152 breast cancer patients. The challenge objective was to align WSIs of tissue stained with routine diagnostic immunohistochemistry to their H&E-stained counterparts. We compare the performance of eight WSI registration algorithms, including an investigation of the impact of different WSI properties and clinical covariates. We find that conceptually distinct WSI registration methods can achieve highly accurate registration, and we identify covariates that impact performance across methods. These results establish the current state of the art in WSI registration and guide researchers in selecting and developing methods.

CR · Feb 7, 2022
A Reliable Data-transmission Mechanism using Blockchain in Edge Computing Scenarios

Peiying Zhang, Xue Pang, Neeraj Kumar et al.

With the advent of the Internet of Things (IoT) era, more and more devices are connected to the IoT. Under the traditional centralized cloud-thing management mode, the transmission of massive data faces many difficulties, and the reliability of data is difficult to guarantee. As emerging technologies, blockchain and edge computing (EC) have attracted academic attention for improving the reliability, privacy, and immutability of IoT technology. In this paper, we combine the characteristics of EC and blockchain to ensure the reliability of data transmission in the IoT. First, we propose a data transmission mechanism based on blockchain, which uses the distributed architecture of blockchain to ensure that the data is not tampered with; second, we introduce each layer of the architecture's three-tier structure in turn; finally, we introduce the four working steps of the mechanism, which are similar to the working mechanism of blockchain. The simulation results show that the proposed scheme can, to a large extent, ensure the reliability of data transmission in the Internet of Things.

NI · Feb 3, 2022
IoV Scenario: Implementation of a Bandwidth Aware Algorithm in Wireless Network Communication Mode

Peiying Zhang, Chao Wang, Gagangeet Singh Aujla et al.

The wireless network communication mode represented by the Internet of Vehicles (IoV) has been widely used. However, due to the limitations of traditional network architecture, resource scheduling in wireless network environments still faces great challenges. This paper focuses on the allocation of bandwidth resources in a virtual network environment and proposes a bandwidth-aware multi-domain virtual network embedding algorithm (BA-VNE). The algorithm mainly targets the problem that users need a lot of bandwidth in wireless communication mode, and it solves the bandwidth resource allocation problem from the perspective of virtual network embedding (VNE). To improve the algorithm's performance, we incorporate the particle swarm optimization (PSO) algorithm. To verify the effectiveness of the algorithm, we carried out simulation experiments evaluating link bandwidth, mapping cost, and virtual network request (VNR) acceptance rate. The final results show that the proposed algorithm outperforms other representative algorithms on these indicators.
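
BA-VNE uses particle swarm optimization (PSO) to improve embedding quality, but the abstract does not give its update rules. The following is therefore only a textbook global-best continuous PSO on a toy objective, with assumed inertia and acceleration constants; a VNE variant would need a discrete encoding of node mappings and a bandwidth-aware embedding cost in place of the toy `cost` function.

```python
import random
from typing import Callable, List, Sequence, Tuple


def pso(cost: Callable[[Sequence[float]], float], dim: int, n_particles: int = 30,
        iters: int = 200, bounds: Tuple[float, float] = (-5.0, 5.0),
        w: float = 0.7, c1: float = 1.5, c2: float = 1.5,
        seed: int = 0) -> Tuple[List[float], float]:
    """Minimize `cost` with global-best particle swarm optimization."""
    rng = random.Random(seed)
    lo, hi = bounds
    pos = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_cost = [cost(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_cost[i])
    gbest, gbest_cost = pbest[g][:], pbest_cost[g]

    for _ in range(iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Inertia + pull toward personal best + pull toward global best.
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                # Move the particle and clamp it to the search box.
                pos[i][d] = min(hi, max(lo, pos[i][d] + vel[i][d]))
            c = cost(pos[i])
            if c < pbest_cost[i]:
                pbest[i], pbest_cost[i] = pos[i][:], c
                if c < gbest_cost:
                    gbest, gbest_cost = pos[i][:], c
    return gbest, gbest_cost


def sphere(x: Sequence[float]) -> float:
    return sum(v * v for v in x)


if __name__ == "__main__":
    best, best_cost = pso(sphere, dim=3)
    print(best, best_cost)
```

The fixed seed makes runs reproducible; in an embedding setting, each particle would instead encode a candidate mapping of virtual nodes to substrate nodes.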

NI · Feb 3, 2022
Space-Air-Ground Integrated Multi-domain Network Resource Orchestration based on Virtual Network Architecture: a DRL Method

Peiying Zhang, Chao Wang, Neeraj Kumar et al.

Traditional ground wireless communication networks cannot provide high-quality services for artificial intelligence (AI) applications such as intelligent transportation systems (ITS) due to deployment, coverage, and capacity issues. The space-air-ground integrated network (SAGIN) has become a research focus in the industry. Compared with traditional wireless communication networks, SAGIN is more flexible and reliable, and it offers wider coverage and higher-quality seamless connectivity. However, due to its inherent heterogeneity and its time-varying and self-organizing characteristics, the deployment and use of SAGIN still face huge challenges, among which the orchestration of heterogeneous resources is a key issue. Based on virtual network architecture and deep reinforcement learning (DRL), we model SAGIN's heterogeneous resource orchestration as a multi-domain virtual network embedding (VNE) problem and propose a SAGIN cross-domain VNE algorithm. We model the different network segments of SAGIN and set the network attributes according to the actual situation of SAGIN and user needs. In DRL, the agent is implemented as a five-layer policy network. We build a feature matrix based on network attributes extracted from SAGIN and use it as the agent's training environment. Through training, the probability of each underlying node being embedded can be derived. In the test phase, we complete the embedding of virtual nodes and links in turn based on this probability. Finally, we verify the effectiveness of the algorithm in both training and testing.

NIFeb 3, 2022
Dynamic Virtual Network Embedding Algorithm based on Graph Convolution Neural Network and Reinforcement Learning

Peiying Zhang, Chao Wang, Neeraj Kumar et al.

Network virtualization (NV) is a technology with broad application prospects. Virtual network embedding (VNE) is a core problem in NV; it aims to allocate underlying physical resources to user function requests more flexibly. The classical VNE problem is usually solved with heuristic methods, but these often limit the algorithm's flexibility and ignore time limits. In addition, the autonomous partitioning of physical domains and the dynamic nature of virtual network requests (VNRs) make VNE harder. This paper proposes a new VNE algorithm that applies reinforcement learning (RL) and graph neural network (GNN) theory, specifically combining a graph convolutional neural network (GCNN) with an RL algorithm. Based on a self-defined fitness matrix and fitness value, we formulate the algorithm's objective function, realize an efficient dynamic VNE algorithm, and effectively reduce resource fragmentation. Finally, we evaluate the proposed method against comparison algorithms. Simulation experiments verify that the dynamic VNE algorithm based on RL and GCNN exhibits good basic VNE characteristics. Varying the resource attributes of the physical and virtual networks further shows that the algorithm is flexible.

CRFeb 3, 2022
Resource Management and Security Scheme of ICPSs and IoT Based on VNE Algorithm

Peiying Zhang, Chao Wang, Chunxiao Jiang et al.

The development of Intelligent Cyber-Physical Systems (ICPSs) in virtual network environments faces severe challenges. On the one hand, building Internet of Things (IoT) systems on ICPSs requires substantial, reasonably allocated network resources. On the other hand, ICPSs face severe network security problems. Integrating ICPSs with network virtualization (NV) can provide more efficient network resource support and security guarantees for IoT users. To address these two problems, we propose a virtual network embedding (VNE) algorithm with computing, storage, and security constraints to ensure that resource allocation in ICPSs is both reasonable and secure. In particular, we use reinforcement learning (RL) to improve algorithm performance. We extract important attributes of the underlying network as the training environment for the RL agent. Through training, the agent derives the optimal node embedding strategy, meeting ICPSs' requirements for resource management and security. Virtual links are embedded using a breadth-first search (BFS) strategy. The result is a two-stage RL-VNE algorithm that considers three resource dimensions: computing, storage, and security. Finally, we design extensive simulation experiments based on typical VNE performance indicators. The experimental results demonstrate the algorithm's effectiveness for ICPS applications.
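The abstract states only that virtual links are embedded with a breadth-first search (BFS) strategy. A minimal sketch of one plausible reading follows: BFS finds a shortest-hop substrate path between the two hosts of a virtual link, skipping substrate links whose residual bandwidth is below the link's demand. The bandwidth filter, the `adj`/`bandwidth` data layout, and the function name `bfs_link_path` are assumptions for illustration.

```python
from collections import deque

def bfs_link_path(adj, bandwidth, src, dst, demand):
    """Shortest-hop substrate path from src to dst using only links whose residual
    bandwidth is at least `demand`. Returns a list of nodes, or None if unreachable.

    adj: dict node -> list of neighbour nodes (undirected substrate graph).
    bandwidth: dict frozenset({u, w}) -> residual bandwidth of link u-w.
    """
    parent = {src: None}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        if u == dst:
            # Walk parent pointers back to src to recover the path.
            path = []
            while u is not None:
                path.append(u)
                u = parent[u]
            return path[::-1]
        for w in adj[u]:
            if w not in parent and bandwidth[frozenset((u, w))] >= demand:
                parent[w] = u
                queue.append(w)
    return None

adj = {"A": ["B", "C"], "B": ["A", "D"], "C": ["A", "D"], "D": ["B", "C"]}
bw = {frozenset("AB"): 10, frozenset("BD"): 10, frozenset("AC"): 50, frozenset("CD"): 50}
```

With `demand=5` the search returns the first shortest route `["A", "B", "D"]`; with `demand=20` the low-bandwidth links are pruned and it returns `["A", "C", "D"]`; with `demand=100` no feasible path exists and it returns `None`.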

MLNov 30, 2021
Decoding the Protein-ligand Interactions Using Parallel Graph Neural Networks

Carter Knutson, Mridula Bontha, Jenna A. Bilbrey et al.

Protein-ligand interactions (PLIs) are fundamental to biochemical research, and identifying them is crucial for estimating biophysical and biochemical properties for rational therapeutic design. Experimental characterization of these properties is currently the most accurate method; however, it is very time-consuming and labor-intensive. A number of computational methods have been developed in this context, but most existing PLI prediction methods depend heavily on 2D protein sequence data. Here, we present a novel parallel graph neural network (GNN) that integrates knowledge representation and reasoning for PLI prediction, performing deep learning guided by expert knowledge and informed by 3D structural data. We develop two distinct GNN architectures: GNNF, the base implementation, which employs distinct featurization to enhance domain awareness; and GNNP, a novel implementation that can predict with no prior knowledge of the intermolecular interactions. A comprehensive evaluation demonstrated that the GNNs can successfully capture the binary interactions between ligands and protein 3D structures, with test accuracies of 0.979 for GNNF and 0.958 for GNNP in predicting the activity of a protein-ligand complex. These models are further adapted to regression tasks to predict experimental binding affinities and pIC50, which is crucial for assessing drug potency and efficacy. We achieve Pearson correlation coefficients of 0.66 and 0.65 on experimental affinity and 0.50 and 0.51 on pIC50 with GNNF and GNNP, respectively, outperforming similar 2D sequence-based models. Our method can serve as an interpretable and explainable artificial intelligence (AI) tool for predicting the activity, potency, and biophysical properties of lead candidates. To this end, we show the utility of GNNP on SARS-CoV-2 protein targets by screening a large compound library and comparing our predictions with experimentally measured data.

LGNov 27, 2021
Learning from learning machines: a new generation of AI technology to meet the needs of science

Luca Pion-Tonachini, Kristofer Bouchard, Hector Garcia Martin et al.

We outline emerging opportunities and challenges in enhancing the utility of AI for scientific discovery. The distinct goals of AI for industry and AI for science create a tension between identifying patterns in data and discovering patterns in the world from data. If we address the fundamental challenges of "bridging the gap" between domain-driven scientific models and data-driven AI learning machines, we expect these AI models to transform hypothesis generation, scientific discovery, and the scientific process itself.

CVAug 20, 2021
Few Shot Activity Recognition Using Variational Inference

Neeraj Kumar, Siddhansh Narang

There has been remarkable progress in the last few years in learning models that can recognize novel classes from only a few labeled examples. Few-shot learning (FSL) for action recognition is the challenging task of recognizing novel action categories that are represented by only a few instances in the training data. We propose a novel variational-inference-based architectural framework (HF-AR) for few-shot activity recognition. Our framework leverages volume-preserving Householder Flow to learn a flexible posterior distribution of the novel classes. This results in better performance than state-of-the-art few-shot approaches for human activity recognition. Our architecture consists of a base model and an adapter model. The base model is trained on seen classes and computes an embedding that represents the spatial and temporal insights extracted from the input video, e.g., using a combination of ResNet-152 and an LSTM-based encoder-decoder model. The adapter model applies a series of Householder transformations to compute a flexible posterior distribution that yields higher accuracy in the few-shot setting. Extensive experiments on three well-known datasets, UCF101, HMDB51, and Something-Something-V2, demonstrate similar or better performance on 1-shot and 5-shot classification compared with state-of-the-art few-shot approaches that use only RGB frame sequences as input. To the best of our knowledge, we are the first to explore variational inference together with Householder transformations to capture the full-rank covariance matrix of the posterior distribution for few-shot learning in activity recognition.
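The property that makes Householder flow attractive here is that each reflection H = I - 2 v v^T / ||v||^2 is orthogonal, so a chain of them preserves volume (|det| = 1, no Jacobian correction) while turning a diagonal Gaussian into one with a full, non-diagonal covariance. The numpy sketch below shows only that mechanism. The reflection vector is fixed by hand; in a trained adapter it would be learned (an assumption about parameterization the abstract does not spell out), and the function names are hypothetical.

```python
import numpy as np

def householder(v):
    """Reflection H = I - 2 v v^T / ||v||^2 (orthogonal, so |det H| = 1)."""
    v = np.asarray(v, dtype=float).reshape(-1, 1)
    return np.eye(v.shape[0]) - 2.0 * (v @ v.T) / (v.T @ v).item()

def householder_flow(mu, sigma, vs, rng):
    """Draw z0 ~ N(mu, diag(sigma^2)), then apply z_t = H_t z_{t-1} for each v in vs.
    Returns the sample z_T and the combined orthogonal map U = H_T ... H_1."""
    U = np.eye(len(mu))
    for v in vs:
        U = householder(v) @ U
    z0 = mu + sigma * rng.standard_normal(len(mu))
    return U @ z0, U

mu = np.zeros(3)
sigma = np.array([1.0, 2.0, 3.0])
z, U = householder_flow(mu, sigma, [np.array([1.0, 2.0, 0.0])], np.random.default_rng(0))
# Covariance of z_T is U diag(sigma^2) U^T; it is no longer diagonal after the flow.
cov = U @ np.diag(sigma ** 2) @ U.T
```

With this single reflection, `cov[0, 1]` equals 1.44, so the posterior now has correlated dimensions even though the base distribution was diagonal; stacking more reflections lets the covariance's eigenvectors take more general orientations.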

LGJun 4, 2021
Spatial Graph Attention and Curiosity-driven Policy for Antiviral Drug Discovery

Yulun Wu, Mikaela Cashman, Nicholas Choma et al.

We developed Distilled Graph Attention Policy Network (DGAPN), a reinforcement learning model that generates novel graph-structured chemical representations optimizing user-defined objectives by efficiently navigating a physically constrained domain. The framework is examined on the task of generating molecules designed to bind noncovalently to functional sites of SARS-CoV-2 proteins. We present a spatial Graph Attention (sGAT) mechanism that applies self-attention over both node and edge attributes while also encoding spatial structure, a capability of considerable interest in synthetic biology and drug discovery. An attentional policy network is introduced to learn the decision rules for a dynamic, fragment-based chemical environment, and state-of-the-art policy gradient techniques are employed to train the network stably. Exploration is driven by the stochasticity of the action space design and by innovation reward bonuses learned and proposed through random network distillation. In experiments, our framework achieved outstanding results compared with state-of-the-art algorithms while reducing the complexity of paths to chemical synthesis.
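Random network distillation scores novelty as the error of a trained predictor imitating a fixed, randomly initialized target network: states visited often are predicted well and earn a small bonus, unfamiliar ones earn a large one. The sketch below uses linear maps in place of the graph networks DGAPN would use; the class name `RNDBonus`, the dimensions, and the learning rate are assumptions.

```python
import numpy as np

class RNDBonus:
    """Random network distillation with linear 'networks' (illustrative only)."""

    def __init__(self, in_dim, out_dim, lr=0.1, seed=0):
        rng = np.random.default_rng(seed)
        self.target = rng.standard_normal((out_dim, in_dim))  # fixed random target, never trained
        self.pred = np.zeros((out_dim, in_dim))               # predictor trained to match target
        self.lr = lr

    def bonus(self, s):
        """Exploration bonus = squared prediction error on state features s."""
        err = self.pred @ s - self.target @ s
        return float(err @ err)

    def update(self, s):
        """One SGD step on 0.5 * ||pred s - target s||^2 (gradient is outer(err, s))."""
        err = self.pred @ s - self.target @ s
        self.pred -= self.lr * np.outer(err, s)

rnd = RNDBonus(in_dim=4, out_dim=8)
seen, unseen = np.eye(4)[0], np.eye(4)[1]
before_seen, before_unseen = rnd.bonus(seen), rnd.bonus(unseen)
for _ in range(50):
    rnd.update(seen)  # the agent keeps revisiting the same state
```

After the loop, the bonus on the revisited state has shrunk to a tiny fraction of its initial value, while the never-visited state keeps its original bonus, which is what steers exploration toward novel regions.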

CRMay 25, 2021
Security in Next Generation Mobile Payment Systems: A Comprehensive Survey

Waqas Ahmed, Amir Rasool, Neeraj Kumar et al.

Cash payment is still king in several markets, accounting for more than 90% of payments in almost all developing countries. Mobile phone use is commonplace today. Mobile phones have become inseparable companions for many users, serving as much more than communication tools. People rely heavily on them because of their multifaceted usage and affordability, and many want to manage their daily transactions and related matters from their phones. As mobile-specific security advances, threats are evolving as well. In this paper, we provide a survey of various security models for mobile phones. We explore multiple proposed models of the mobile payment system (MPS), their technologies and comparisons, payment methods, and the different security mechanisms involved in MPS, and we analyze the encryption technologies, authentication methods, and firewalls used in MPS. We also present current challenges and future directions of mobile phone security.

IVApr 26, 2021
FedDPGAN: Federated Differentially Private Generative Adversarial Networks Framework for the Detection of COVID-19 Pneumonia

Longling Zhang, Bochen Shen, Ahmed Barnawi et al.

Existing deep learning technologies generally learn the features of chest X-ray data generated by Generative Adversarial Networks (GANs) to diagnose COVID-19 pneumonia. However, these methods face a critical challenge: data privacy. A GAN can leak semantic information about its training data, which attackers can use to reconstruct training samples, thereby compromising patient privacy. Furthermore, because training samples are limited, different hospitals train models jointly through data sharing, which also causes privacy leakage. To solve this problem, we adopt the Federated Learning (FL) framework, a new technique for protecting data privacy. Combining the FL framework with differential privacy, we propose a Federated Differentially Private Generative Adversarial Network (FedDPGAN) to detect COVID-19 pneumonia for sustainable smart cities. Specifically, we use DP-GAN to privately generate diverse patient data, introducing differential privacy to protect the semantic information of the training dataset. Furthermore, we leverage FL to allow hospitals to collaboratively train COVID-19 models without sharing the original data. We evaluate the proposed model under Independent and Identically Distributed (IID) and non-IID settings on a chest X-ray (CXR) image dataset with three classes (COVID-19, normal, and normal pneumonia). Extensive results verify that our model can effectively diagnose COVID-19 without compromising privacy.
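The abstract does not specify how FedDPGAN aggregates hospital updates. For orientation, the sketch below shows FedAvg-style aggregation, a common FL baseline: each site trains locally and sends only parameter arrays, which the server averages weighted by local sample counts. Treat the choice of FedAvg and the function name `fedavg` as assumptions; the differential-privacy part (noise added during DP-GAN training) is not shown.

```python
import numpy as np

def fedavg(client_params, client_sizes):
    """Sample-size-weighted average of per-client parameter lists.

    client_params: one list of np.ndarray per client (same shapes across clients).
    client_sizes: number of local training samples at each client.
    Only these arrays reach the server; raw chest X-rays stay at each hospital.
    """
    total = float(sum(client_sizes))
    n_tensors = len(client_params[0])
    return [
        sum(params[i] * (n / total) for params, n in zip(client_params, client_sizes))
        for i in range(n_tensors)
    ]
```

For two hospitals holding 1 and 3 samples with single-tensor models `[1.0, 1.0]` and `[3.0, 5.0]`, the global tensor is `[2.5, 4.0]`: the larger site contributes three quarters of the weight.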

CVFeb 19, 2021
One Shot Audio to Animated Video Generation

Neeraj Kumar, Srishti Goel, Ankur Narang et al.

We consider the challenging problem of audio-to-animated-video generation. We propose a novel method, OneShotAu2AV, that generates an animated video of arbitrary length from an audio clip and a single unseen image of a person. The proposed method consists of two stages. In the first stage, OneShotAu2AV generates the talking-head video in the human domain given an audio clip and a person's image. In the second stage, the talking-head video is converted from the human domain to the animated domain. The first-stage architecture consists of a spatially adaptive normalization-based multi-level generator and multiple multi-level discriminators, along with multiple adversarial and non-adversarial losses. The second stage leverages an attention-based, normalization-driven GAN architecture together with a temporal-predictor-based recycle loss, a blink loss, and a lip-sync loss for unsupervised generation of animated video. In our approach, the input audio clip is not restricted to any specific language, which gives the method multilingual applicability. OneShotAu2AV can generate animated videos that have (a) lip movements in sync with the audio, (b) natural facial expressions such as blinks and eyebrow movements, and (c) head movements. Experimental evaluation demonstrates superior performance of OneShotAu2AV compared with U-GAT-IT and RecycleGan on multiple quantitative metrics, including KID (Kernel Inception Distance), word error rate, and blinks/sec.

LGFeb 10, 2021
Artificial Intelligence based Autonomous Molecular Design for Medical Therapeutic: A Perspective

Rajendra P. Joshi, Neeraj Kumar

Domain-aware machine learning (ML) models have been increasingly adopted in recent years to accelerate small-molecule therapeutic design. These models have been enabled by significant advances in state-of-the-art artificial intelligence (AI) and computing infrastructure. Several ML architectures are predominantly used, independently of one another, either to predict the properties of small molecules or to generate lead therapeutic candidates. Using these individual components synergistically and autonomously in closed loops, together with robust representation and data generation techniques, holds enormous promise for accelerating drug design, an otherwise time-consuming and expensive task. In this perspective, we present the most recent breakthroughs achieved by each of the components and describe how such an autonomous AI and ML workflow can be realized to radically accelerate hit identification and lead optimization. Taken together, this could significantly shorten end-to-end antiviral discovery and optimization to weeks after the arrival of a novel zoonotic transmission event. Our perspective serves as a guide for researchers practicing autonomous molecular design in therapeutic discovery.

LGFeb 9, 2021
Benchmarking Deep Graph Generative Models for Optimizing New Drug Molecules for COVID-19

Logan Ward, Jenna A. Bilbrey, Sutanay Choudhury et al.

Design of new drug compounds with target properties is a key area of research in generative modeling. We present a small drug molecule design pipeline based on graph-generative models and a comparison study of two state-of-the-art graph generative models for designing COVID-19 targeted drug candidates: 1) a variational autoencoder-based approach (VAE) that uses prior knowledge of molecules that have been shown to be effective for earlier coronavirus treatments and 2) a deep Q-learning method (DQN) that generates optimized molecules without any proximity constraints. We evaluate the novelty of the automated molecule generation approaches by validating the candidate molecules with drug-protein binding affinity models. The VAE method produced two novel molecules with similar structures to the antiretroviral protease inhibitor Indinavir that show potential binding affinity for the SARS-CoV-2 protein target 3-chymotrypsin-like protease (3CL-protease).

CVDec 14, 2020
Robust One Shot Audio to Video Generation

Neeraj Kumar, Srishti Goel, Ankur Narang et al.

Audio-to-video generation is an interesting problem with numerous applications across industry verticals, including filmmaking, multimedia, marketing, education, and others. High-quality video generation with expressive facial movements is a challenging problem that involves complex learning steps for generative adversarial networks. Further, enabling one-shot learning for an unseen single image increases the complexity of the problem while making it more applicable to practical scenarios. In this paper, we propose a novel approach, OneShotA2V, that synthesizes a talking-person video of arbitrary length given as input an audio signal and a single unseen image of a person. OneShotA2V leverages curriculum learning to learn movements of expressive facial components and hence generates a high-quality talking-head video of the given person. Further, it feeds the features generated from the audio input directly into a generative adversarial network, and it adapts to any given unseen selfie by applying few-shot learning with only a few output update epochs. OneShotA2V uses an architecture based on a spatially adaptive normalization-based multi-level generator and multiple multi-level discriminators. The input audio clip is not restricted to any specific language, which gives the method multilingual applicability. Experimental evaluation demonstrates superior performance of OneShotA2V compared with Realistic Speech-Driven Facial Animation with GANs (RSDGAN) [43], Speech2Vid [8], and other approaches on multiple quantitative metrics, including SSIM (structural similarity index), PSNR (peak signal-to-noise ratio), and CPBD (image sharpness). Further, qualitative evaluation and online Turing tests demonstrate the efficacy of our approach.

CVDec 14, 2020
Multi Modal Adaptive Normalization for Audio to Video Generation

Neeraj Kumar, Srishti Goel, Ankur Narang et al.

Speech-driven facial video generation has been a complex problem because of its multi-modal nature, spanning the audio and video domains. Audio comprises many underlying features, such as expression, pitch, loudness, and prosody (speaking style), while facial video varies widely in head movement, eye blinks, lip synchronization, and the movements of various facial action units, along with temporal smoothness. Synthesizing highly expressive facial videos from audio input and a static image is still a challenging task for generative adversarial networks. In this paper, we propose a multi-modal adaptive normalization (MAN) based architecture to synthesize a talking-person video of arbitrary length given as input an audio signal and a single image of a person. The architecture uses multi-modal adaptive normalization, a keypoint heatmap predictor, an optical flow predictor, and class activation map [58] based layers to learn movements of expressive facial components and hence generates a highly expressive talking-head video of the given person. The multi-modal adaptive normalization uses various audio and video features, such as the Mel spectrogram, pitch, and energy from the audio signal, the predicted keypoint heatmap/optical flow, and a single image, to learn the respective affine parameters for generating highly expressive video. Experimental evaluation demonstrates superior performance of the proposed method compared with Realistic Speech-Driven Facial Animation with GANs (RSDGAN) [53], Speech2Vid [10], and other approaches on multiple quantitative metrics, including SSIM (structural similarity index), PSNR (peak signal-to-noise ratio), CPBD (image sharpness), WER (word error rate), blinks/sec, and LMD (landmark distance). Further, qualitative evaluation and online Turing tests demonstrate the efficacy of our approach.
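The abstract describes normalization layers whose affine parameters are learned from audio and video features. A generic conditional normalization layer captures the idea: normalize each channel, then scale and shift it with a gain and bias predicted from a conditioning vector. The sketch below is that generic pattern, not the MAN architecture itself; the linear predictors (`W_g`, `b_g`, `W_b`, `b_b`), the shapes, and the choice of conditioning vector are assumptions.

```python
import numpy as np

def adaptive_norm(x, cond, W_gamma, b_gamma, W_beta, b_beta, eps=1e-5):
    """Conditional normalization of a (C, T) feature map.

    Each channel is normalized to zero mean / unit variance over T, then
    rescaled by gamma and shifted by beta, both predicted from `cond`
    (e.g. a pooled vector of Mel-spectrogram, pitch, and energy features).
    """
    mean = x.mean(axis=1, keepdims=True)
    var = x.var(axis=1, keepdims=True)
    x_hat = (x - mean) / np.sqrt(var + eps)
    gamma = W_gamma @ cond + b_gamma   # per-channel scale, shape (C,)
    beta = W_beta @ cond + b_beta      # per-channel shift, shape (C,)
    return gamma[:, None] * x_hat + beta[:, None]

rng = np.random.default_rng(0)
C, T, D = 4, 100, 3
x = rng.standard_normal((C, T))
cond = rng.standard_normal(D)
W_g, b_g = rng.standard_normal((C, D)), np.ones(C)
W_b, b_b = rng.standard_normal((C, D)), np.zeros(C)
y = adaptive_norm(x, cond, W_g, b_g, W_b, b_b)
```

Each output channel ends up with mean `beta` and standard deviation close to `|gamma|`, so the conditioning signal, rather than the input statistics, controls the feature scale and offset.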

ASDec 14, 2020
Few Shot Adaptive Normalization Driven Multi-Speaker Speech Synthesis

Neeraj Kumar, Srishti Goel, Ankur Narang et al.

Speaking style varies from person to person: every person exhibits his or her own style of speaking, determined by language, geography, culture, and other factors. Style is best captured by the prosody of a signal. High-quality multi-speaker speech synthesis that accounts for prosody and works in a few-shot manner is an area of active research with many real-world applications. While multiple efforts have been made in this direction, it remains an interesting and challenging problem. In this paper, we present a novel few-shot multi-speaker speech synthesis approach (FSM-SS) that leverages an adaptive normalization architecture with a non-autoregressive multi-head attention model. Given an input text and a reference speech sample of an unseen person, FSM-SS can generate speech in that person's style in a few-shot manner. Additionally, we demonstrate how the affine parameters of normalization help capture prosodic features such as energy and fundamental frequency in a disentangled fashion and can be used to generate morphed speech output. We demonstrate the efficacy of our proposed architecture on the multi-speaker VCTK and LibriTTS datasets using multiple quantitative metrics that measure generated speech distortion and MOS (mean opinion score), along with speaker embedding analysis of the generated speech versus the actual speech samples.

CVDec 8, 2020
Mitigating the Impact of Adversarial Attacks in Very Deep Networks

Mohammed Hassanin, Ibrahim Radwan, Nour Moustafa et al.

Deep Neural Network (DNN) models have security vulnerabilities, and attackers often employ complex hacking techniques to expose their structures. Data poisoning-enabled perturbation attacks are complex adversarial attacks that inject false data into models. They negatively impact the learning process, degrading a model's accuracy and convergence rates, and deeper networks gain no advantage against them. In this paper, we propose an attack-agnostic defense method for mitigating their influence. It integrates a Defensive Feature Layer (DFL) into a well-known DNN architecture, which helps neutralize the effects of illegitimate perturbation samples in the feature space. To boost the robustness and trustworthiness of this method in correctly classifying attacked input samples, we regularize the hidden space of a trained model with a discriminative loss function called Polarized Contrastive Loss (PCL). It improves discrimination among samples in different classes and maintains the resemblance of those in the same class. We also integrate DFL and PCL in a compact model for defending against data poisoning attacks. This method is trained and tested on the CIFAR-10 and MNIST datasets under data poisoning-enabled perturbation attacks, and the experimental results show excellent performance compared with recent peer techniques.

CRNov 3, 2020
Blockchain based Attack Detection on Machine Learning Algorithms for IoT based E-Health Applications

Thippa Reddy Gadekallu, Manoj M K, Sivarama Krishnan S et al.

The application of machine learning (ML) algorithms is scaling up massively due to rapid digitization and the emergence of new technologies such as the Internet of Things (IoT). In today's digital era, ML algorithms are applied in healthcare, IoT, engineering, finance, and other areas. However, all these algorithms must be trained in order to predict outcomes or solve a particular problem, and there is a high possibility that training datasets are tampered with, producing biased results. Hence, in this article, we propose a blockchain-based solution to secure the datasets generated by IoT devices for e-health applications. The proposed blockchain-based solution uses a private cloud to tackle this issue. For evaluation, we developed a system that dataset owners can use to secure their data.

LGApr 14, 2020
Systematically designing better instance counting models on cell images with Neural Arithmetic Logic Units

Ashish Rana, Taranveer Singh, Harpreet Singh et al.

A key problem for neural network models trained to count instances is that their generalization error increases whenever the test range exceeds the training range; that is, they generalize poorly outside the training range. Consider automated cell counting, where images that are denser and have higher cell counts than the training images are commonly encountered. By making better predictions for higher cell counts, we aim to build cell counting systems that generalize better. The neural arithmetic logic unit (NALU) architecture for arithmetic operations has made counting feasible, with better accuracy, for higher numeric ranges not included in the training data. In our study, we used these units and several other activation functions to learn the cell counting task with two architectures, namely Fully Convolutional Regression Network and U-Net. These numerically biased units are added to the original architectures as residual concatenated layers, and we conduct a comparative experimental study of these changes. The comparison is framed in terms of the regression loss optimized by these models, which are trained with extensive data augmentation. Adding these numerically biased units to existing architectures as residual layer concatenation connections improved our cell counting results. Our results confirm that these numerically biased units do help models learn numeric quantities and generalize better.
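For readers unfamiliar with the unit, the sketch below is a forward pass of a single NALU layer in its commonly used form: a neural accumulator (NAC) whose weights tanh(W_hat) * sigmoid(M_hat) are pushed toward {-1, 0, 1}, a multiplicative path computed in log space, and a learned gate mixing the two. How the paper concatenates such units into FCRN and U-Net as residual layers is not shown; the hand-set weights are assumptions chosen so the unit adds or multiplies its two inputs.

```python
import numpy as np

def sigmoid(z):
    # Numerically stable logistic function: sigmoid(z) = 0.5 * (1 + tanh(z / 2)).
    return 0.5 * (1.0 + np.tanh(0.5 * z))

def nalu_forward(x, W_hat, M_hat, G, eps=1e-7):
    """Forward pass of one NALU layer mapping x (in_dim,) to (out_dim,)."""
    W = np.tanh(W_hat) * sigmoid(M_hat)        # NAC weights, biased toward {-1, 0, 1}
    add = W @ x                                 # additive path (sums / differences)
    mul = np.exp(W @ np.log(np.abs(x) + eps))   # multiplicative path in log space
    g = sigmoid(G @ x)                          # gate: 1 -> add, 0 -> multiply
    return g * add + (1.0 - g) * mul

# Saturated weights make W ~ [[1, 1]], so the unit sums or multiplies its two inputs
# regardless of their magnitude.
x = np.array([100.0, 250.0])
W_hat = M_hat = np.full((1, 2), 20.0)
y_add = nalu_forward(x, W_hat, M_hat, G=np.full((1, 2), 1.0))    # gate ~ 1
y_mul = nalu_forward(x, W_hat, M_hat, G=np.full((1, 2), -1.0))   # gate ~ 0
```

Because the arithmetic is built into the unit rather than learned as a curve fit, `y_add` is about 350 and `y_mul` about 25000 even for inputs far larger than a typical training range, which is the extrapolation property the cell-counting study relies on.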

LGMar 19, 2020
Uncertainty Estimation in Cancer Survival Prediction

Hrushikesh Loya, Pranav Poduval, Deepak Anand et al.

Survival models are used in various fields, such as the development of cancer treatment protocols. Although many statistical and machine learning models have been proposed to achieve accurate survival predictions, little attention has been paid to obtaining well-calibrated uncertainty estimates for each prediction. The currently popular models are opaque and untrustworthy in that they often express high confidence even on test cases that are not similar to the training samples, and even when their predictions are wrong. We propose a Bayesian framework for survival models that not only gives more accurate survival predictions but also better quantifies survival uncertainty. Our approach is a novel combination of variational inference for uncertainty estimation, neural multi-task logistic regression for estimating nonlinear and time-varying risk models, and an additional sparsity-inducing prior for working with high-dimensional data.

CRSep 17, 2018
FeatureAnalytics: An approach to derive relevant attributes for analyzing Android Malware

Deepa K, Radhamani G, Vinod P et al.

The ever-increasing number of Android malware samples has long been a concern for cybersecurity professionals. Even though plenty of anti-malware solutions exist, rational and pragmatic approaches are rare and need further investigation. In this paper, we propose a novel two-set feature selection approach based on Rough Set and Statistical Test, named RSST, to extract relevant system calls. To address the high dimensionality of the attribute set, we derive a suboptimal system call space by applying the proposed feature selection method to maximize the separability between malware and benign samples. Comprehensive experiments on a dataset of 3,500 samples with 30 RSST-derived essential system calls yielded an accuracy of 99.9%, an Area Under Curve (AUC) of 1.0, and a 1% False Positive Rate (FPR). In contrast, other feature selectors used in malware analysis (Information Gain, CFsSubsetEval, ChiSquare, FreqSel, and Symmetric Uncertainty) achieved an accuracy of 95.5% with 8.5% FPR. Moreover, empirical analysis shows that RSST-derived system calls outperform other attributes such as permissions, opcodes, API, methods, call graphs, Droidbox attributes, and network traces.

CVDec 13, 2015
Deep Learning-Based Image Kernel for Inductive Transfer

Neeraj Kumar, Animesh Karmakar, Ranti Dev Sharma et al.

We propose a method to classify images from target classes with a small number of training examples, based on transfer learning from non-target classes. Using no information beyond class labels for samples from non-target classes, we train a Siamese net to estimate the probability that two images belong to the same class. With some post-processing, the output of the Siamese net can be used to form the Gram matrix of a Mercer kernel. Coupled with a support vector machine (SVM), such a kernel gave reasonable classification accuracy on target classes without any fine-tuning. When the Siamese net was only partially fine-tuned using a small number of samples from the target classes, the resulting classifier outperformed the state of the art and other alternatives. We share the class separation capabilities of such a kernel, and insights into its learning process, on the MNIST, Dogs vs. Cats, and CIFAR-10 datasets.
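The abstract does not say what post-processing turns Siamese-net outputs into a Mercer kernel. One standard option, shown below as an assumption rather than the authors' procedure, is to symmetrize the matrix of pairwise same-class probabilities and clip its negative eigenvalues so the result is positive semidefinite. The example probabilities and the function name `to_mercer_gram` are hypothetical.

```python
import numpy as np

def to_mercer_gram(P):
    """Project a pairwise similarity matrix onto a valid (symmetric PSD) Gram matrix.

    P[i, j] is the Siamese net's estimated probability that images i and j share a class.
    """
    S = 0.5 * (P + P.T)                      # enforce symmetry: k(x, y) = k(y, x)
    eigvals, eigvecs = np.linalg.eigh(S)     # real eigendecomposition of symmetric S
    eigvals = np.clip(eigvals, 0.0, None)    # drop negative eigenvalues -> PSD
    return (eigvecs * eigvals) @ eigvecs.T   # V diag(lambda) V^T

P = np.array([[1.0, 0.9, 0.1],
              [0.8, 1.0, 0.2],
              [0.1, 0.3, 1.0]])
K = to_mercer_gram(P)
```

A matrix produced this way can be passed to any SVM that accepts a precomputed kernel, for example scikit-learn's `SVC(kernel="precomputed")`, which takes the train-by-train Gram matrix at fit time and a test-by-train matrix at prediction time.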

DMMar 3, 2015
DAG-width of Control Flow Graphs with Applications to Model Checking

Therese Biedl, Sebastian Fischmeister, Neeraj Kumar

The treewidth of control flow graphs arising from structured programs is known to be at most six. However, as a control flow graph is inherently directed, it makes sense to consider a measure of width for digraphs instead. We use the so-called DAG-width and show that the DAG-width of control flow graphs arising from structured (goto-free) programs is at most three. Additionally, we also give a linear time algorithm to compute the DAG decomposition of these control flow graphs. One consequence of this result is that parity games (and hence the $μ$-calculus model checking problem), which are known to be tractable on graphs of bounded DAG-width, can be solved efficiently in practice on control flow graphs.