Khader Shameer

h-index38

3papers

4citations

Novelty28%

AI Score18

Ranked #189,854 of 194,257 authors (top 98%)#58,538 in CV (top 99%)

3 Papers

1.2MNNov 29, 2023

Generation of a Compendium of Transcription Factor Cascades and Identification of Potential Therapeutic Targets using Graph Machine Learning

Sonish Sivarajkumar, Pratyush Tandale, Ankit Bhardwaj et al.

Transcription factors (TFs) play a vital role in the regulation of gene expression thereby making them critical to many cellular processes. In this study, we used graph machine learning methods to create a compendium of TF cascades using data extracted from the STRING database. A TF cascade is a sequence of TFs that regulate each other, forming a directed path in the TF network. We constructed a knowledge graph of 81,488 unique TF cascades, with the longest cascade consisting of 62 TFs. Our results highlight the complex and intricate nature of TF interactions, where multiple TFs work together to regulate gene expression. We also identified 10 TFs with the highest regulatory influence based on centrality measurements, providing valuable information for researchers interested in studying specific TFs. Furthermore, our pathway enrichment analysis revealed significant enrichment of various pathways and functional categories, including those involved in cancer and other diseases, as well as those involved in development, differentiation, and cell signaling. The enriched pathways identified in this study may have potential as targets for therapeutic intervention in diseases associated with dysregulation of transcription factors. We have released the dataset, knowledge graph, and graphML methods for the TF cascades, and created a website to display the results, which can be accessed by researchers interested in using this dataset. Our study provides a valuable resource for understanding the complex network of interactions between TFs and their regulatory roles in cellular processes.

3.6MLDec 15, 2021

TrialGraph: Machine Intelligence Enabled Insight from Graph Modelling of Clinical Trials

Christopher Yacoumatos, Stefano Bragaglia, Anshul Kanakia et al.

A major impediment to successful drug development is the complexity, cost, and scale of clinical trials. The detailed internal structure of clinical trial data can make conventional optimization difficult to achieve. Recent advances in machine learning, specifically graph-structured data analysis, have the potential to enable significant progress in improving the clinical trial design. TrialGraph seeks to apply these methodologies to produce a proof-of-concept framework for developing models which can aid drug development and benefit patients. In this work, we first introduce a curated clinical trial data set compiled from the CT.gov, AACT and TrialTrove databases (n=1191 trials; representing one million patients) and describe the conversion of this data to graph-structured formats. We then detail the mathematical basis and implementation of a selection of graph machine learning algorithms, which typically use standard machine classifiers on graph data embedded in a low-dimensional feature space. We trained these models to predict side effect information for a clinical trial given information on the disease, existing medical conditions, and treatment. The MetaPath2Vec algorithm performed exceptionally well, with standard Logistic Regression, Decision Tree, Random Forest, Support Vector, and Neural Network classifiers exhibiting typical ROC-AUC scores of 0.85, 0.68, 0.86, 0.80, and 0.77, respectively. Remarkably, the best performing classifiers could only produce typical ROC-AUC scores of 0.70 when trained on equivalent array-structured data. Our work demonstrates that graph modelling can significantly improve prediction accuracy on appropriate datasets. Successive versions of the project that refine modelling assumptions and incorporate more data types can produce excellent predictors with real-world applications in drug development.

0.9CVJun 30, 2018

Mammography Assessment using Multi-Scale Deep Classifiers

Ulzee An, Khader Shameer, Lakshmi Subramanian

Applying deep learning methods to mammography assessment has remained a challenging topic. Dense noise with sparse expressions, mega-pixel raw data resolution, lack of diverse examples have all been factors affecting performance. The lack of pixel-level ground truths have especially limited segmentation methods in pushing beyond approximately bounding regions. We propose a classification approach grounded in high performance tissue assessment as an alternative to all-in-one localization and assessment models that is also capable of pinpointing the causal pixels. First, the objective of the mammography assessment task is formalized in the context of local tissue classifiers. Then, the accuracy of a convolutional neural net is evaluated on classifying patches of tissue with suspicious findings at varying scales, where highest obtained AUC is above $0.9$. The local evaluations of one such expert tissue classifier is used to augment the results of a heatmap regression model and additionally recover the exact causal regions at high resolution as a saliency image suitable for clinical settings.