Omar Darwish

h-index19

4papers

17citations

Novelty46%

AI Score32

Ranked #138,558 of 201,326 authors (top 69%)#30,244 in LG (top 71%)

4 Papers

CLJan 6, 2025

Leveraging Explainable AI for LLM Text Attribution: Differentiating Human-Written and Multiple LLMs-Generated Text

Ayat Najjar, Huthaifa I. Ashqar, Omar Darwish et al.

The development of Generative AI Large Language Models (LLMs) raised the alarm regarding identifying content produced through generative AI or humans. In one case, issues arise when students heavily rely on such tools in a manner that can affect the development of their writing or coding skills. Other issues of plagiarism also apply. This study aims to support efforts to detect and identify textual content generated using LLM tools. We hypothesize that LLMs-generated text is detectable by machine learning (ML), and investigate ML models that can recognize and differentiate texts generated by multiple LLMs tools. We leverage several ML and Deep Learning (DL) algorithms such as Random Forest (RF), and Recurrent Neural Networks (RNN), and utilized Explainable Artificial Intelligence (XAI) to understand the important features in attribution. Our method is divided into 1) binary classification to differentiate between human-written and AI-text, and 2) multi classification, to differentiate between human-written text and the text generated by the five different LLM tools (ChatGPT, LLaMA, Google Bard, Claude, and Perplexity). Results show high accuracy in the multi and binary classification. Our model outperformed GPTZero with 98.5\% accuracy to 78.3\%. Notably, GPTZero was unable to recognize about 4.2\% of the observations, but our model was able to recognize the complete test dataset. XAI results showed that understanding feature importance across different classes enables detailed author/source profiles. Further, aiding in attribution and supporting plagiarism detection by highlighting unique stylistic and structural elements ensuring robust content originality verification.

CVSep 7, 2025

Near Real-Time Dust Aerosol Detection with 3D Convolutional Neural Networks on MODIS Data

Caleb Gates, Patrick Moorhead, Jayden Ferguson et al.

Dust storms harm health and reduce visibility; quick detection from satellites is needed. We present a near real-time system that flags dust at the pixel level using multi-band images from NASA's Terra and Aqua (MODIS). A 3D convolutional network learns patterns across all 36 bands, plus split thermal bands, to separate dust from clouds and surface features. Simple normalization and local filling handle missing data. An improved version raises training speed by 21x and supports fast processing of full scenes. On 17 independent MODIS scenes, the model reaches about 0.92 accuracy with a mean squared error of 0.014. Maps show strong agreement in plume cores, with most misses along edges. These results show that joint band-and-space learning can provide timely dust alerts at global scale; using wider input windows or attention-based models may further sharpen edges.

LGDec 6, 2019

Deep Bayesian Recurrent Neural Networks for Somatic Variant Calling in Cancer

Geoffroy Dubourg-Felonneau, Omar Darwish, Christopher Parsons et al.

The emerging field of precision oncology relies on the accurate pinpointing of alterations in the molecular profile of a tumor to provide personalized targeted treatments. Current methodologies in the field commonly include the application of next generation sequencing technologies to a tumor sample, followed by the identification of mutations in the DNA known as somatic variants. The differentiation of these variants from sequencing error poses a classic classification problem, which has traditionally been approached with Bayesian statistics, and more recently with supervised machine learning methods such as neural networks. Although these methods provide greater accuracy, classic neural networks lack the ability to indicate the confidence of a variant call. In this paper, we explore the performance of deep Bayesian neural networks on next generation sequencing data, and their ability to give probability estimates for somatic variant calls. In addition to demonstrating similar performance in comparison to standard neural networks, we show that the resultant output probabilities make these better suited to the disparate and highly-variable sequencing data-sets these models are likely to encounter in the real world. We aim to deliver algorithms to oncologists for which model certainty better reflects accuracy, for improved clinical application. By moving away from point estimates to reliable confidence intervals, we expect the resultant clinical and treatment decisions to be more robust and more informed by the underlying reality of the tumor molecular profile.

LGDec 4, 2019

Safety and Robustness in Decision Making: Deep Bayesian Recurrent Neural Networks for Somatic Variant Calling in Cancer

Geoffroy Dubourg-Felonneau, Omar Darwish, Christopher Parsons et al.

The genomic profile underlying an individual tumor can be highly informative in the creation of a personalized cancer treatment strategy for a given patient; a practice known as precision oncology. This involves next generation sequencing of a tumor sample and the subsequent identification of genomic aberrations, such as somatic mutations, to provide potential candidates of targeted therapy. The identification of these aberrations from sequencing noise and germline variant background poses a classic classification-style problem. This has been previously broached with many different supervised machine learning methods, including deep-learning neural networks. However, these neural networks have thus far not been tailored to give any indication of confidence in the mutation call, meaning an oncologist could be targeting a mutation with a low probability of being true. To address this, we present here a deep bayesian recurrent neural network for cancer variant calling, which shows no degradation in performance compared to standard neural networks. This approach enables greater flexibility through different priors to avoid overfitting to a single dataset. We will be incorporating this approach into software for oncologists to obtain safe, robust, and statistically confident somatic mutation calls for precision oncology treatment choices.