31.4CLMay 28Code
Protocol for evaluating ChatGPT in biomedical association generation and verification using a RAG-enabled, cross-model majority voting workflowAhmed Abdeen Hamed, Luis M. Rocha
We present a protocol to evaluate ChatGPT's ability to generate disease-centric biomedical associations. It outlines how we generate the associations, validate the biological entities using biomedical ontologies, and verify associations using literature. The protocol includes a self-consistency strategy to assess generative reliability across ChatGPT models. To address ontology exact-match limitations, we provide a use case performing semantic verification through a workflow enabled by Retrieval-Augmented Generation (RAG) powered by open-source large language models (LLMs). This enables LLMs to establish truth over content generated by other LLMs and expose hallucination.
CLAug 15, 2023
Detection of ChatGPT Fake Science with the xFakeSci Learning AlgorithmAhmed Abdeen Hamed, Xindong Wu
Generative AI tools exemplified by ChatGPT are becoming a new reality. This study is motivated by the premise that ``AI generated content may exhibit a distinctive behavior that can be separated from scientific articles''. In this study, we show how articles can be generated using means of prompt engineering for various diseases and conditions. We then show how we tested this premise in two phases and prove its validity. Subsequently, we introduce xFakeSci, a novel learning algorithm, that is capable of distinguishing ChatGPT-generated articles from publications produced by scientists. The algorithm is trained using network models driven from both sources. As for the classification step, it was performed using 300 articles per condition. The actual label steps took place against an equal mix of 50 generated articles and 50 authentic PubMed abstracts. The testing also spanned publication periods from 2010 to 2024 and encompassed research on three distinct diseases: cancer, depression, and Alzheimer's. Further, we evaluated the accuracy of the xFakeSci algorithm against some of the classical data mining algorithms (e.g., Support Vector Machines, Regression, and Naive Bayes). The xFakeSci algorithm achieved F1 scores ranging from 80% to 94%, outperforming common data mining algorithms, which scored F1 values between 38% and 52%. We attribute the noticeable difference to the introduction of calibration and a proximity distance heuristic, which underscores this promising performance. Indeed, the prediction of fake science generated by ChatGPT presents a considerable challenge. Nonetheless, the introduction of the xFakeSci algorithm is a significant step on the way to combating fake science.
AIAug 7, 2023
Fact-Checking Generative AI: Ontology-Driven Biological Graphs for Disease-Gene Link VerificationAhmed Abdeen Hamed, Byung Suk Lee, Alessandro Crimi et al.
Since the launch of various generative AI tools, scientists have been striving to evaluate their capabilities and contents, in the hope of establishing trust in their generative abilities. Regulations and guidelines are emerging to verify generated contents and identify novel uses. we aspire to demonstrate how ChatGPT claims are checked computationally using the rigor of network models. We aim to achieve fact-checking of the knowledge embedded in biological graphs that were contrived from ChatGPT contents at the aggregate level. We adopted a biological networks approach that enables the systematic interrogation of ChatGPT's linked entities. We designed an ontology-driven fact-checking algorithm that compares biological graphs constructed from approximately 200,000 PubMed abstracts with counterparts constructed from a dataset generated using the ChatGPT-3.5 Turbo model. In 10-samples of 250 randomly selected records a ChatGPT dataset of 1000 "simulated" articles , the fact-checking link accuracy ranged from 70% to 86%. This study demonstrated high accuracy of aggregate disease-gene links relationships found in ChatGPT-generated texts.
AIFeb 20, 2025
From Knowledge Generation to Knowledge Verification: Examining the BioMedical Generative Capabilities of ChatGPTAhmed Abdeen Hamed, Alessandro Crimi, Magdalena M. Misiak et al.
The generative capabilities of LLM models offer opportunities for accelerating tasks but raise concerns about the authenticity of the knowledge they produce. To address these concerns, we present a computational approach that evaluates the factual accuracy of biomedical knowledge generated by an LLM. Our approach consists of two processes: generating disease-centric associations and verifying these associations using the semantic framework of biomedical ontologies. Using ChatGPT as the selected LLM, we designed prompt-engineering processes to establish linkages between diseases and related drugs, symptoms, and genes, and assessed consistency across multiple ChatGPT models (e.g., GPT-turbo, GPT-4, etc.). Experimental results demonstrate high accuracy in identifying disease terms (88%-97%), drug names (90%-91%), and genetic information (88%-98%). However, symptom term identification was notably lower (49%-61%), due to the informal and verbose nature of symptom descriptions, which hindered effective semantic matching with the formal language of specialized ontologies. Verification of associations reveals literature coverage rates of 89%-91% for disease-drug and disease-gene pairs, while symptom-related associations exhibit lower coverage (49%-62%).
CLApr 21, 2024
Reinforcement of Explainability of ChatGPT Prompts by Embedding Breast Cancer Self-Screening Rules into AI ResponsesYousef Khan, Ahmed Abdeen Hamed
Addressing the global challenge of breast cancer, this research explores the fusion of generative AI, focusing on ChatGPT 3.5 turbo model, and the intricacies of breast cancer risk assessment. The research aims to evaluate ChatGPT's reasoning capabilities, emphasizing its potential to process rules and provide explanations for screening recommendations. The study seeks to bridge the technology gap between intelligent machines and clinicians by demonstrating ChatGPT's unique proficiency in natural language reasoning. The methodology employs a supervised prompt-engineering approach to enforce detailed explanations for ChatGPT's recommendations. Synthetic use cases, generated algorithmically, serve as the testing ground for the encoded rules, evaluating the model's processing prowess. Findings highlight ChatGPT's promising capacity in processing rules comparable to Expert System Shells, with a focus on natural language reasoning. The research introduces the concept of reinforcement explainability, showcasing its potential in elucidating outcomes and facilitating user-friendly interfaces for breast cancer risk assessment.
AIJun 18, 2024
Accelerating Complex Disease Treatment through Network Medicine and GenAI: A Case Study on Drug Repurposing for Breast CancerAhmed Abdeen Hamed, Tamer E. Fandy
The objective of this research is to introduce a network specialized in predicting drugs that can be repurposed by investigating real-world evidence sources, such as clinical trials and biomedical literature. Specifically, it aims to generate drug combination therapies for complex diseases (e.g., cancer, Alzheimer's). We present a multilayered network medicine approach, empowered by a highly configured ChatGPT prompt engineering system, which is constructed on the fly to extract drug mentions in clinical trials. Additionally, we introduce a novel algorithm that connects real-world evidence with disease-specific signaling pathways (e.g., KEGG database). This sheds light on the repurposability of drugs if they are found to bind with one or more protein constituents of a signaling pathway. To demonstrate, we instantiated the framework for breast cancer and found that, out of 46 breast cancer signaling pathways, the framework identified 38 pathways that were covered by at least two drugs. This evidence signals the potential for combining those drugs. Specifically, the most covered signaling pathway, ID hsa:2064, was covered by 108 drugs, some of which can be combined. Conversely, the signaling pathway ID hsa:1499 was covered by only two drugs, indicating a significant gap for further research. Our network medicine framework, empowered by GenAI, shows promise in identifying drug combinations with a high degree of specificity, knowing the exact signaling pathways and proteins that serve as targets. It is noteworthy that ChatGPT successfully accelerated the process of identifying drug mentions in clinical trials, though further investigations are required to determine the relationships among the drug mentions.
CLJan 19, 2024
Quantifying Similarity: Text-Mining Approaches to Evaluate ChatGPT and Google Bard Content in Relation to BioMedical LiteratureJakub Klimczak, Ahmed Abdeen Hamed
Background: The emergence of generative AI tools, empowered by Large Language Models (LLMs), has shown powerful capabilities in generating content. To date, the assessment of the usefulness of such content, generated by what is known as prompt engineering, has become an interesting research question. Objectives Using the mean of prompt engineering, we assess the similarity and closeness of such contents to real literature produced by scientists. Methods In this exploratory analysis, (1) we prompt-engineer ChatGPT and Google Bard to generate clinical content to be compared with literature counterparts, (2) we assess the similarities of the contents generated by comparing them with counterparts from biomedical literature. Our approach is to use text-mining approaches to compare documents and associated bigrams and to use network analysis to assess the terms' centrality. Results The experiments demonstrated that ChatGPT outperformed Google Bard in cosine document similarity (38% to 34%), Jaccard document similarity (23% to 19%), TF-IDF bigram similarity (47% to 41%), and term network centrality (degree and closeness). We also found new links that emerged in ChatGPT bigram networks that did not exist in literature bigram networks. Conclusions: The obtained similarity results show that ChatGPT outperformed Google Bard in document similarity, bigrams, and degree and closeness centrality. We also observed that ChatGPT offers linkage to terms that are connected in the literature. Such connections could inspire asking interesting questions and generate new hypotheses.