Mor Peleg

h-index38

6papers

25citations

Novelty28%

AI Score29

Ranked #140,878 of 194,257 authors (top 73%)#8,545 in AI (top 68%)

6 Papers

6.7AINov 19, 2023

Leveraging Generative AI for Clinical Evidence Summarization Needs to Ensure Trustworthiness

Gongbo Zhang, Qiao Jin, Denis Jered McInerney et al. · amazon-science, salesforce

Evidence-based medicine promises to improve the quality of healthcare by empowering medical decisions and practices with the best available evidence. The rapid growth of medical evidence, which can be obtained from various sources, poses a challenge in collecting, appraising, and synthesizing the evidential information. Recent advancements in generative AI, exemplified by large language models, hold promise in facilitating the arduous task. However, developing accountable, fair, and inclusive models remains a complicated undertaking. In this perspective, we discuss the trustworthiness of generative AI in the context of automated summarization of medical evidence.

5.3LGApr 6, 2023

Personalising Digital Health Behaviour Change Interventions using Machine Learning and Domain Knowledge

Aneta Lisowska, Szymon Wilk, Mor Peleg

We are developing a virtual coaching system that helps patients adhere to behavior change interventions (BCI). Our proposed system predicts whether a patient will perform the targeted behaviour and uses counterfactual examples with feature control to guide personalisation of BCI. We use simulated patient data with varying levels of receptivity to intervention to arrive at the study design which would enable evaluation of our system.

1.3CLNov 12, 2023Code

Can Large Language Models Augment a Biomedical Ontology with missing Concepts and Relations?

Antonio Zaitoun, Tomer Sagi, Szymon Wilk et al.

Ontologies play a crucial role in organizing and representing knowledge. However, even current ontologies do not encompass all relevant concepts and relationships. Here, we explore the potential of large language models (LLM) to expand an existing ontology in a semi-automated fashion. We demonstrate our approach on the biomedical ontology SNOMED-CT utilizing semantic relation types from the widely used UMLS semantic network. We propose a method that uses conversational interactions with an LLM to analyze clinical practice guidelines (CPGs) and detect the relationships among the new medical concepts that are not present in SNOMED-CT. Our initial experimentation with the conversational prompts yielded promising preliminary results given a manually generated gold standard, directing our future potential improvements.

3.3AIJul 15, 2025

Lessons Learned from Evaluation of LLM based Multi-agents in Safer Therapy Recommendation

Yicong Wu, Ting Chen, Irit Hochberg et al.

Therapy recommendation for chronic patients with multimorbidity is challenging due to risks of treatment conflicts. Existing decision support systems face scalability limitations. Inspired by the way in which general practitioners (GP) manage multimorbidity patients, occasionally convening multidisciplinary team (MDT) collaboration, this study investigated the feasibility and value of using a Large Language Model (LLM)-based multi-agent system (MAS) for safer therapy recommendations. We designed a single agent and a MAS framework simulating MDT decision-making by enabling discussion among LLM agents to resolve medical conflicts. The systems were evaluated on therapy planning tasks for multimorbidity patients using benchmark cases. We compared MAS performance with single-agent approaches and real-world benchmarks. An important contribution of our study is the definition of evaluation metrics that go beyond the technical precision and recall and allow the inspection of clinical goals met and medication burden of the proposed advices to a gold standard benchmark. Our results show that with current LLMs, a single agent GP performs as well as MDTs. The best-scoring models provide correct recommendations that address all clinical goals, yet the advices are incomplete. Some models also present unnecessary medications, resulting in unnecessary conflicts between medication and conditions or drug-drug interactions.

2.7HCMar 18, 2024

Defining Effective Engagement For Enhancing Cancer Patients' Well-being with Mobile Digital Behavior Change Interventions

Aneta Lisowska, Szymon Wilk, Laura Locati et al.

Digital Behavior Change Interventions (DBCIs) are supporting development of new health behaviors. Evaluating their effectiveness is crucial for their improvement and understanding of success factors. However, comprehensive guidance for developers, particularly in small-scale studies with ethical constraints, is limited. Building on the CAPABLE project, this study aims to define effective engagement with DBCIs for supporting cancer patients in enhancing their quality of life. We identify metrics for measuring engagement, explore the interest of both patients and clinicians in DBCIs, and propose hypotheses for assessing the impact of DBCIs in such contexts. Our findings suggest that clinician prescriptions significantly increase sustained engagement with mobile DBCIs. In addition, while one weekly engagement with a DBCI is sufficient to maintain well-being, transitioning from extrinsic to intrinsic motivation may require a higher level of engagement.

0.3CLFeb 28, 2022

Supervised Machine Learning Algorithm for Detecting Consistency between Reported Findings and the Conclusions of Mammography Reports

Alexander Berdichevsky, Mor Peleg, Daniel L. Rubin

Objective. Mammography reports document the diagnosis of patients' conditions. However, many reports contain non-standard terms (non-BI-RADS descriptors) and incomplete statements, which can lead to conclusions that are not well-supported by the reported findings. Our aim was to develop a tool to detect such discrepancies by comparing the reported conclusions to those that would be expected based on the reported radiology findings. Materials and Methods. A deidentified data set from an academic hospital containing 258 mammography reports supplemented by 120 reports found on the web was used for training and evaluation. Spell checking and term normalization was used to unambiguously determine the reported BI-RADS descriptors. The resulting data were input into seven classifiers that classify mammography reports, based on their Findings sections, into seven BI-RADS final assessment categories. Finally, the semantic similarity score of a report to each BI-RADS category is reported. Results. Our term normalization algorithm correctly identified 97% of the BI-RADS descriptors in mammography reports. Our system provided 76% precision and 83% recall in correctly classifying the reports according to BI-RADS final assessment category. Discussion. The strength of our approach relies on providing high importance to BI-RADS terms in the summarization phase, on the semantic similarity that considers the complex data representation, and on the classification into all seven BI-RADs categories. Conclusion. BI-RADS descriptors and expected final assessment categories could be automatically detected by our approach with fairly good accuracy, which could be used to make users aware that their reported findings do not match well with their conclusion.