Francois Vandenhende

CL
h-index: 2
5 papers
5 citations
Novelty: 49%
AI Score: 48

5 Papers

CL Dec 8, 2025
Automated Generation of Custom MedDRA Queries Using SafeTerm Medical Map

Francois Vandenhende, Anna Georgiou, Michalis Georgiou et al.

In pre-market drug safety review, grouping related adverse event terms into standardised MedDRA queries or the FDA Office of New Drugs Custom Medical Queries (OCMQs) is critical for signal detection. We present a novel quantitative artificial intelligence system that understands and processes medical terminology and automatically retrieves relevant MedDRA Preferred Terms (PTs) for a given input query, ranking them by a relevance score using multi-criteria statistical methods. The system (SafeTerm) embeds medical query terms and MedDRA PTs in a multidimensional vector space, then applies cosine similarity and extreme-value clustering to generate a ranked list of PTs. Validation was conducted against the FDA OCMQ v3.0 (104 queries), restricted to valid MedDRA PTs. Precision, recall and F1 were computed across similarity thresholds. High recall (>95%) is achieved at moderate thresholds. Higher thresholds improve precision (up to 86%). The optimal threshold (~0.70-0.75) yielded recall of ~50% and precision of ~33%. Narrow-term PT subsets performed similarly but required slightly higher similarity thresholds. The SafeTerm AI-driven system provides a viable supplementary method for automated MedDRA query generation. A similarity threshold of ~0.60 is recommended initially, with increased thresholds for refined term selection.
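
The retrieval step described in this abstract (embed, score by cosine similarity, keep terms above a threshold, rank) can be illustrated with a minimal sketch. The three-dimensional vectors, the term names and the `rank_terms` helper below are hypothetical toy values, not SafeTerm's embeddings or API, and the extreme-value clustering step is omitted.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def rank_terms(query_vec, term_vecs, threshold=0.60):
    """Return (term, score) pairs with score >= threshold, best first.

    The 0.60 default mirrors the starting threshold recommended in the
    abstract; everything else here is an illustrative assumption.
    """
    scored = [(name, cosine(query_vec, vec)) for name, vec in term_vecs.items()]
    kept = [(name, score) for name, score in scored if score >= threshold]
    return sorted(kept, key=lambda pair: pair[1], reverse=True)

# Toy embeddings (hypothetical): two related terms and one unrelated term.
toy_terms = {
    "Headache": [0.9, 0.1, 0.0],
    "Migraine": [0.8, 0.3, 0.1],
    "Rash": [0.0, 0.2, 0.9],
}
query = [1.0, 0.2, 0.0]
ranked = rank_terms(query, toy_terms)
ranked_names = [name for name, _ in ranked]
```

Raising `threshold` shortens the list, which is the precision-versus-recall trade-off the abstract reports.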

CL Dec 7, 2025
Automated PRO-CTCAE Symptom Selection based on Prior Adverse Event Profiles

Francois Vandenhende, Anna Georgiou, Michalis Georgiou et al.

The PRO-CTCAE is an NCI-developed patient-reported outcome system for capturing symptomatic adverse events in oncology trials. It comprises a large library drawn from the CTCAE vocabulary, and item selection for a given trial is typically guided by expected toxicity profiles from prior data. Selecting too many PRO-CTCAE items can burden patients and reduce compliance, while too few may miss important safety signals. We present an automated method to select a minimal yet comprehensive PRO-CTCAE subset based on historical safety data. Each candidate PRO-CTCAE symptom term is first mapped to its corresponding MedDRA Preferred Terms (PTs), which are then encoded into Safeterm, a high-dimensional semantic space capturing clinical and contextual diversity in MedDRA terminology. We score each candidate PRO item for relevance to the historical list of adverse event PTs and combine relevance and incidence into a utility function. Spectral analysis is then applied to the combined utility and diversity matrix to identify an orthogonal set of medical concepts that balances relevance and diversity. Symptoms are rank-ordered by importance, and a cut-off is suggested based on the explained information. The tool is implemented as part of the Safeterm trial-safety app. We evaluate its performance using simulations and oncology case studies in which PRO-CTCAE was employed. This automated approach can streamline PRO-CTCAE design by leveraging MedDRA semantics and historical data, providing an objective and reproducible method to balance signal coverage against patient burden.
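
As a rough illustration of combining relevance and incidence into a utility and then suggesting a cut-off, the sketch below scores each candidate item by an incidence-weighted sum of similarities and keeps items until a chosen share of total utility is covered. The utility formula, the 0.9 coverage level, the item names and all numbers are assumptions made for illustration; the abstract does not specify them, and the spectral diversity step is replaced here by a plain greedy ranking.

```python
def utility(item_sims, incidences):
    """Incidence-weighted relevance of one candidate item (assumed form).

    item_sims[j]  : similarity of the item to historical adverse event j
    incidences[j] : historical incidence of adverse event j, in [0, 1]
    """
    return sum(s * p for s, p in zip(item_sims, incidences))

def select_items(utilities, coverage=0.9):
    """Rank items by utility and keep them until `coverage` of the total is reached."""
    total = sum(utilities.values())
    if total == 0:
        return []
    ranked = sorted(utilities.items(), key=lambda kv: kv[1], reverse=True)
    chosen, running = [], 0.0
    for name, value in ranked:
        chosen.append(name)
        running += value
        if running / total >= coverage:
            break
    return chosen

# Hypothetical historical profile: two adverse events with incidences 30% and 10%.
incidences = [0.30, 0.10]
candidate_sims = {
    "Nausea": [0.9, 0.1],
    "Fatigue": [0.2, 0.8],
    "Itching": [0.05, 0.05],
}
utilities = {name: utility(sims, incidences) for name, sims in candidate_sims.items()}
chosen = select_items(utilities)
```

A greedy ranking like this ignores redundancy between items; the method in the abstract addresses that with spectral analysis of a combined utility and diversity matrix.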

CL Dec 8, 2025
Performance of the SafeTerm AI-Based MedDRA Query System Against Standardised MedDRA Queries

Francois Vandenhende, Anna Georgiou, Michalis Georgiou et al.

In pre-market drug safety review, grouping related adverse event terms into SMQs or OCMQs is critical for signal detection. We assess the performance of SafeTerm Automated Medical Query (AMQ) on MedDRA SMQs. The AMQ is a novel quantitative artificial intelligence system that understands and processes medical terminology and automatically retrieves relevant MedDRA Preferred Terms (PTs) for a given input query, ranking them by a relevance score (0-1) using multi-criteria statistical methods. The system (SafeTerm) embeds medical query terms and MedDRA PTs in a multidimensional vector space, then applies cosine similarity and extreme-value clustering to generate a ranked list of PTs. Validation was conducted against tier-1 SMQs (110 queries, v28.1). Precision, recall and F1 were computed at multiple similarity thresholds, defined either manually or using an automated method. High recall (94%) is achieved at moderate similarity thresholds, indicative of good retrieval sensitivity. Higher thresholds filter out more terms, resulting in improved precision (up to 89%). The optimal threshold (0.70) yielded an overall recall of 48% and precision of 45% across all 110 queries. Restricting to narrow-term PTs achieved slightly better performance at an increased (+0.05) similarity threshold, confirming increased relatedness of narrow versus broad terms. The automatic threshold selection (0.66) prioritizes recall (0.58) over precision (0.29). SafeTerm AMQ achieves comparable, satisfactory performance on SMQs and sanitized OCMQs. It is therefore a viable supplementary method for automated MedDRA query generation, balancing recall and precision. We recommend using suitable MedDRA PT terminology in query formulation and applying the automated threshold method to optimise recall. Raising the similarity threshold allows refined, narrow-term selection.
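
The evaluation described here, precision, recall and F1 computed at several similarity thresholds against a reference query, can be sketched as follows. The scores, term labels and reference set are made-up toy data, and `sweep` is a hypothetical helper, not part of SafeTerm.

```python
def precision_recall_f1(retrieved, reference):
    """Set-based precision, recall and F1 for one query."""
    retrieved, reference = set(retrieved), set(reference)
    true_pos = len(retrieved & reference)
    precision = true_pos / len(retrieved) if retrieved else 0.0
    recall = true_pos / len(reference) if reference else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)

def sweep(scores, reference, thresholds):
    """Return (threshold, precision, recall, f1) for each threshold."""
    rows = []
    for t in thresholds:
        retrieved = [term for term, score in scores.items() if score >= t]
        rows.append((t, *precision_recall_f1(retrieved, reference)))
    return rows

# Toy data (hypothetical): similarity scores for four terms and a
# reference query containing three terms, one of which is never retrieved.
scores = {"A": 0.90, "B": 0.75, "C": 0.65, "D": 0.55}
reference = {"A", "C", "E"}
rows = sweep(scores, reference, [0.5, 0.7, 0.8])
```

In this toy run, moving the threshold from 0.5 to 0.8 raises precision from 0.5 to 1.0 while recall falls from 2/3 to 1/3, the same direction of trade-off the abstract reports.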

CL Feb 23
SHIELD: Semantic Heterogeneity Integrated Embedding for Latent Discovery in Clinical Trial Safety Signals

Francois Vandenhende, Anna Georgiou, Theodoros Psaras et al.

We present SHIELD, a novel methodology for automated and integrated safety signal detection in clinical trials. SHIELD combines disproportionality analysis with semantic clustering of adverse event (AE) terms applied to MedDRA term embeddings. For each AE, the pipeline computes an information-theoretic disproportionality measure (Information Component) with effect size derived via empirical Bayesian shrinkage. A utility matrix is constructed by weighting semantic term-term similarities by signal magnitude, followed by spectral embedding and clustering to identify groups of related AEs. Resulting clusters are annotated with syndrome-level summary labels using large language models, yielding a coherent, data-driven representation of treatment-associated safety profiles in the form of a network graph and hierarchical tree. We implement the SHIELD framework for three settings: a single-arm incidence summary, a comparison of two treatment arms, and the detection of any treatment effect in a multi-arm trial. We illustrate its ability to recover known safety signals and generate interpretable, cluster-based summaries in a real clinical trial example. This work bridges statistical signal detection with modern natural language processing to enhance safety assessment and causal interpretation in clinical trials.
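
The first two pipeline steps named in this abstract can be sketched under stated assumptions. The Information Component below uses a widely used shrinkage form, log2((O + 0.5) / (E + 0.5)); whether SHIELD uses exactly this form is not stated in the abstract. The utility weighting (similarity times the geometric mean of the two terms' non-negative signal weights) and all counts and similarities are likewise hypothetical. Spectral embedding, clustering and LLM labelling are omitted.

```python
import math

def information_component(observed, expected, shrink=0.5):
    """Shrunken log2 observed-to-expected ratio (assumed form of the IC)."""
    return math.log2((observed + shrink) / (expected + shrink))

def utility_matrix(similarity, weights):
    """Weight term-term similarities by signal magnitude (assumed weighting)."""
    n = len(weights)
    return [
        [similarity[i][j] * math.sqrt(weights[i] * weights[j]) for j in range(n)]
        for i in range(n)
    ]

# Toy data (hypothetical): observed and expected counts for three AE terms.
observed = [12, 10, 3]
expected = [4.0, 5.0, 3.0]
ic = [information_component(o, e) for o, e in zip(observed, expected)]

# Only positive signals contribute; a term at expectation gets weight 0.
weights = [max(value, 0.0) for value in ic]

# Toy symmetric similarity matrix: terms 0 and 1 are semantically close.
similarity = [
    [1.0, 0.8, 0.1],
    [0.8, 1.0, 0.2],
    [0.1, 0.2, 1.0],
]
U = utility_matrix(similarity, weights)
```

With these numbers the third term sits exactly at expectation, so its row and column of `U` are zero and it would not pull other terms into a cluster.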

CL Nov 24, 2025
Knowledge-based Graphical Method for Safety Signal Detection in Clinical Trials

Francois Vandenhende, Anna Georgiou, Michalis Georgiou et al.

We present a graphical, knowledge-based method for reviewing treatment-emergent adverse events (AEs) in clinical trials. The approach enhances MedDRA by adding a hidden medical knowledge layer (Safeterm) that captures semantic relationships between terms in a 2-D map. Using this layer, AE Preferred Terms can be regrouped automatically into similarity clusters, and their association with the trial disease may be quantified. The Safeterm map is available online and connected to aggregated AE incidence tables from ClinicalTrials.gov. For signal detection, we compute treatment-specific disproportionality metrics using shrinkage incidence ratios. Cluster-level EBGM values are then derived through precision-weighted aggregation. Two visual outputs support interpretation: a semantic map showing AE incidence and an expectedness-versus-disproportionality plot for rapid signal detection. Applied to three legacy trials, the automated method clearly recovers all expected safety signals. Overall, augmenting MedDRA with a medical knowledge layer improves clarity, efficiency, and accuracy in AE interpretation for clinical trials.
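
One common reading of precision-weighted aggregation is an inverse-variance weighted mean on the log scale. The sketch below shows that calculation for a single cluster; the abstract does not give the exact formula, so the function, the log-ratio inputs and the variances are all assumptions for illustration.

```python
import math

def precision_weighted_mean(log_ratios, variances):
    """Inverse-variance weighted mean and its variance.

    Each term's weight is its precision (1 / variance), so estimates
    based on more data count for more in the cluster-level value.
    """
    weights = [1.0 / v for v in variances]
    total = sum(weights)
    mean = sum(w * x for w, x in zip(weights, log_ratios)) / total
    return mean, 1.0 / total

# Toy cluster (hypothetical): two terms with the same log incidence ratio
# but different precision.
log_ratios = [math.log(2.0), math.log(2.0)]
variances = [0.1, 0.4]
pooled_log, pooled_var = precision_weighted_mean(log_ratios, variances)
cluster_ratio = math.exp(pooled_log)  # back-transform to the ratio scale
```

The pooled variance (0.08 here) is smaller than either input variance, which is why cluster-level values can be more stable than term-level ones.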