LGNov 9, 2023

Do Ensembling and Meta-Learning Improve Outlier Detection in Randomized Controlled Trials?

arXiv:2311.05473v11 citationsh-index: 13
AI Analysis

This work addresses the challenge of improving outlier detection in healthcare data, though it is incremental as it builds on existing methods with a negative result on meta-learning.

The study evaluated six machine learning outlier detection algorithms on 838 datasets from multi-center randomized controlled trials, finding that while at least one algorithm performed well 70.6% of the time, no single algorithm was consistently effective. They proposed the Meta-learned Probabilistic Ensemble (MePE) for aggregating predictions, but found that small ensembles outperformed meta-learning approaches on average.

Modern multi-centre randomized controlled trials (MCRCTs) collect massive amounts of tabular data, and are monitored intensively for irregularities by humans. We began by empirically evaluating 6 modern machine learning-based outlier detection algorithms on the task of identifying irregular data in 838 datasets from 7 real-world MCRCTs with a total of 77,001 patients from over 44 countries. Our results reinforce key findings from prior work in the outlier detection literature on data from other domains. Existing algorithms often succeed at identifying irregularities without any supervision, with at least one algorithm exhibiting positive performance 70.6% of the time. However, performance across datasets varies substantially with no single algorithm performing consistently well, motivating new techniques for unsupervised model selection or other means of aggregating potentially discordant predictions from multiple candidate models. We propose the Meta-learned Probabilistic Ensemble (MePE), a simple algorithm for aggregating the predictions of multiple unsupervised models, and show that it performs favourably compared to recent meta-learning approaches for outlier detection model selection. While meta-learning shows promise, small ensembles outperform all forms of meta-learning on average, a negative result that may guide the application of current outlier detection approaches in healthcare and other real-world domains.

Code Implementations1 repo
Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes