CL | AI | LG | Dec 9, 2023

Labrador: Exploring the Limits of Masked Language Modeling for Laboratory Data

DeepMind
arXiv:2312.11502v2 | 9 citations | h-index: 7 | ML4H@NeurIPS
Originality: Synthesis-oriented
AI Analysis

This work addresses the challenge of using pre-trained models for healthcare data analysis. Its contribution is incremental: it documents the limitations of the approach rather than reporting a breakthrough.

The authors apply masked language modeling to laboratory data from electronic health records. Neither Labrador nor BERT, each pre-trained on 100 million lab results, consistently outperformed XGBoost on downstream prediction tasks, and transfer learning showed limited effectiveness for BERT and only marginal success for Labrador.

In this work, we introduce Labrador, a pre-trained Transformer model for laboratory data. Labrador and BERT were pre-trained on a corpus of 100 million lab test results from electronic health records (EHRs) and evaluated on various downstream outcome prediction tasks. Both models demonstrate mastery of the pre-training task, but neither consistently outperforms XGBoost on downstream supervised tasks. Our ablation studies reveal that transfer learning shows limited effectiveness for BERT and achieves marginal success with Labrador. We explore the reasons for the failure of transfer learning and suggest that the data-generating process underlying each patient cannot be characterized sufficiently using labs alone, among other factors. We encourage future work to focus on joint modeling of multiple EHR data categories and to include tree-based baselines in their evaluations.

Code Implementations: 1 repo
