CL, LG · Feb 20, 2020

Federated pretraining and fine tuning of BERT using clinical notes from multiple silos

arXiv:2002.08562v1 · 38 citations
AI Analysis

This addresses privacy and regulatory barriers in healthcare NLP by enabling collaborative model training across data silos.

The paper addresses the challenge of training BERT models on clinical text from multiple institutions that cannot share their data because of privacy concerns. It shows that both pretraining and fine-tuning can be carried out successfully in a federated manner.

Large-scale contextual representation models such as BERT have significantly advanced natural language processing (NLP) in recent years. However, in certain areas such as healthcare, accessing diverse, large-scale text data from multiple institutions is extremely challenging for privacy and regulatory reasons. In this article, we show that it is possible to both pretrain and fine-tune BERT models in a federated manner using clinical texts from different silos, without moving the data.
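The abstract does not state how the models trained at each silo are combined. A common choice is federated averaging (FedAvg): each silo trains a copy of the current global model on its own data, and a server averages the returned weights in proportion to each silo's number of examples. The sketch below assumes FedAvg and replaces BERT with a toy linear model `y = w*x + b` so the training loop stays readable. The function names `local_update`, `federated_average`, `federated_training`, and `make_silo`, the learning rate, and the synthetic silo data are all hypothetical illustrations, not the paper's code.

```python
import random


def local_update(weights, data, lr=0.1, epochs=5):
    """Train on one silo's private data with plain gradient descent.

    Toy stand-in for local BERT training (assumption): a 1-D linear
    model y = w*x + b with mean squared error loss.
    """
    w, b = weights
    n = len(data)
    for _ in range(epochs):
        grad_w = sum(2 * (w * x + b - y) * x for x, y in data) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in data) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return (w, b)


def federated_average(updates):
    """FedAvg-style aggregation: weight each silo's model by its example count."""
    total = sum(n for _, n in updates)
    w = sum(wb[0] * n for wb, n in updates) / total
    b = sum(wb[1] * n for wb, n in updates) / total
    return (w, b)


def federated_training(silos, rounds=50):
    """Alternate local training and server-side averaging for a fixed number of rounds."""
    global_weights = (0.0, 0.0)
    for _ in range(rounds):
        # Only model weights and example counts leave each silo; raw data never moves.
        updates = [(local_update(global_weights, data), len(data)) for data in silos]
        global_weights = federated_average(updates)
    return global_weights


def make_silo(rng, n):
    """Hypothetical silo: n noisy samples from y = 3x + 1."""
    xs = [rng.uniform(-1, 1) for _ in range(n)]
    return [(x, 3 * x + 1 + rng.gauss(0, 0.01)) for x in xs]


if __name__ == "__main__":
    rng = random.Random(0)
    silos = [make_silo(rng, n) for n in (20, 50, 100)]
    w, b = federated_training(silos)
    print(f"w={w:.2f}, b={b:.2f}")
```

Because the three toy silos draw from the same distribution, the averaged model should end up close to the generating parameters (w near 3, b near 1). Real clinical silos are usually non-identically distributed, which is one reason federated results can differ from centrally trained ones.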
