Question Answering based Clinical Text Structuring Using Pre-trained Language Model
This work addresses dataset scarcity and error propagation in clinical text structuring for medical research, though its contribution is incremental because it builds on existing pre-trained language models.
The paper recasts clinical text structuring as a question answering task so that different structuring tasks can be unified and can share datasets. It also introduces a model that incorporates domain-specific features into a pre-trained language model. Experiments on Chinese pathology reports show that the approach is effective and competitive with baseline models.
Clinical text structuring is a critical and fundamental task for clinical research. Traditional methods, such as task-specific end-to-end models and pipeline models, usually suffer from a lack of datasets and from error propagation. In this paper, we present a question answering based clinical text structuring (QA-CTS) task that unifies different specific tasks and makes datasets shareable. We also propose a novel model for the QA-CTS task that introduces domain-specific features (e.g., clinical named entity information) into a pre-trained language model. Experimental results on Chinese pathology reports collected from Ruijing Hospital demonstrate that the presented QA-CTS task is very effective at improving performance on specific tasks. Our proposed model also competes favorably with strong baseline models on specific tasks.
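
To make the idea of unifying specific tasks under one question answering format more concrete, the following minimal Python sketch shows how several structuring tasks over the same report could share a single instance format. It is an illustration under stated assumptions, not the authors' implementation: the class `QACTSExample`, the helper `make_example`, the field names, and the English toy report are all hypothetical, and the sketch assumes that each answer is a verbatim span of the report.

```python
from dataclasses import dataclass


@dataclass
class QACTSExample:
    """One hypothetical QA-CTS instance: a question and its answer span."""
    question: str  # names the item to extract (one question per specific task)
    report: str    # the clinical text to search
    start: int     # character offset where the answer begins (inclusive)
    end: int       # character offset where the answer ends (exclusive)

    @property
    def answer(self) -> str:
        # Recover the answer text from the stored span.
        return self.report[self.start:self.end]


def make_example(question: str, report: str, answer: str) -> QACTSExample:
    """Build an instance by locating the first occurrence of the answer.

    Assumption: answers are verbatim substrings of the report.
    """
    start = report.find(answer)
    if start < 0:
        raise ValueError("answer must appear verbatim in the report")
    return QACTSExample(question, report, start, start + len(answer))


# Invented toy report, for illustration only (not taken from the dataset).
report = "Tumor size 3.5 cm. Proximal resection margin 4 cm."

# Two different structuring tasks expressed in the same format, so their
# labeled instances could be pooled into one training set.
examples = [
    make_example("What is the tumor size?", report, "3.5 cm"),
    make_example("What is the proximal resection margin?", report, "4 cm"),
]

for ex in examples:
    print(ex.question, "->", ex.answer)
```

Because every task reduces to the same (question, report, span) triple in this sketch, adding a new structuring task would only require writing a new question rather than defining a new label scheme.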