A Medical Information Extraction Workbench to Process German Clinical Text
This work addresses the limited resources for clinical text processing in non-English languages, specifically German, by providing a publicly available tool for researchers and practitioners. The contribution is incremental, as it applies existing methods to a new dataset.
The authors tackled the scarcity of accessible datasets and tools for processing German clinical text by introducing a workbench of models trained on de-identified German nephrology reports, achieving promising results on in-domain data and demonstrating applicability to other biomedical text in German.
Background: In information extraction and natural language processing, accessible datasets are crucial for reproducing and comparing results. Publicly available implementations and tools can serve as benchmarks and facilitate the development of more complex applications. However, in clinical text processing, accessible datasets are scarce, and so are existing tools. One of the main reasons is the sensitivity of the data. This problem is even more evident for non-English languages. Approach: To address this situation, we introduce a workbench: a collection of German clinical text processing models. The models are trained on a de-identified corpus of German nephrology reports. Result: The presented models provide promising results on in-domain data. Moreover, we show that our models can also be applied successfully to other biomedical text in German. Our workbench is made publicly available so that it can be used out of the box, as a benchmark, or transferred to related problems.