Applications of BERT Based Sequence Tagging Models on Chinese Medical Text Attributes Extraction
This work addresses attribute extraction in Chinese medical texts; it is an incremental improvement that applies existing methods to domain-specific data.
The paper tackles Chinese medical text attribute extraction by converting it into sequence tagging or machine reading comprehension tasks, achieving good performance on the CCKS 2019 benchmark with an ensemble of BERT-based models including LSTM-CRF, CNN, and others.
We convert the Chinese medical text attribute extraction task into a sequence tagging or machine reading comprehension task. Building on BERT pre-trained models, we try not only the widely used LSTM-CRF sequence tagging model but also other sequence models, such as CNN, UCNN, WaveNet, and SelfAttention, which reach performance similar to that of LSTM-CRF. This sheds new light on traditional sequence tagging models. Because different sequence tagging models emphasize substantially different aspects, ensembling them adds diversity to the final system. With this approach, our system achieves good performance on Chinese medical text attribute extraction (subtask 2 of CCKS 2019 task 1).
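As an illustration of the two ideas above, the following is a minimal sketch of how attribute spans might be cast as per-character BIO tags and how tag sequences from several models might be combined. The abstract does not specify the tagging scheme or the ensembling rule, so the BIO scheme, the per-position majority vote, the label name `SIZE`, and the function names `spans_to_bio` and `majority_vote` are all assumptions made for this example, not the authors' method.

```python
from collections import Counter


def spans_to_bio(text, spans):
    """Turn (start, end, label) character spans (end exclusive) into BIO tags.

    Assumption: one tag per character, a common choice for Chinese text.
    """
    tags = ["O"] * len(text)
    for start, end, label in spans:
        tags[start] = f"B-{label}"
        for i in range(start + 1, end):
            tags[i] = f"I-{label}"
    return tags


def majority_vote(predictions):
    """Combine equal-length tag sequences by per-position majority vote.

    Assumption: ties go to the tag proposed by the earliest model in the list.
    """
    length = len(predictions[0])
    if any(len(p) != length for p in predictions):
        raise ValueError("all tag sequences must have the same length")
    voted = []
    for position_tags in zip(*predictions):
        counts = Counter(position_tags)
        best = max(counts.values())
        # Scan in model order so ties are broken deterministically.
        voted.append(next(t for t in position_tags if counts[t] == best))
    return voted


# Hypothetical example: "SIZE" is an illustrative label, not the task's label set.
text = "肿瘤大小3cm"
gold = spans_to_bio(text, [(4, 7, "SIZE")])

# Hypothetical outputs of three different tagging models on the same text.
model_a = ["O", "O", "O", "O", "B-SIZE", "I-SIZE", "I-SIZE"]
model_b = ["O", "O", "O", "O", "B-SIZE", "I-SIZE", "O"]
model_c = ["O", "O", "O", "O", "O", "B-SIZE", "I-SIZE"]
ensemble = majority_vote([model_a, model_b, model_c])
```

In this toy case each model makes a different boundary error, and the vote recovers the full span. A per-position vote can produce an invalid BIO sequence (for example an `I-` tag with no preceding `B-`), so a real system would likely need a repair or decoding step on top.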