AF Adapter: Continual Pretraining for Building Chinese Biomedical Language Model
This work addresses catastrophic forgetting during continual pretraining, a problem facing researchers and practitioners who build Chinese biomedical language models. It is an incremental improvement over existing continual pretraining methods.
The paper tackles catastrophic forgetting in continual pretraining of domain-specific language models by proposing the AF Adapter, which adds a small number of attention heads and hidden units to each self-attention layer and feed-forward network of a BERT-based model. With only about 17% of model parameters trained, the method achieves average performance gains of 0.6% and 2% over strong baselines, and it reduces catastrophic forgetting by 11% compared to fine-tuning.
Continual pretraining is a popular way of building a domain-specific pretrained language model from a general-domain language model. Despite its high efficiency, continual pretraining suffers from catastrophic forgetting, which may harm the model's performance on downstream tasks. To alleviate this issue, we propose a continual pretraining method for BERT-based models, named Attention-FFN Adapter (AF Adapter). Its main idea is to introduce a small number of additional attention heads and hidden units inside each self-attention layer and feed-forward network. Using this method, we train a domain-specific language model for the Chinese biomedical domain, named AF Adapter based RoBERTa. We evaluate the models on downstream tasks. The results demonstrate that, with only about 17% of model parameters trained, AF Adapter achieves average performance gains of 0.6% and 2% compared to strong baselines. Further experimental results show that our method alleviates the catastrophic forgetting problem by 11% compared to the fine-tuning method.
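To make the idea of adding heads and hidden units concrete, the following is a minimal sketch of how the number of newly introduced parameters in one Transformer layer could be counted. It is an illustration under stated assumptions, not the configuration used in the paper: the layer sizes are typical BERT-base values, the names `LayerConfig`, `added_params`, `original_params`, and `trainable_fraction` are hypothetical, and we assume that only the newly added weights are trained while the original weights stay frozen. LayerNorm and embedding parameters are ignored, so the resulting fraction is per layer and is not expected to reproduce the 17% figure.

```python
from dataclasses import dataclass


@dataclass
class LayerConfig:
    # Assumed BERT-base-like sizes; not taken from the paper.
    hidden_size: int = 768   # model width
    num_heads: int = 12      # original attention heads per layer
    head_dim: int = 64       # width of each attention head
    ffn_size: int = 3072     # original hidden units in the feed-forward network


def added_params(cfg: LayerConfig, extra_heads: int, extra_ffn_units: int) -> int:
    """Parameters introduced by extra attention heads and extra FFN hidden units."""
    # Each new head adds Q, K, V weights and biases, plus its slice of the
    # output projection (the output bias is shared, so nothing new there).
    per_head = 3 * (cfg.hidden_size * cfg.head_dim + cfg.head_dim)
    per_head += cfg.head_dim * cfg.hidden_size
    # Each new hidden unit adds one input weight vector, one bias,
    # and one output weight vector.
    per_unit = 2 * cfg.hidden_size + 1
    return extra_heads * per_head + extra_ffn_units * per_unit


def original_params(cfg: LayerConfig) -> int:
    """Attention and FFN parameters of the unmodified layer (LayerNorm excluded)."""
    inner = cfg.num_heads * cfg.head_dim
    attention = 4 * cfg.hidden_size * inner + 3 * inner + cfg.hidden_size
    ffn = 2 * cfg.hidden_size * cfg.ffn_size + cfg.ffn_size + cfg.hidden_size
    return attention + ffn


def trainable_fraction(cfg: LayerConfig, extra_heads: int, extra_ffn_units: int) -> float:
    """Share of the extended layer's parameters that belongs to the new units."""
    new = added_params(cfg, extra_heads, extra_ffn_units)
    return new / (original_params(cfg) + new)


if __name__ == "__main__":
    cfg = LayerConfig()
    # Hypothetical extension sizes, chosen only for illustration.
    print(trainable_fraction(cfg, extra_heads=1, extra_ffn_units=1024))
```

Counting parameters this way shows why the approach is cheap: each added head or hidden unit contributes a fixed number of weights, so the trainable share grows with the number of added units and is independent of how the frozen original weights were obtained.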