CL, IR | Apr 14, 2022

Multi-label topic classification for COVID-19 literature with Bioformer

arXiv:2204.06758v1 | 4 citations | h-index: 26
Originality: Synthesis-oriented
AI Analysis

This work addresses topic classification for COVID-19 literature. The contribution is incremental, since it applies existing methods to a specific domain.

The paper tackles multi-label topic classification for COVID-19 literature by comparing three BERT models and finds that Bioformer outperforms BioBERT and PubMedBERT. Relative to the baseline, the best model improves micro, macro, and instance-based F1 scores by 8.8%, 15.5%, and 7.4%, respectively.

We describe the Bioformer team's participation in the multi-label topic classification task for COVID-19 literature (Track 5 of BioCreative VII). Topic classification is performed using different BERT models (BioBERT, PubMedBERT, and Bioformer). We formulate the topic classification task as a sentence pair classification problem, where the title is the first sentence and the abstract is the second sentence. Our results show that Bioformer outperforms BioBERT and PubMedBERT in this task. Compared to the baseline results, our best model increased the micro, macro, and instance-based F1 scores by 8.8%, 15.5%, and 7.4%, respectively. Bioformer achieved the highest micro F1 and macro F1 scores in this challenge. In post-challenge experiments, we found that pretraining Bioformer on COVID-19 articles further improves performance.
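To make the abstract's two technical ideas concrete, the sketch below shows (1) how a title and an abstract might be packed into a BERT-style sentence pair and (2) how micro, macro, and instance-based F1 differ for multi-label predictions. It is an illustration under stated assumptions, not the authors' code: `pack_pair` and `multilabel_f1` are hypothetical helper names, whitespace splitting stands in for a real WordPiece tokenizer, and an F1 with an empty denominator is defined as 0.0 by convention here.

```python
from typing import Dict, List, Sequence, Tuple


def pack_pair(title: str, abstract: str) -> Tuple[List[str], List[int]]:
    """Lay out title and abstract as a BERT-style sentence pair.

    Assumption: whitespace splitting replaces a real subword tokenizer.
    Segment id 0 marks the title side, 1 marks the abstract side.
    """
    a, b = title.split(), abstract.split()
    tokens = ["[CLS]"] + a + ["[SEP]"] + b + ["[SEP]"]
    segment_ids = [0] * (len(a) + 2) + [1] * (len(b) + 1)
    return tokens, segment_ids


def _counts(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[int, int, int]:
    """Return (true positives, false positives, false negatives)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t and p)
    fp = sum(1 for t, p in zip(y_true, y_pred) if not t and p)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t and not p)
    return tp, fp, fn


def _f1(tp: int, fp: int, fn: int) -> float:
    """F1 = 2TP / (2TP + FP + FN); 0.0 when the denominator is zero (assumed convention)."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def multilabel_f1(
    y_true: Sequence[Sequence[int]], y_pred: Sequence[Sequence[int]]
) -> Dict[str, float]:
    """Compute micro, macro, and instance-based F1 over binary indicator rows.

    Rows are documents, columns are topic labels.
    """
    n_labels = len(y_true[0])
    per_label = [
        _counts([row[j] for row in y_true], [row[j] for row in y_pred])
        for j in range(n_labels)
    ]
    # Micro: pool the counts over all labels, then compute one F1.
    micro = _f1(*(sum(c[i] for c in per_label) for i in range(3)))
    # Macro: compute F1 per label, then average over labels.
    macro = sum(_f1(*c) for c in per_label) / n_labels
    # Instance-based: compute F1 per document, then average over documents.
    instance = sum(_f1(*_counts(t, p)) for t, p in zip(y_true, y_pred)) / len(y_true)
    return {"micro": micro, "macro": macro, "instance": instance}


if __name__ == "__main__":
    tokens, segments = pack_pair("Bioformer for COVID-19", "We classify topics .")
    print(tokens, segments)
    print(multilabel_f1([[1, 1, 0], [1, 0, 0]], [[1, 0, 0], [1, 0, 1]]))
```

Macro F1 weights every label equally, so rare topics that are predicted poorly pull it down more than they pull down micro F1. This is one reason the three reported improvements can differ.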

