CL | Aug 5, 2024

BioMamba: A Pre-trained Biomedical Language Representation Model Leveraging Mamba

Ling Yue, Sixue Xing, Yingzhou Lu, Tianfan Fu

arXiv:2408.02600v17.212 citations | h-index: 10 | Has Code

Originality: Synthesis-oriented

AI Analysis

This paper addresses domain-specific language processing for biomedical research. It is an incremental improvement: it adapts an existing architecture (Mamba) to a specialized domain.

The paper tackles the challenge of interpreting complex biomedical literature by introducing BioMamba, a model based on the Mamba architecture and pre-trained on biomedical text. It reports a 100-fold reduction in perplexity and a 4-fold reduction in cross-entropy loss on the BioASQ test set, and it outperforms models such as BioBERT and general-domain Mamba across biomedical tasks.

The advancement of natural language processing (NLP) in biology hinges on models' ability to interpret intricate biomedical literature. Traditional models often struggle with the complex and domain-specific language in this field. In this paper, we present BioMamba, a pre-trained model specifically designed for biomedical text mining. BioMamba builds upon the Mamba architecture and is pre-trained on an extensive corpus of biomedical literature. Our empirical studies demonstrate that BioMamba significantly outperforms models like BioBERT and general-domain Mamba across various biomedical tasks. For instance, BioMamba achieves a 100 times reduction in perplexity and a 4 times reduction in cross-entropy loss on the BioASQ test set. We provide an overview of the model architecture, pre-training process, and fine-tuning techniques. Additionally, we release the code and trained model to facilitate further research.
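The abstract reports perplexity and cross-entropy side by side, so it may help to recall how the two relate: perplexity is the exponential of the mean per-token cross-entropy, which means a modest ratio in loss can correspond to a much larger ratio in perplexity. The following is a minimal sketch of that arithmetic. The function names and the loss values are illustrative assumptions and are not taken from the paper or its released code.

```python
import math


def mean_cross_entropy(token_probs):
    """Mean negative log-likelihood (in nats) of the probabilities
    a model assigned to the correct tokens."""
    return -sum(math.log(p) for p in token_probs) / len(token_probs)


def perplexity(cross_entropy_nats):
    """Perplexity is exp(mean per-token cross-entropy in nats)."""
    return math.exp(cross_entropy_nats)


# Hypothetical losses chosen only to illustrate the relationship;
# these are NOT the values reported for BioMamba or any baseline.
baseline_ce = 6.0
improved_ce = 1.5

ce_ratio = baseline_ce / improved_ce
ppl_ratio = perplexity(baseline_ce) / perplexity(improved_ce)

print(f"cross-entropy ratio: {ce_ratio:.1f}x")
print(f"perplexity ratio:    {ppl_ratio:.1f}x")
```

With these made-up numbers, a 4-fold drop in cross-entropy corresponds to a perplexity ratio of exp(4.5), roughly 90-fold. The perplexity ratio depends on the absolute difference in loss, not on the loss ratio, so the two reported figures carry different information.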
