CL, Aug 25, 2018

Comparing CNN and LSTM character-level embeddings in BiLSTM-CRF models for chemical and disease named entity recognition

arXiv:1808.08450v11094 citations
Originality: Synthesis-oriented
AI Analysis

This work addresses chemical and disease NER for biomedical text analysis, but it is incremental as it compares existing embedding methods without introducing new techniques.

The study compares CNN and LSTM character-level embeddings in BiLSTM-CRF models for chemical and disease named entity recognition. Both achieve comparable state-of-the-art performance on the BioCreative V CDR corpus, but CNN embeddings are computationally cheaper: they increase training time over word-based models by 25%, whereas LSTM embeddings more than double it.
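To make the reported overheads concrete, the following minimal sketch converts them into training times. The 10-hour baseline is a hypothetical figure chosen for illustration and does not come from the paper. Because the LSTM overhead is reported only as "more than double", the sketch computes a lower bound for it.

```python
def training_time(baseline_hours: float, overhead_fraction: float) -> float:
    """Training time after adding a fractional overhead to a word-based baseline."""
    return baseline_hours * (1.0 + overhead_fraction)


# Hypothetical baseline: a word-based BiLSTM-CRF that trains in 10 hours.
baseline_hours = 10.0

# CNN-based character embeddings: +25% over the word-based model (from the abstract).
cnn_hours = training_time(baseline_hours, 0.25)

# LSTM-based character embeddings: "more than double", so +100% is only a lower bound.
lstm_hours_lower_bound = training_time(baseline_hours, 1.0)

print(f"CNN char embeddings:  {cnn_hours} h")
print(f"LSTM char embeddings: more than {lstm_hours_lower_bound} h")
```

Under this assumed baseline, the CNN variant costs 2.5 extra hours while the LSTM variant costs at least 10.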

We compare the use of LSTM-based and CNN-based character-level word embeddings in BiLSTM-CRF models to approach chemical and disease named entity recognition (NER) tasks. Empirical results over the BioCreative V CDR corpus show that the use of either type of character-level word embeddings in conjunction with the BiLSTM-CRF models leads to comparable state-of-the-art performance. However, the models using CNN-based character-level word embeddings have a computational performance advantage, increasing training time over word-based models by 25% while the LSTM-based character-level word embeddings more than double the required training time.
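The abstract does not describe the embedding architectures in detail. As a rough illustration of what a CNN-based character-level word embedding can look like, the sketch below assumes one common design: a 1D convolution over character vectors followed by max-over-time pooling. The function name `char_cnn_embedding`, the dimensions, and the random weights are all hypothetical, and this is not the authors' implementation.

```python
import random


def char_cnn_embedding(word, char_vectors, filters):
    """Embed a non-empty word via 1D convolution over its characters, then max pooling.

    char_vectors: dict mapping a character to a list of floats (its vector).
    filters: list of filters, each a list of `width` rows of `dim` weights.
    Returns one value per filter, so the output size is independent of word length.
    """
    dim = len(next(iter(char_vectors.values())))
    width = len(filters[0])
    zero = [0.0] * dim
    padding = [zero] * (width - 1)
    # Unknown characters fall back to a zero vector (an assumption of this sketch).
    seq = padding + [char_vectors.get(c, zero) for c in word] + padding

    embedding = []
    for f in filters:
        best = float("-inf")
        # Slide the filter over every window of `width` consecutive characters.
        for start in range(len(seq) - width + 1):
            score = sum(
                f[i][j] * seq[start + i][j]
                for i in range(width)
                for j in range(dim)
            )
            best = max(best, score)  # max-over-time pooling
        embedding.append(best)
    return embedding


# Hypothetical sizes and randomly initialised weights, for illustration only.
rng = random.Random(0)
char_dim, num_filters, width = 4, 5, 3
char_vectors = {
    c: [rng.uniform(-1, 1) for _ in range(char_dim)]
    for c in "abcdefghijklmnopqrstuvwxyz"
}
filters = [
    [[rng.uniform(-1, 1) for _ in range(char_dim)] for _ in range(width)]
    for _ in range(num_filters)
]

short_vec = char_cnn_embedding("zinc", char_vectors, filters)
long_vec = char_cnn_embedding("acetaminophen", char_vectors, filters)
print(len(short_vec), len(long_vec))
```

Each convolution window can be evaluated independently of the others, whereas an LSTM must process characters one after another. That is one plausible reason for the lower training overhead of the CNN variant, though the abstract itself does not state the cause.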

Code Implementations: 1 repo
