Spanish Biomedical and Clinical Language Embeddings
This work provides improved embeddings for Spanish biomedical and clinical text, an incremental advance for natural language processing in this domain.
The researchers computed Spanish biomedical and clinical language embeddings using FastText, with Byte Pair Encoding (BPE) for sub-word representations. Their biomedical word embeddings outperformed previous versions, which suggests that more data leads to better representations.
We computed both word and sub-word embeddings using FastText. For the sub-word embeddings, we selected the Byte Pair Encoding (BPE) algorithm to represent the sub-words. We evaluated the biomedical word embeddings and obtained better results than previous versions, which suggests that with more data we obtain better representations.
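To make the sub-word step concrete, here is a minimal sketch of how BPE learns merge rules and then segments a word. It illustrates the general BPE procedure only and is not the tooling used in this work. The function names (`learn_bpe`, `merge_pair`, `segment`), the `</w>` end-of-word marker, and the toy word counts are assumptions chosen for illustration.

```python
from collections import Counter


def merge_pair(symbols, pair):
    """Replace every adjacent occurrence of `pair` in `symbols` with one merged symbol."""
    merged = []
    i = 0
    while i < len(symbols):
        if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
            merged.append(symbols[i] + symbols[i + 1])
            i += 2
        else:
            merged.append(symbols[i])
            i += 1
    return tuple(merged)


def learn_bpe(word_counts, num_merges):
    """Learn up to `num_merges` merge rules from a {word: frequency} dict."""
    # Each word starts as a sequence of characters plus an end-of-word marker.
    vocab = {tuple(word) + ("</w>",): count for word, count in word_counts.items()}
    merges = []
    for _ in range(num_merges):
        # Count adjacent symbol pairs, weighted by word frequency.
        pair_counts = Counter()
        for symbols, count in vocab.items():
            for pair in zip(symbols, symbols[1:]):
                pair_counts[pair] += count
        if not pair_counts:
            break  # every word is already a single symbol
        # Most frequent pair wins; ties are broken by comparing the pairs
        # themselves so that repeated runs give the same merges.
        best = max(pair_counts, key=lambda p: (pair_counts[p], p))
        merges.append(best)
        vocab = {merge_pair(symbols, best): count for symbols, count in vocab.items()}
    return merges


def segment(word, merges):
    """Split `word` into sub-words by applying the learned merges in order."""
    symbols = tuple(word) + ("</w>",)
    for pair in merges:
        symbols = merge_pair(symbols, pair)
    return list(symbols)


# Toy frequencies, invented for illustration (not taken from the corpus).
toy_counts = {"cardiología": 4, "neurología": 3, "cardiopatía": 2}
merges = learn_bpe(toy_counts, num_merges=10)
print(segment("neuropatía", merges))
```

A word that never appeared in the toy counts, such as "neuropatía", is still segmented into known pieces, which is the property that makes sub-word units useful for rare clinical terms. The resulting sub-word sequences could then be passed to an embedding trainer in place of whole words.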