CL AI Feb 25, 2021

Spanish Biomedical and Clinical Language Embeddings

Asier Gutiérrez-Fandiño, Jordi Armengol-Estapé, Casimiro Pio Carrino, Ona De Gibert, Aitor Gonzalez-Agirre, Marta Villegas

arXiv:2102.12843v11.03 citations

Originality Synthesis-oriented

AI Analysis

This work provides improved embeddings for Spanish biomedical and clinical text, which is an incremental advancement for natural language processing in this domain.

The researchers computed Spanish biomedical and clinical language embeddings with FastText, producing both word and sub-word embeddings and using Byte Pair Encoding (BPE) to define the sub-words. Their biomedical word embeddings outperformed previous versions, which suggests that more training data yields better representations.

We computed both Word and Sub-word Embeddings using FastText. For the Sub-word Embeddings, we selected the Byte Pair Encoding (BPE) algorithm to represent the sub-words. We evaluated the Biomedical Word Embeddings and obtained better results than previous versions, which implies that with more data we obtain better representations.
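
To make the BPE step concrete, here is a minimal sketch of how BPE merges are learned from word counts and then used to split a word into sub-words. It is not the authors' pipeline: the toy word counts, the number of merges, the `</w>` end-of-word marker, and the function names (`learn_bpe`, `segment`, and the helpers) are all assumptions chosen for illustration, and a real run would use a full corpus and an established BPE implementation before training FastText.

```python
from collections import Counter


def get_pair_counts(vocab):
    """Count adjacent symbol pairs, weighted by word frequency."""
    pairs = Counter()
    for symbols, freq in vocab.items():
        for a, b in zip(symbols, symbols[1:]):
            pairs[(a, b)] += freq
    return pairs


def merge_pair(pair, vocab):
    """Replace every occurrence of `pair` with a single merged symbol."""
    merged = {}
    for symbols, freq in vocab.items():
        out = []
        i = 0
        while i < len(symbols):
            if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
                out.append(symbols[i] + symbols[i + 1])
                i += 2
            else:
                out.append(symbols[i])
                i += 1
        key = tuple(out)
        merged[key] = merged.get(key, 0) + freq
    return merged


def learn_bpe(word_freqs, num_merges):
    """Learn an ordered list of merges from a {word: count} mapping."""
    # Each word starts as a sequence of characters plus an end-of-word marker.
    vocab = {tuple(word) + ("</w>",): freq for word, freq in word_freqs.items()}
    merges = []
    for _ in range(num_merges):
        pairs = get_pair_counts(vocab)
        if not pairs:
            break
        # Pick the most frequent pair; ties are broken by the pair itself
        # so that the result is deterministic.
        best = max(pairs, key=lambda p: (pairs[p], p))
        merges.append(best)
        vocab = merge_pair(best, vocab)
    return merges


def segment(word, merges):
    """Split a word into sub-words by replaying the learned merges in order."""
    symbols = tuple(word) + ("</w>",)
    for pair in merges:
        symbols = next(iter(merge_pair(pair, {symbols: 1})))
    return list(symbols)


# Hypothetical toy counts; a real corpus would have millions of tokens.
toy_counts = {"hepatitis": 5, "gastritis": 4, "artritis": 3, "hepático": 2}
merges = learn_bpe(toy_counts, num_merges=10)
pieces = segment("hepatitis", merges)
print(merges[0], pieces)
```

In this toy data the pair `("t", "i")` is the most frequent, so it is merged first. With more merges, frequent endings such as the shared suffix of the `-itis` words tend to become single sub-word units, which is the kind of unit a sub-word embedding model can then learn a vector for.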

