CL LG · Nov 9, 2022

Combining Contrastive Learning and Knowledge Graph Embeddings to develop medical word embeddings for the Italian language

Denys Amore Bondarenko, Roger Ferrod, Luigi Di Caro

arXiv:2211.05035v10.31 citations · h-index: 22

Originality: Incremental advance

AI Analysis

This paper addresses the niche problem of medical NLP for Italian, a language that lacks medical texts and controlled vocabularies. The advance is incremental because it combines existing methods.

The paper tackled the lack of medical word embeddings for Italian by combining contrastive learning with knowledge graph embeddings, achieving a significant performance improvement over the starting model while using less data, though it did not outperform multilingual state-of-the-art models.

Word embeddings play a significant role in today's Natural Language Processing tasks and applications. While pre-trained models may be directly employed and integrated into existing pipelines, they are often fine-tuned to better fit specific languages or domains. In this paper, we attempt to improve available embeddings in the uncovered niche of the Italian medical domain through the combination of Contrastive Learning (CL) and Knowledge Graph Embedding (KGE). The main objective is to improve the accuracy of semantic similarity between medical terms, which is also used as an evaluation task. Since the Italian language lacks medical texts and controlled vocabularies, we have developed a specific solution by combining preexisting CL methods (multi-similarity loss, contextualization, dynamic sampling) with the integration of KGEs, creating a new variant of the loss. Although they do not outperform the state of the art, represented by multilingual models, the obtained results are encouraging: they provide a significant leap in performance compared to the starting model while using significantly less data.
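To make the multi-similarity loss mentioned in the abstract more concrete, here is a minimal NumPy sketch of the standard formulation of that loss, without pair mining. It is not the paper's KGE-augmented variant, whose details the abstract does not give. The function name, the toy vectors, the labels, and the hyperparameter values (`alpha`, `beta`, `lam`) are all assumptions chosen for illustration.

```python
import numpy as np

def multi_similarity_loss(embeddings, labels, alpha=2.0, beta=50.0, lam=0.5):
    """Standard multi-similarity loss over a batch (no pair mining).

    Hypothetical helper for illustration; hyperparameters are assumed values.
    """
    # L2-normalise so that the dot product is cosine similarity.
    emb = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    sim = emb @ emb.T

    labels = np.asarray(labels)
    same = labels[:, None] == labels[None, :]
    not_self = ~np.eye(len(labels), dtype=bool)
    pos_mask = same & not_self   # same concept, excluding the anchor itself
    neg_mask = ~same             # different concept

    # Positive pairs are penalised when their similarity falls below lam.
    pos_term = np.log1p(np.sum(np.exp(-alpha * (sim - lam)) * pos_mask, axis=1)) / alpha
    # Negative pairs are penalised when their similarity rises above lam.
    neg_term = np.log1p(np.sum(np.exp(beta * (sim - lam)) * neg_mask, axis=1)) / beta
    return float(np.mean(pos_term + neg_term))

# Toy batch: four 2-d "term embeddings"; rows 0-1 and rows 2-3 point in similar directions.
toy_embeddings = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]])
well_grouped = [0, 0, 1, 1]    # labels agree with the geometry
badly_grouped = [0, 1, 0, 1]   # labels contradict the geometry

print(multi_similarity_loss(toy_embeddings, well_grouped))
print(multi_similarity_loss(toy_embeddings, badly_grouped))
```

In this toy setup the loss should be lower when terms sharing a label already lie close together, which is the property a contrastive objective of this kind rewards during fine-tuning.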

