CL · Oct 25, 2019

Evaluation of Sentence Representations in Polish

arXiv:1910.11834v21002 citations
Originality: Synthesis-oriented
AI Analysis

This work addresses the lack of evaluation resources for low-resource languages such as Polish, enabling better evaluation of sentence embeddings on language-specific tasks. Its contribution is incremental, since it focuses on benchmarking existing methods.

The authors tackled the problem of evaluating sentence representations in Polish by introducing two new datasets and comprehensively assessing eight methods, including Polish and multilingual models, to identify their strengths and weaknesses.

Methods for learning sentence representations have been actively developed in recent years. However, the lack of pre-trained models and datasets annotated at the sentence level has been a problem for low-resource languages such as Polish, which has led to less interest in applying these methods to language-specific tasks. In this study, we introduce two new Polish datasets for evaluating sentence embeddings and provide a comprehensive evaluation of eight sentence representation methods, including Polish and multilingual models. We consider classic word embedding models, recently developed contextual embeddings, and multilingual sentence encoders, showing the strengths and weaknesses of specific approaches. We also examine different methods of aggregating word vectors into a single sentence vector.
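
To make the last point of the abstract concrete, here is a minimal sketch of aggregating word vectors into one sentence vector. The abstract does not say which aggregation methods the paper compares, so mean pooling and max pooling are assumed here as two common choices. The names `mean_pool`, `max_pool`, `sentence_vector` and the toy embeddings are hypothetical and only for illustration.

```python
from typing import Callable, Dict, List, Sequence

Vector = List[float]


def mean_pool(vectors: Sequence[Vector]) -> Vector:
    """Element-wise average of the word vectors."""
    n = len(vectors)
    return [sum(dim) / n for dim in zip(*vectors)]


def max_pool(vectors: Sequence[Vector]) -> Vector:
    """Element-wise maximum of the word vectors."""
    return [max(dim) for dim in zip(*vectors)]


def sentence_vector(
    tokens: Sequence[str],
    embeddings: Dict[str, Vector],
    pool: Callable[[Sequence[Vector]], Vector] = mean_pool,
) -> Vector:
    """Pool the vectors of all in-vocabulary tokens into one vector."""
    known = [embeddings[t] for t in tokens if t in embeddings]
    if not known:
        raise ValueError("no token in the sentence has an embedding")
    return pool(known)


# Toy 3-dimensional embeddings, invented for illustration only
# (real word embeddings typically have hundreds of dimensions).
toy_embeddings: Dict[str, Vector] = {
    "ala": [1.0, 0.0, 2.0],
    "ma": [0.0, 2.0, 2.0],
    "kota": [2.0, 4.0, -1.0],
}

tokens = ["ala", "ma", "kota"]
mean_vec = sentence_vector(tokens, toy_embeddings, mean_pool)
max_vec = sentence_vector(tokens, toy_embeddings, max_pool)
print(mean_vec)  # [1.0, 2.0, 1.0]
print(max_vec)   # [2.0, 4.0, 2.0]
```

Both pooling functions ignore word order, which is one reason such baselines are usually compared against contextual and sentence-level encoders.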

Code Implementations: 1 repo