Representing Interlingual Meaning in Lexical Databases
This paper addresses the problem of biased and incomplete lexical representation for under-represented languages. Its contribution is incremental: it evaluates existing databases rather than proposing a new solution.
The paper assesses state-of-the-art multilingual lexical databases and finds structural limitations that reduce their expressivity for culturally-specific words and for mapping those words across languages. As a result, linguistically or culturally diverse languages are represented only approximately, while dominant languages such as English are represented more accurately.
In today's multilingual lexical databases, the majority of the world's languages are under-represented. Beyond a mere issue of resource incompleteness, we show that existing lexical databases have structural limitations that result in a reduced expressivity on culturally-specific words and in mapping them across languages. In particular, the lexical meaning space of dominant languages, such as English, is represented more accurately while linguistically or culturally diverse languages are mapped in an approximate manner. Our paper assesses state-of-the-art multilingual lexical databases and evaluates their strengths and limitations with respect to their expressivity on lexical phenomena of linguistic diversity.