CL AI Mar 18, 2025

Language Independent Named Entity Recognition via Orthogonal Transformation of Word Vectors

arXiv:2503.14755v1 h-index: 2

Originality: Incremental advance

AI Analysis

This work addresses the problem of language dependency in NLP, which is relevant to researchers and practitioners, though it is incremental: it builds on existing methods such as BiLSTM/CRF and embedding transformations.

The paper tackles cross-lingual named entity recognition by training a model on English and using an orthogonal transformation to adapt word embeddings for Arabic, achieving detection without additional training or fine-tuning on Arabic data.

Word embeddings have been a key building block for NLP, with models relying heavily on them across many different tasks. In this paper, a model based on a Bidirectional LSTM/CRF with word embeddings is proposed to perform named entity recognition for any language. This is done by training the model on a source language (English) and transforming word embeddings of the target language into the embedding space of the source language using an orthogonal linear transformation matrix. Evaluation shows that a model trained on an English dataset was capable of detecting named entities in an Arabic dataset without either training or fine-tuning on an Arabic-language dataset.
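The abstract does not say how the orthogonal transformation matrix is obtained. One common way to fit such a matrix from paired word vectors is the orthogonal Procrustes solution via SVD, and the minimal sketch below illustrates that approach on synthetic data. The function name `fit_orthogonal_map`, the random vectors, and the dimensions are all assumptions for illustration, not the paper's actual procedure or data.

```python
import numpy as np


def fit_orthogonal_map(target_vecs, source_vecs):
    """Return an orthogonal W minimizing ||target_vecs @ W - source_vecs||_F.

    Rows of the two arrays are assumed to be embeddings of translation
    pairs (hypothetical bilingual dictionary), one pair per row.
    """
    # Cross-covariance between the paired embeddings.
    cross = target_vecs.T @ source_vecs
    # Orthogonal Procrustes: W = U @ V^T from the SVD of the cross-covariance.
    u, _, vt = np.linalg.svd(cross)
    return u @ vt


# Synthetic stand-ins for real embeddings (assumption: 50 pairs, 8 dims).
rng = np.random.default_rng(0)
dim, n_pairs = 8, 50
source = rng.normal(size=(n_pairs, dim))          # "English" vectors

# Hide a random orthogonal map and use it to create "Arabic" vectors.
hidden, _ = np.linalg.qr(rng.normal(size=(dim, dim)))
target = source @ hidden.T

w = fit_orthogonal_map(target, source)
mapped = target @ w                               # target vectors in source space
```

Because `w` is orthogonal, it preserves vector lengths and angles, so in principle a tagger trained on source-language embeddings can consume `mapped` directly without retraining. With real embeddings the fit is only approximate, since the two spaces are not exact rotations of each other.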
