CL, AI | Aug 28, 2023

ANER: Arabic and Arabizi Named Entity Recognition using Transformer-Based Approach

arXiv:2308.14669v1 | 5 citations | h-index: 12 | Has Code
Originality: Synthesis-oriented
AI Analysis

This work addresses NER for low-resource languages like Arabic and Arabizi and provides a user-friendly web tool and a deployed model, but it is incremental because it builds on existing BERT methods.

The paper tackles Named Entity Recognition (NER) for Arabic and Arabizi by developing ANER, a transformer-based model. It reports an F1 score of 88.7% on the ANERcorp dataset, compared with 83% for CAMeL Tools, and an F1 score of 77.7% on the out-of-domain NewsFANE_Gold news data.
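For readers unfamiliar with how an NER F1 score is typically obtained, the following is a minimal sketch. It assumes exact-match, entity-level micro F1 over (start, end, label) spans; the page does not state which F1 variant the authors used, and the function name `entity_f1`, the spans, and the labels are all hypothetical illustrations, not the paper's evaluation code or data.

```python
def entity_f1(gold, pred):
    """Exact-match, entity-level micro F1 over (start, end, label) spans.

    Assumption: a predicted entity counts as correct only if its boundaries
    and its label both match a gold entity exactly.
    """
    gold, pred = set(gold), set(pred)
    true_positives = len(gold & pred)
    precision = true_positives / len(pred) if pred else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical toy data: three gold entities, one predicted with the wrong label.
gold = [(0, 2, "PER"), (5, 6, "LOC"), (9, 11, "ORG")]
pred = [(0, 2, "PER"), (5, 6, "ORG"), (9, 11, "ORG")]
score = entity_f1(gold, pred)  # precision = recall = 2/3, so F1 = 2/3
```

In this toy case two of three predictions match, so precision and recall are both 2/3 and the F1 score is 2/3.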

Named Entity Recognition (NER) is one of the main tasks of Natural Language Processing (NLP). It is used in many applications and can also serve as an intermediate step for other tasks. We present ANER, a web-based named entity recognizer for the Arabic and Arabizi languages. The model is built upon BERT, a transformer-based encoder. It can recognize 50 different entity classes covering various fields. We trained our model on the WikiFANE_Gold dataset, which consists of Wikipedia articles. We achieved an F1 score of 88.7%, which beats CAMeL Tools' F1 score of 83% on the ANERcorp dataset, which has only 4 classes. We also achieved an F1 score of 77.7% on the NewsFANE_Gold dataset, which contains out-of-domain data from news articles. The system is deployed on a user-friendly web interface that accepts input in Arabic or Arabizi. It lets users explore the entities in the text by highlighting them, and it can direct users to Wikipedia for information about an entity. Through our website, users can run NER with either our model or CAMeL Tools' model. ANER is publicly accessible at http://www.aner.online. We also deployed our model on HuggingFace at https://huggingface.co/boda/ANER to allow developers to test and use it.
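As a rough illustration of how a developer might try the HuggingFace checkpoint and reproduce the highlighting idea, here is a minimal sketch. It assumes the `boda/ANER` checkpoint loads with the generic `transformers` token-classification pipeline, which this page does not confirm. The helper names `load_aner` and `highlight`, the English stand-in sentence, and the `PER`/`LOC` labels are hypothetical; the model's actual 50 class names are not listed here.

```python
from typing import Dict, List


def load_aner():
    """Load the published checkpoint (needs `transformers` and network access).

    Assumption: the checkpoint is compatible with the generic
    token-classification pipeline.
    """
    from transformers import pipeline  # lazy import: highlight() has no dependencies

    return pipeline(
        "token-classification",
        model="boda/ANER",
        aggregation_strategy="simple",  # merge word pieces into whole entities
    )


def highlight(text: str, entities: List[Dict]) -> str:
    """Wrap each entity span as [span|LABEL] using character offsets."""
    out, cursor = [], 0
    for ent in sorted(entities, key=lambda e: e["start"]):
        if ent["start"] < cursor:  # skip spans that overlap an earlier one
            continue
        out.append(text[cursor:ent["start"]])
        out.append(f"[{text[ent['start']:ent['end']]}|{ent['entity_group']}]")
        cursor = ent["end"]
    out.append(text[cursor:])
    return "".join(out)


# Hand-written entities in the shape the aggregated pipeline returns
# (hypothetical labels and an English stand-in sentence, not model output).
sample_text = "Mohamed Salah plays in Liverpool"
sample_entities = [
    {"entity_group": "PER", "start": 0, "end": 13},
    {"entity_group": "LOC", "start": 23, "end": 32},
]
highlighted = highlight(sample_text, sample_entities)
```

With a working installation, `highlight(text, load_aner()(text))` would apply the same markup to real predictions for Arabic or Arabizi input. The sample above does not call the model, so it runs offline.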

Code Implementations: 1 repo
