May I Check Again? -- A simple but efficient way to generate and use contextual dictionaries for Named Entity Recognition. Application to French Legal Texts
This work addresses a domain-specific issue of typo robustness in named entity recognition for French legal documents, offering an incremental improvement over existing methods.
The paper tackles the recognition of named entities that contain typos in French legal texts. It uses contextual dictionaries to enrich the last layer of a neural network, which cuts the F1-score error by 32% in relative terms and raises the F1-score from 94.85% to 96.52%.
In this paper, we present a new method for training a model that is robust to typos on a Named Entity Recognition task. Our improvement over existing methods helps the model take into account the context of a sentence within a court decision in order to recognize an entity that contains a typo. We used state-of-the-art models and enriched the last layer of the neural network with high-level information about how likely each word is to be a given type of entity. More precisely, we used the similarities between the word and the potential entity candidates in the tagged sentence context. Experiments on a dataset of French court decisions show a 32% relative reduction of the F1-score error, raising the score obtained with the most competitive fine-tuned state-of-the-art system from 94.85% to 96.52%.
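To make the two quantitative ideas above concrete, here is a minimal sketch. It is an illustration under stated assumptions, not the authors' implementation: the abstract does not say which similarity measure is used or exactly how the features are fed to the last layer, so the character-level similarity from Python's `difflib`, the helper names (`similarity`, `dictionary_features`, `relative_error_reduction`), the entity types, and the toy dictionary are all hypothetical. The second part only checks the arithmetic behind the reported 32% figure.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Case-insensitive character-level similarity in [0, 1].

    Assumption: the paper's actual similarity measure is not given in the
    abstract; difflib's ratio is used here only as a stand-in.
    """
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def dictionary_features(token, contextual_dictionary, entity_types):
    """Return one feature per entity type: the best similarity between the
    token and the candidates of that type found in the context.

    Such a vector could, for example, be concatenated to the input of the
    last layer of a tagger (hypothetical integration).
    """
    return [
        max(
            (similarity(token, cand) for cand in contextual_dictionary.get(t, [])),
            default=0.0,  # no candidate of this type in the context
        )
        for t in entity_types
    ]


def relative_error_reduction(f1_before: float, f1_after: float) -> float:
    """Relative reduction of the F1 error, where error = 100 - F1 (in %)."""
    error_before = 100.0 - f1_before
    error_after = 100.0 - f1_after
    return (error_before - error_after) / error_before


# Hypothetical entity types and candidates already tagged in one decision.
ENTITY_TYPES = ["PERSON", "ORGANIZATION"]
contextual_dictionary = {
    "PERSON": ["Dupont", "Martin"],
    "ORGANIZATION": ["Société Générale"],
}

# "Dupnot" is a typo of "Dupont": the PERSON feature should dominate.
features = dictionary_features("Dupnot", contextual_dictionary, ENTITY_TYPES)

# Scores reported in the abstract: 94.85% -> 96.52%.
reduction = relative_error_reduction(94.85, 96.52)
print(features, f"{reduction:.1%}")
```

With the reported scores, the error drops from 5.15 to 3.48 points, which is a relative reduction of roughly 32%, consistent with the figure stated in the abstract.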