Concordance Comparison as a Means of Assembling Local Grammars
This work provides a method for improving named entity recognition in Portuguese by comparing the concordances of local grammars and combining the best-performing ones, offering a practical tool for linguists and NLP practitioners.
The authors propose using concordance comparison to assemble local grammars for person name extraction, achieving an F-measure of 76.86 on the Gold Collection of the Second HAREM, a 6-point gain over the previous state of the art for Portuguese.
Named Entity Recognition for person names is an important but non-trivial task in information extraction. This article uses a tool that compares the concordances obtained from two local grammars (LGs) and highlights the differences. We used the results as an aid in selecting the best of a set of LGs. By analyzing the comparisons, we observed relationships of inclusion, intersection and disjointness within each pair of LGs, which helped us to assemble those that yielded the best results. This approach was used in a case study on the extraction of person names from texts written in Portuguese. We applied the enhanced grammar to the Gold Collection of the Second HAREM. The F-measure obtained was 76.86, a gain of 6 points over the state of the art for Portuguese.
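To make the pairwise comparison concrete, the following is a minimal sketch of how the concordances of two LGs might be compared and their relationship classified. It is an illustration under stated assumptions, not the tool used in this work: the representation of a concordance hit as a (start, end, text) tuple, the function name compare_concordances and the sample hits are all hypothetical.

```python
from typing import Dict, Set, Tuple

# Assumption: a concordance hit is modeled as (start_offset, end_offset, matched_text).
Hit = Tuple[int, int, str]


def compare_concordances(a: Set[Hit], b: Set[Hit]) -> Dict[str, object]:
    """Classify the relationship between the hit sets of two local grammars."""
    common = a & b   # hits found by both grammars
    only_a = a - b   # hits found only by grammar A
    only_b = b - a   # hits found only by grammar B

    if not common:
        relation = "disjoint"
    elif not only_a and not only_b:
        relation = "identical"
    elif not only_a:
        relation = "A included in B"
    elif not only_b:
        relation = "B included in A"
    else:
        relation = "intersecting"

    return {"relation": relation, "common": common,
            "only_a": only_a, "only_b": only_b}


# Hypothetical hits from two grammars over the same text.
hits_a = {(0, 10, "João Silva"), (25, 36, "Maria Souza")}
hits_b = {(0, 10, "João Silva")}

result = compare_concordances(hits_a, hits_b)
print(result["relation"])  # B included in A
print(result["only_a"])    # the hits that only grammar A recovers
```

In this sketch, an inclusion result suggests that the smaller grammar adds nothing to the larger one, whereas intersecting or disjoint pairs are candidates for being combined, since each contributes hits the other misses. Whether the extra hits are correct still has to be checked against a reference annotation.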