CL · Sep 24, 2013

Acronym recognition and processing in 22 languages

arXiv:1309.6185v1 · 21 citations
Originality: Synthesis-oriented
AI Analysis

This work addresses entity recognition and cross-language news linking for automated news analysis, but its contribution is incremental: it adapts existing methods to a new domain.

The paper tackles the recognition and processing of acronyms in news articles across 22 languages. It adapts recognition patterns originally developed for medical terms to the general news domain, and it provides evaluation results and extensive statistics on the frequency and distribution of short-form/long-form pairs.

We are presenting work on recognising acronyms of the form Long-Form (Short-Form) such as "International Monetary Fund (IMF)" in millions of news articles in twenty-two languages, as part of our more general effort to recognise entities and their variants in news text and to use them for the automatic analysis of the news, including the linking of related news across languages. We show how the acronym recognition patterns, initially developed for medical terms, needed to be adapted to the more general news domain and we present evaluation results. We describe our effort to automatically merge the numerous long-form variants referring to the same short-form, while keeping non-related long-forms separate. Finally, we provide extensive statistics on the frequency and the distribution of short-form/long-form pairs across languages.
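To make the Long-Form (Short-Form) pattern concrete, the following is a minimal sketch in Python. It is not the authors' implementation: the function name `find_acronym_pairs`, the regular expression, and the initial-letter matching rule are all assumptions chosen for illustration. It looks for a parenthesised run of uppercase characters and accepts it only if the initials of the immediately preceding words spell it out.

```python
import re

# Hypothetical pattern (an assumption, not taken from the paper):
# a parenthesised short form of 2 to 10 uppercase letters or digits.
SHORT_FORM = re.compile(r"\(([A-Z][A-Z0-9]{1,9})\)")


def find_acronym_pairs(text):
    """Return (long_form, short_form) pairs found in text.

    A pair is accepted only when the first letters of the words directly
    before the parentheses spell the short form exactly.
    """
    pairs = []
    for match in SHORT_FORM.finditer(text):
        short = match.group(1)
        # Words that precede the opening parenthesis.
        words = text[:match.start()].split()
        # Take as many words as the short form has characters.
        candidate = words[-len(short):]
        if len(candidate) < len(short):
            continue
        initials = "".join(word[0].upper() for word in candidate)
        if initials == short:
            pairs.append((" ".join(candidate), short))
    return pairs


if __name__ == "__main__":
    sample = "The International Monetary Fund (IMF) published a report."
    print(find_acronym_pairs(sample))
```

This simple rule misses long forms that contain function words or contribute several letters per word, and it assumes a script with upper and lower case. Limitations of this kind are one reason patterns tuned on one domain may need adapting for multilingual news text.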

