CL, Jul 8, 2022

ABB-BERT: A BERT model for disambiguating abbreviations and contractions

Prateek Kacker, Andi Cupallari, Aswin Gridhar Subramanian, Nimit Jain

arXiv:2207.04008v1, 580 citations, h-index: 7

Originality: Synthesis-oriented

AI Analysis

This paper addresses the challenge of handling abbreviated, ambiguous text in domains such as healthcare, where existing spelling correction models fail because abbreviations and contractions drop many characters from words. The contribution appears incremental, since it adapts BERT to a specific task.

The paper tackles the disambiguation of abbreviations and contractions in text such as doctors' notes. It proposes ABB-BERT, a BERT-based model that ranks candidate expansions from thousands of options and is designed for scale. The model is trained on Wikipedia data and can be fine-tuned for specific domains.

Abbreviations and contractions are commonly found in text across different domains. For example, doctors' notes contain many contractions that can be personalized based on their choices. Existing spelling correction models are not suitable for handling expansions because abbreviated words drop many characters. In this work, we propose ABB-BERT, a BERT-based model that handles ambiguous language containing abbreviations and contractions. ABB-BERT can rank candidate expansions from thousands of options and is designed for scale. It is trained on Wikipedia text, and the algorithm allows it to be fine-tuned with little compute to get better performance for a domain or person. We are publicly releasing the training dataset for abbreviations and contractions derived from Wikipedia.
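
The ranking idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. The function names (`candidate_expansions`, `rank_expansions`, `toy_score`), the toy vocabulary, and the hint table are all hypothetical. The subsequence filter is an assumption about how candidates might be generated, since the abstract does not specify it, and the keyword-overlap scorer only stands in for the BERT model that would score each candidate in context.

```python
from typing import Callable, Dict, Iterable, List, Set


def is_subsequence(short: str, word: str) -> bool:
    """Return True if the characters of `short` appear in `word` in order."""
    remaining = iter(word)
    # `ch in remaining` advances the iterator until it finds `ch`.
    return all(ch in remaining for ch in short)


def candidate_expansions(short: str, vocabulary: Iterable[str]) -> List[str]:
    """Assumed candidate rule: same first letter, longer than the short form,
    and the short form is a subsequence of the word."""
    short = short.lower()
    if not short:
        return []
    return [
        word
        for word in vocabulary
        if len(word) > len(short)
        and word[0].lower() == short[0]
        and is_subsequence(short, word.lower())
    ]


def rank_expansions(
    short: str,
    context: str,
    vocabulary: Iterable[str],
    score: Callable[[str, str], float],
) -> List[str]:
    """Order candidates by a context-aware score, highest first."""
    candidates = candidate_expansions(short, vocabulary)
    return sorted(candidates, key=lambda word: score(context, word), reverse=True)


# Hypothetical toy data; a real system would use a large vocabulary
# and a trained model instead of a hand-written hint table.
TOY_VOCABULARY = ["patient", "payment", "point", "prescription", "pressure", "treatment"]
TOY_HINTS: Dict[str, Set[str]] = {
    "patient": {"reports", "pain", "admitted"},
    "payment": {"invoice", "due"},
    "point": {"score", "line"},
}


def toy_score(context: str, candidate: str) -> float:
    """Placeholder scorer: count context words that hint at the candidate."""
    words = set(context.lower().split())
    return float(len(words & TOY_HINTS.get(candidate, set())))


ranked = rank_expansions("pt", "pt reports chest pain", TOY_VOCABULARY, toy_score)
print(ranked)
```

Passing the scorer as a parameter keeps candidate generation separate from ranking, so the placeholder could be replaced by a model-based scorer without changing the rest of the sketch.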

