CL · Nov 21, 2021

Capitalization and Punctuation Restoration: a Survey

arXiv:2111.10746v1 · 23 citations
Originality: Synthesis-oriented
AI Analysis

The paper provides a comprehensive overview for researchers and practitioners working on text pre-processing in NLP, though as a survey its contribution is incremental rather than novel.

This survey reviews historical and state-of-the-art techniques for restoring punctuation and capitalization in text, addressing challenges in sources like speech recognition outputs and social media.

Ensuring proper punctuation and letter casing is a key pre-processing step towards applying complex natural language processing algorithms. This is especially significant for textual sources where punctuation and casing are missing, such as the raw output of automatic speech recognition systems. Additionally, short text messages and micro-blogging platforms offer unreliable and often wrong punctuation and casing. This survey offers an overview of both historical and state-of-the-art techniques for restoring punctuation and correcting word casing. Furthermore, current challenges and research directions are highlighted.

