CL, Jul 24, 2023

Boosting Punctuation Restoration with Data Generation and Reinforcement Learning

arXiv:2307.12949v1
Originality: Incremental advance
AI Analysis

The paper addresses readability issues in ASR outputs, though the advance is incremental: it builds on existing punctuation restoration methods by adding data generation and reinforcement learning.

The paper tackles the problem of punctuation restoration for automatic speech recognition (ASR) texts by bridging the gap between written and ASR data, achieving state-of-the-art performance on two benchmark datasets.

Punctuation restoration is an important task in automatic speech recognition (ASR) that aims to restore the syntactic structure of generated ASR texts to improve readability. While punctuated texts are abundant in written documents, the discrepancy between written punctuated texts and ASR texts limits the usability of written texts for training punctuation restoration systems for ASR texts. This paper proposes a reinforcement learning method that exploits in-topic written texts and recent advances in large pre-trained generative language models to bridge this gap. The experiments show that our method achieves state-of-the-art performance on the ASR test sets of two benchmark datasets for punctuation restoration.
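To make the task concrete, punctuation restoration is commonly framed as labeling each word with the punctuation mark that follows it, with written text supplying the labels. The sketch below illustrates that general framing only; it is not the paper's method. The label set `PUNCT_LABELS` and the helper `to_training_pairs` are hypothetical names introduced for this example.

```python
import re

# Hypothetical label set for illustration; the paper's actual labels are not stated here.
PUNCT_LABELS = {",": "COMMA", ".": "PERIOD", "?": "QUESTION"}


def to_training_pairs(text):
    """Turn punctuated written text into (lowercased word, label) pairs.

    Each word is labeled with the punctuation mark that follows it,
    or "O" when no mark follows. Lowercasing and stripping punctuation
    roughly mimics the surface form of raw ASR output.
    """
    pairs = []
    # Tokenize into words and the three punctuation marks of interest.
    for token in re.findall(r"[A-Za-z0-9']+|[,.?]", text):
        if token in PUNCT_LABELS:
            # Attach the mark to the preceding word, if there is one.
            if pairs:
                word, _ = pairs[-1]
                pairs[-1] = (word, PUNCT_LABELS[token])
        else:
            pairs.append((token.lower(), "O"))
    return pairs


example = to_training_pairs("Hello, how are you? Fine.")
```

Pairs built this way come from clean written text, so they lack the disfluencies and recognition errors of real ASR transcripts. That mismatch is the gap between written and ASR data that the abstract describes.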

