CL, Jan 18, 2021

Automatic punctuation restoration with BERT models

arXiv:2101.07343v1, 28 citations
Originality: Synthesis-oriented
AI Analysis

This work addresses punctuation restoration for English and Hungarian. It is an incremental contribution that reports macro-averaged $F_1$-scores for each language.

The paper tackles automatic punctuation restoration for English and Hungarian using BERT models. Its best models achieve macro-averaged $F_1$-scores of 79.8 for English on the Ted Talks benchmark and 82.2 for Hungarian on the Szeged Treebank dataset.

We present an approach for automatic punctuation restoration with BERT models for English and Hungarian. For English, we conduct our experiments on Ted Talks, a commonly used benchmark for punctuation restoration, while for Hungarian we evaluate our models on the Szeged Treebank dataset. Our best models achieve a macro-averaged $F_1$-score of 79.8 in English and 82.2 in Hungarian. Our code is publicly available.
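The macro-averaged $F_1$-score quoted in the abstract is the unweighted mean of the per-class $F_1$-scores. The following is a minimal sketch of that computation, not the authors' evaluation code. The label names (`COMMA`, `PERIOD`, `QUESTION`, `O`), the toy data, and the choice to average only over the punctuation classes are assumptions made for illustration.

```python
from typing import Sequence


def macro_f1(gold: Sequence[str], pred: Sequence[str], labels: Sequence[str]) -> float:
    """Return the unweighted mean of per-class F1 over `labels`."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    scores = []
    for label in labels:
        pairs = list(zip(gold, pred))
        tp = sum(g == label and p == label for g, p in pairs)
        fp = sum(g != label and p == label for g, p in pairs)
        fn = sum(g == label and p != label for g, p in pairs)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        if precision + recall:
            scores.append(2 * precision * recall / (precision + recall))
        else:
            scores.append(0.0)
    return sum(scores) / len(scores)


# Hypothetical toy data: one label per token, "O" meaning no punctuation follows.
gold = ["O", "COMMA", "O", "PERIOD", "O", "QUESTION"]
pred = ["O", "COMMA", "COMMA", "PERIOD", "O", "PERIOD"]

# Assumption: the "O" class is left out of the average.
score = macro_f1(gold, pred, labels=["COMMA", "PERIOD", "QUESTION"])
print(f"macro F1: {100 * score:.1f}")
```

Because every class contributes equally, a rare class such as the question mark affects the average as much as a frequent one, so a model that misses rare marks is penalised heavily.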

Code Implementations: 1 repo