CL, Aug 27, 2018

WiSeBE: Window-based Sentence Boundary Evaluation

arXiv:1808.08850v1
Originality: Incremental advance
AI Analysis

This work addresses how SBD systems are evaluated, which matters because SBD output feeds downstream NLP tasks such as summarization. It is rated incremental because it improves the evaluation metric rather than the detection method itself.

The paper tackles the problem of evaluating Sentence Boundary Detection (SBD) systems by proposing WiSeBE, a semi-supervised metric based on agreement and disagreement among multiple references. It reports that WiSeBE is more reliable than standard metrics such as precision and recall when tested on a set of YouTube transcripts.

Sentence Boundary Detection (SBD) has been a major research topic since Automatic Speech Recognition transcripts began to be used for further Natural Language Processing tasks such as Part-of-Speech Tagging, Question Answering, or Automatic Summarization. But what about evaluation? Are standard evaluation metrics such as precision, recall, F-score, or classification error, and more importantly, evaluating an automatic system against a single reference, enough to conclude how well an SBD system performs given the final application of the transcript? In this paper we propose Window-based Sentence Boundary Evaluation (WiSeBE), a semi-supervised metric for evaluating Sentence Boundary Detection systems based on multi-reference (dis)agreement. We evaluate and compare the performance of different SBD systems over a set of YouTube transcripts using WiSeBE and standard metrics. This double evaluation gives an understanding of how WiSeBE is a more reliable metric for the SBD task.
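
The concern the abstract raises about scoring against a single reference can be illustrated with a minimal sketch. This is not the WiSeBE metric, whose definition is not given here; it only computes standard precision, recall, and F-score for one system against several references. The annotator names and boundary positions are invented for illustration, and boundaries are assumed to be represented as token indices.

```python
def prf(predicted, reference):
    """Precision, recall and F1 for two sets of boundary positions."""
    predicted, reference = set(predicted), set(reference)
    true_positives = len(predicted & reference)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(reference) if reference else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Hypothetical data: token indices after which a sentence boundary is placed.
references = {
    "annotator_a": {4, 9, 15},
    "annotator_b": {4, 12, 15},
    "annotator_c": {9, 15},
}
system = {4, 9, 14}

# Score the same system output against each reference separately.
scores = {name: prf(system, ref) for name, ref in references.items()}

for name, (precision, recall, f1) in scores.items():
    print(f"{name}: P={precision:.2f} R={recall:.2f} F1={f1:.2f}")
```

With these invented annotations, the same system output scores an F1 of 2/3 against the first reference and 1/3 against the second, so the conclusion depends on which annotator happens to be chosen as the reference.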
