CL, Oct 12, 2021

SportsSum2.0: Generating High-Quality Sports News from Live Text Commentary

arXiv:2110.05750v1, 4 citations, Has Code
Originality: Incremental advance
AI Analysis

This work improves sports news generation for media and fans, but it is incremental as it builds on an existing framework.

The paper tackles the problem of generating sports news from live text commentaries by addressing noise in the dataset, improving pseudo-labeling with lexical overlap, and enhancing summarization with a reranker, resulting in a model that outperforms the state-of-the-art baseline.

Sports game summarization aims to generate news articles from live text commentaries. A recent state-of-the-art work, SportsSum, not only constructs a large benchmark dataset but also proposes a two-step framework. Despite its great contributions, the work has three main drawbacks: 1) the noise in the SportsSum dataset degrades summarization performance; 2) neglecting the lexical overlap between news and commentaries results in a low-quality pseudo-labeling algorithm; 3) directly concatenating rewritten sentences to form news limits its practicality. In this paper, we publish a new benchmark dataset, SportsSum2.0, together with a modified summarization framework. In particular, to obtain a clean dataset, we employ crowd workers to manually clean the original dataset. Moreover, the degree of lexical overlap is incorporated into the generation of pseudo labels. Further, we introduce a reranker-enhanced summarizer to take into account the fluency and expressiveness of the summarized news. Extensive experiments show that our model outperforms the state-of-the-art baseline.
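
To make the pseudo-labeling idea concrete, here is a minimal sketch of one way lexical overlap could be used to pair news sentences with commentary sentences. It is not the paper's algorithm: the whitespace tokenizer, the unigram overlap score, the threshold, and the names `tokenize`, `lexical_overlap`, and `pseudo_label` are all illustrative assumptions.

```python
def tokenize(text):
    """Lowercase whitespace tokenizer; a stand-in for a real tokenizer (assumption)."""
    return text.lower().split()


def lexical_overlap(news_sentence, commentary):
    """Fraction of distinct news tokens that also appear in the commentary."""
    news_tokens = set(tokenize(news_sentence))
    if not news_tokens:
        return 0.0
    commentary_tokens = set(tokenize(commentary))
    return len(news_tokens & commentary_tokens) / len(news_tokens)


def pseudo_label(news_sentences, commentaries, threshold=0.5):
    """Pair each news sentence with its best-overlapping commentary.

    Returns (news_index, commentary_index, score) triples. Pairs scoring
    below the (hypothetical) threshold are dropped as too noisy.
    """
    pairs = []
    for i, sentence in enumerate(news_sentences):
        scores = [lexical_overlap(sentence, c) for c in commentaries]
        if not scores:
            continue
        # Index of the commentary with the highest overlap score.
        best = max(range(len(scores)), key=scores.__getitem__)
        if scores[best] >= threshold:
            pairs.append((i, best, scores[best]))
    return pairs
```

Calling `pseudo_label(news_sentences, commentaries, threshold=0.3)` on a list of news sentences and a list of commentary sentences returns the retained pairs. A higher threshold keeps fewer but cleaner pairs.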

Code Implementations: 2 repos