Knowledge Enhanced Sports Game Summarization
This work addresses the problem of generating accurate sports news from live commentaries, a task relevant to sports analysts and fans. It is incremental, however, since it extends existing summarization methods with dataset improvements.
The authors tackle two problems in sports game summarization (noisy datasets and the knowledge gap between live commentaries and sports news) by introducing K-SportsSum: a manually cleaned dataset of 7,854 commentary-news pairs, accompanied by a knowledge corpus covering 523 teams and 14,724 players. Their knowledge-enhanced model achieves new state-of-the-art performance.
Sports game summarization aims to generate sports news from live commentaries. However, existing datasets are all constructed through automated collection and cleaning processes, which leaves them with substantial noise. Moreover, current work neglects the knowledge gap between live commentaries and sports news, which limits the performance of sports game summarization. In this paper, we introduce K-SportsSum, a new dataset with two characteristics: (1) K-SportsSum collects a large amount of data from a large number of games, yielding 7,854 commentary-news pairs, and employs a manual cleaning process to improve quality; (2) unlike existing datasets, K-SportsSum further provides a large-scale knowledge corpus containing information on 523 sports teams and 14,724 sports players, in order to narrow the knowledge gap. We also introduce a knowledge-enhanced summarizer that utilizes both live commentaries and this knowledge to generate sports news. Extensive experiments on the K-SportsSum and SportsSum datasets show that our model achieves new state-of-the-art performance. Qualitative analysis and a human study further verify that our model generates more informative sports news.
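The abstract does not specify how the summarizer consumes the knowledge corpus. Purely as an illustrative sketch, assuming a simple input-augmentation scheme that is not necessarily the authors' design, the Python below shows one way a commentary-news pair and team/player knowledge could be represented and combined: entity names mentioned in the commentary are found by naive exact string matching, and their descriptions are appended to the model input. The record classes, `link_entities`, `build_model_input`, the `[COMMENTARY]`/`[KNOWLEDGE]` markers, and the toy team and player names are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class KnowledgeEntry:
    """Hypothetical record for one team or player in the knowledge corpus."""

    name: str
    description: str


@dataclass
class CommentaryNewsPair:
    """Hypothetical record: live commentary lines and the reference news article."""

    commentaries: list[str]
    news: str


def link_entities(text: str, corpus: dict[str, KnowledgeEntry]) -> list[KnowledgeEntry]:
    """Naive entity linking: return entries whose name occurs verbatim in `text`."""
    return [entry for name, entry in corpus.items() if name in text]


def build_model_input(pair: CommentaryNewsPair, corpus: dict[str, KnowledgeEntry]) -> str:
    """Join the commentary and append knowledge for every entity it mentions.

    Each entity is included once, even if it is mentioned in several lines.
    """
    mentioned: dict[str, KnowledgeEntry] = {}
    for line in pair.commentaries:
        for entry in link_entities(line, corpus):
            mentioned.setdefault(entry.name, entry)  # keep first occurrence only
    commentary = " ".join(pair.commentaries)
    knowledge = "; ".join(f"{e.name}: {e.description}" for e in mentioned.values())
    return f"[COMMENTARY] {commentary} [KNOWLEDGE] {knowledge}"


if __name__ == "__main__":
    # Toy, hypothetical corpus and game; not taken from K-SportsSum.
    corpus = {
        "Player X": KnowledgeEntry("Player X", "forward for Team A"),
        "Team A": KnowledgeEntry("Team A", "football club"),
        "Team B": KnowledgeEntry("Team B", "football club"),
    }
    pair = CommentaryNewsPair(
        commentaries=["10' Player X shoots from distance!", "12' Corner for Team A."],
        news="Team A won thanks to Player X.",
    )
    print(build_model_input(pair, corpus))
    # prints: [COMMENTARY] 10' Player X shoots from distance! 12' Corner for Team A. [KNOWLEDGE] Player X: forward for Team A; Team A: football club
```

Exact string matching is the simplest possible entity linker; it misses nicknames, abbreviations, and transliterated names, so a practical system would need a more robust linking step before injecting knowledge.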