GOAL: Towards Benchmarking Few-Shot Sports Game Summarization
This work addresses data scarcity in sports game summarization, but its contribution is incremental: it primarily introduces a new dataset.
The authors tackled the lack of English datasets for sports game summarization by releasing GOAL, a dataset of 103 commentary-news pairs plus 2,160 unlabeled commentary documents, and found that baseline methods still struggle on this task.
Sports game summarization aims to generate sports news from real-time commentaries. The task has attracted wide research attention but remains under-explored, probably due to the lack of corresponding English datasets. Therefore, in this paper, we release GOAL, the first English sports game summarization dataset. Specifically, GOAL contains 103 commentary-news pairs, where the average lengths of commentaries and news are 2724.9 and 476.3 words, respectively. Moreover, to support research in the semi-supervised setting, GOAL additionally provides 2,160 unlabeled commentary documents. Based on GOAL, we build and evaluate several baselines, including extractive and abstractive ones. The experimental results show that the task remains challenging. We hope our work will promote research on sports game summarization. The dataset has been released at https://github.com/krystalan/goal.
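To make the dataset statistics and the extractive setting concrete, the following is a minimal sketch under stated assumptions. The `Pair` record layout (commentary as a list of time-ordered sentences, news as one string) is hypothetical and is not GOAL's released file format. Word counts use plain whitespace tokenization, which may differ from how the reported averages (2724.9 and 476.3 words) were computed. `lead_k` is a common generic extractive baseline shown for illustration. It is not necessarily one of the baselines evaluated in the paper. The two pairs are toy data, not GOAL records.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Pair:
    # Hypothetical record layout; GOAL's released format may differ.
    commentary: list[str]  # commentary sentences in time order
    news: str              # reference news article


def word_count(text: str) -> int:
    # Whitespace tokenization (an assumption, not the paper's stated tokenizer).
    return len(text.split())


def average_lengths(pairs: list[Pair]) -> tuple[float, float]:
    """Mean word counts of commentaries and of news across all pairs."""
    commentary_avg = mean(word_count(" ".join(p.commentary)) for p in pairs)
    news_avg = mean(word_count(p.news) for p in pairs)
    return commentary_avg, news_avg


def lead_k(pair: Pair, k: int = 3) -> str:
    """Lead-k extractive baseline: copy the first k commentary sentences."""
    return " ".join(pair.commentary[:k])


# Toy pairs for illustration only (not taken from GOAL).
pairs = [
    Pair(["Kick-off.", "A shot goes wide.", "Goal for the home side!"],
         "The home side won 1-0."),
    Pair(["The match begins.", "A yellow card is shown."],
         "The match ended 0-0."),
]
print(average_lengths(pairs))  # → (9, 4.5)
print(lead_k(pairs[0], k=2))   # → Kick-off. A shot goes wide.
```

A Lead-k baseline is likely weak here: the opening lines of a live commentary usually describe the pre-game setup rather than the decisive events a news report emphasizes. This mismatch is one reason long commentary-to-news summarization is hard for purely positional extraction.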