Template-based Abstractive Microblog Opinion Summarisation
This work addresses the problem of summarising opinions expressed on microblogs such as Twitter, which is of interest to both researchers and practitioners. The contribution is incremental, as it builds on existing summarisation techniques and pairs them with a new dataset.
The authors introduce the task of microblog opinion summarisation (MOS) and release a dataset of 3100 gold-standard abstractive summaries of tweets, covering more topics than any other public Twitter summarisation dataset. They benchmark state-of-the-art models and find that abstractive models outperform extractive ones and that fine-tuning is necessary to improve performance.
We introduce the task of microblog opinion summarisation (MOS) and share a dataset of 3100 gold-standard opinion summaries to facilitate research in this domain. The dataset contains summaries of tweets spanning a 2-year period and covers more topics than any other public Twitter summarisation dataset. Summaries are abstractive in nature and have been created by journalists skilled in summarising news articles following a template separating factual information (main story) from author opinions. Our method differs from previous work on generating gold-standard summaries from social media, which usually involves selecting representative posts and thus favours extractive summarisation models. To showcase the dataset's utility and challenges, we benchmark a range of abstractive and extractive state-of-the-art summarisation models and achieve good performance, with the former outperforming the latter. We also show that fine-tuning is necessary to improve performance and investigate the benefits of using different sample sizes.
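As a rough illustration of the template idea described above, the sketch below keeps the factual part (main story) and the opinion part of a summary in separate fields and joins them into a single text. The class name `TemplateSummary`, its field names, the `render` method, and the example sentences are all hypothetical and invented for illustration. They are not taken from the dataset or from the authors' annotation guidelines.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class TemplateSummary:
    """Hypothetical container for one template-based summary."""

    main_story: str  # factual information (the main story)
    opinions: List[str] = field(default_factory=list)  # author opinions

    def render(self) -> str:
        """Join the factual part and the non-empty opinions into one text."""
        parts = [self.main_story.strip()]
        parts.extend(o.strip() for o in self.opinions if o.strip())
        return " ".join(parts)


# Invented example text, for illustration only.
example = TemplateSummary(
    main_story="A new transport policy was announced.",
    opinions=["Many users welcome the change.", "Some doubt it will be funded."],
)
print(example.render())
```

Keeping the two parts in separate fields would, under this assumption, allow the factual and opinion components to be inspected or evaluated independently before they are merged.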