Unsupervised Summarization for Chat Logs with Topic-Oriented Ranking and Context-Aware Auto-Encoders
This work provides an unsupervised method for summarizing chat logs, which is beneficial for users needing to quickly understand long conversations, particularly in customer service environments.
This paper addresses the challenge of unsupervised chat summarization, which is difficult due to fragmented topics and context-dependent language in chat logs. The proposed RankAE framework, using a topic-oriented ranking strategy and a context-aware auto-encoder, significantly outperforms other unsupervised methods in generating high-quality summaries.
Automatic chat summarization can help people quickly grasp important information from numerous chat messages. Unlike conventional documents, chat logs usually have fragmented and evolving topics. In addition, these logs contain many elliptical and interrogative sentences, which make chat summarization highly context-dependent. In this work, we propose a novel unsupervised framework called RankAE to perform chat summarization without employing manually labeled data. RankAE consists of a topic-oriented ranking strategy that selects topic utterances according to centrality and diversity simultaneously, as well as a denoising auto-encoder that is carefully designed to generate succinct but context-informative summaries based on the selected utterances. To evaluate the proposed method, we collect a large-scale dataset of chat logs from a customer service environment and build an annotated set used only for model evaluation. Experimental results show that RankAE significantly outperforms other unsupervised methods and is able to generate high-quality summaries in terms of relevance and topic coverage.
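
To make the idea of selecting utterances by centrality and diversity simultaneously more concrete, here is a minimal sketch. It is not the RankAE ranking strategy itself, whose details are not given in this abstract. It assumes a greedy, MMR-style selection over bag-of-words cosine similarities; the function names (`bow`, `cosine`, `select_topic_utterances`), the `trade_off` parameter, and the toy chat are all hypothetical.

```python
import math
from collections import Counter


def bow(text):
    """Bag-of-words vector (an assumption; a real system would likely use learned embeddings)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[token] for token, count in a.items() if token in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def select_topic_utterances(utterances, k=2, trade_off=0.5):
    """Greedily pick k utterance indices that are central yet mutually dissimilar.

    Centrality is the mean similarity to all other utterances; diversity is
    enforced by penalizing similarity to utterances already selected
    (an MMR-style criterion, used here as a stand-in).
    """
    vectors = [bow(u) for u in utterances]
    n = len(vectors)
    sim = [[cosine(vectors[i], vectors[j]) for j in range(n)] for i in range(n)]
    centrality = [(sum(sim[i]) - sim[i][i]) / max(n - 1, 1) for i in range(n)]

    selected = []
    while len(selected) < min(k, n):
        best_index, best_score = None, -math.inf
        for i in range(n):
            if i in selected:
                continue
            # Redundancy: highest similarity to anything already chosen.
            redundancy = max((sim[i][j] for j in selected), default=0.0)
            score = trade_off * centrality[i] - (1 - trade_off) * redundancy
            if score > best_score:
                best_index, best_score = i, score
        selected.append(best_index)
    return sorted(selected)  # restore chat order


# Toy chat with two topics: a late order (first three lines) and a refund (last two).
chat = [
    "my order has not arrived yet",
    "the order has not arrived",
    "has the order arrived",
    "i want a refund for the shoes",
    "refund for the shoes please",
]
picked = select_topic_utterances(chat, k=2)
print([chat[i] for i in picked])
```

With `trade_off` near 1 the selection favors the most central utterances and tends to stay within the dominant topic; lowering it increases the redundancy penalty, which pushes the selection toward covering separate topics. In the framework described above, the selected utterances would then be passed to the auto-encoder to generate the summary, a step this sketch does not cover.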