Features in Extractive Supervised Single-document Summarization: Case of Persian News
This work addresses an often-overlooked factor in summarization: the dependence of sentence ranking on document context. It focuses on Persian news and represents an incremental improvement.
The paper tackles extractive single-document summarization of Persian news by integrating document-level features into each sentence vector. This makes the model context-aware and improves ranking precision, yielding more comprehensive yet concise summaries.
Text summarization has been one of the most challenging areas of research in NLP. Much effort has gone into this problem using either abstractive or extractive methods. Extractive methods are more popular because they are simpler than the more elaborate abstractive methods. In extractive approaches, the system does not generate sentences. Instead, it learns to score the sentences of the text using textual features and then selects the highest-ranked ones. The core task is therefore ranking, which depends heavily on the document. Many state-of-the-art solutions have overlooked this dependency. In this work, features of the document are integrated into the vector of every sentence. The system thus becomes aware of the context, which increases the precision of the learned model and, consequently, produces comprehensive and concise summaries.
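The pipeline described above (score sentences from features, with document features appended to every sentence vector, then keep the highest-ranked ones) can be sketched minimally in Python. This sketch rests on several assumptions. The features (relative position, length in words, number of sentences, average sentence length), the helper names, and `toy_scorer` are all hypothetical and illustrative. They are not the paper's feature set or model. `toy_scorer` is a hand-written stand-in for a trained regressor, and whitespace splitting is a simplification; real Persian text would need proper tokenization.

```python
from typing import Callable, List, Sequence

# Hypothetical layout of the augmented vector used in this sketch:
# [relative_position, length_in_words, num_sentences, avg_sentence_length]
POSITION, LENGTH, NUM_SENTENCES, AVG_LENGTH = range(4)


def sentence_features(sentence: str, index: int, n_sentences: int) -> List[float]:
    """Illustrative sentence-level features (not the paper's feature set)."""
    # 1.0 for the first sentence, falling linearly to 0.0 for the last one.
    position = 1.0 - index / max(n_sentences - 1, 1)
    # Whitespace tokenization is a simplification for this sketch.
    return [position, float(len(sentence.split()))]


def document_features(sentences: Sequence[str]) -> List[float]:
    """Illustrative document-level features, identical for every sentence."""
    n = len(sentences)
    avg_len = sum(len(s.split()) for s in sentences) / n
    return [float(n), avg_len]


def augmented_vectors(sentences: Sequence[str]) -> List[List[float]]:
    """Append the same document-level features to every sentence vector."""
    doc = document_features(sentences)
    return [sentence_features(s, i, len(sentences)) + doc
            for i, s in enumerate(sentences)]


def summarize(sentences: Sequence[str],
              scorer: Callable[[List[float]], float],
              k: int) -> List[str]:
    """Score every sentence, keep the k highest-ranked, restore text order."""
    if not sentences:
        return []
    scores = [scorer(v) for v in augmented_vectors(sentences)]
    top = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)[:k]
    return [sentences[i] for i in sorted(top)]


# Hypothetical stand-in for a trained model: it rewards early sentences and
# sentences that are long *relative to this document's average*, a comparison
# that is only possible because document features are part of the vector.
def toy_scorer(v: List[float]) -> float:
    return v[POSITION] + v[LENGTH] / max(v[AVG_LENGTH], 1.0)


news = [
    "Tehran hosted a regional summit on water policy.",
    "Officials from six countries attended.",
    "The summit ended with a joint statement on shared rivers and drought planning.",
    "Weather was mild.",
]
# Selects the first and third sentences, printed in their original order.
print(summarize(news, toy_scorer, k=2))
```

One design point is worth noting. If the scorer were purely linear, a feature that is identical for every sentence of a document would only add a constant to all of their scores and could not change the ranking within that document. Document features pay off when the model can combine them with sentence features, as the ratio in `toy_scorer` does.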