Indian Language Summarization using Pretrained Sequence-to-Sequence Models
This work addresses summarization for Indian languages; it is an incremental contribution that applies existing methods to new data.
The paper tackles text summarization for Hindi, Gujarati, and English by experimenting with pretrained sequence-to-sequence models, ranking first in all three languages in the ILSUM shared task.
The ILSUM shared task focuses on text summarization for two major Indian languages, Hindi and Gujarati, along with English. In this task, we experiment with various pretrained sequence-to-sequence models to identify the best model for each language. We present a detailed overview of the models and our approaches in this paper. We secured first rank in all three sub-tasks (English, Hindi, and Gujarati). This paper also analyzes in detail the impact of k-fold cross-validation when the available training data is small, and we run several experiments that combine the original data with a filtered version of it to assess how effective the pretrained models are.
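The abstract does not say how many folds were used or how the folds were formed, so the following is only a generic sketch of k-fold splitting over a small set of article/summary pairs. The function name `k_fold_splits`, the fixed shuffle `seed`, and the size-balancing rule are illustrative assumptions rather than the authors' procedure. Every example lands in exactly one validation fold, so a model fine-tuned on each training split can be scored on held-out data, and the k scores can then be averaged.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def k_fold_splits(items: Sequence[T], k: int, seed: int = 0) -> List[Tuple[List[T], List[T]]]:
    """Hypothetical helper: return k (train, validation) pairs.

    Each item appears in exactly one validation fold. Fold sizes differ by at
    most one, and the shuffle is seeded so the splits are reproducible.
    """
    n = len(items)
    if not 2 <= k <= n:
        raise ValueError("k must satisfy 2 <= k <= len(items)")

    # Shuffle indices (not the items themselves) with a fixed seed.
    order = list(range(n))
    random.Random(seed).shuffle(order)

    # The first n % k folds receive one extra item.
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]

    splits = []
    start = 0
    for size in fold_sizes:
        val_idx = set(order[start:start + size])
        train = [items[i] for i in range(n) if i not in val_idx]
        val = [items[i] for i in range(n) if i in val_idx]
        splits.append((train, val))
        start += size
    return splits


if __name__ == "__main__":
    # Toy stand-in for article/summary pairs; real data would be ILSUM examples.
    pairs = [(f"article_{i}", f"summary_{i}") for i in range(10)]
    for fold, (train, val) in enumerate(k_fold_splits(pairs, k=5)):
        # In practice: fine-tune a seq2seq model on `train`, evaluate on `val`.
        print(fold, len(train), len(val))
```

With limited data, this setup lets every example contribute to both training and evaluation across folds, which yields a less noisy estimate than a single small held-out split.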