CL, LG | Jan 2, 2025

BeliN: A Novel Corpus for Bengali Religious News Headline Generation using Contextual Feature Fusion

Md Osama, Ashim Dey, Kawsar Ahmed, Muhammad Ashad Kabir

arXiv:2501.01069v1 | 2.7 | h-index: 2 | Has Code | Nat Lang Process J

Originality: Incremental advance

AI Analysis

This research addresses headline generation for Bengali religious news, an underexplored area, by incorporating contextual features to improve performance for low-resource languages.

The study introduces a new corpus and a contextual feature fusion approach for Bengali religious news headline generation. The approach achieves BLEU and ROUGE-L scores of 18.61 and 24.19, compared with 16.08 and 23.08 for a baseline that uses only the news content.
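To put the reported scores in perspective, the short sketch below computes the absolute and relative gains over the baseline. The four score values are taken from the summary above; the `gains` helper and the dictionary layout are illustrative names, not part of the paper's code.

```python
# Reported scores from the summary; the structure here is only illustrative.
scores = {
    "BLEU": {"baseline": 16.08, "multigen": 18.61},
    "ROUGE-L": {"baseline": 23.08, "multigen": 24.19},
}


def gains(baseline: float, improved: float) -> tuple[float, float]:
    """Return (absolute gain in points, relative gain in percent)."""
    absolute = improved - baseline
    relative = 100.0 * absolute / baseline
    return round(absolute, 2), round(relative, 1)


for metric, s in scores.items():
    abs_gain, rel_gain = gains(s["baseline"], s["multigen"])
    print(f"{metric}: +{abs_gain:.2f} points ({rel_gain:.1f}% relative)")

# Prints:
# BLEU: +2.53 points (15.7% relative)
# ROUGE-L: +1.11 points (4.8% relative)
```

The gain is larger on BLEU than on ROUGE-L, both in absolute points and relative to the baseline.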

Automatic text summarization, particularly headline generation, remains a critical yet underexplored area for Bengali religious news. Existing approaches to headline generation typically rely solely on the article content and overlook contextual features such as sentiment, category, and aspect, which limits their performance. This study addresses that limitation by introducing BeliN (Bengali Religious News), a novel corpus of religious news articles from prominent Bangladeshi online newspapers, and MultiGen, a contextual multi-input feature fusion approach to headline generation. Leveraging transformer-based pre-trained language models such as BanglaT5, mBART, mT5, and mT0, MultiGen integrates additional contextual features (category, aspect, and sentiment) with the news content. This fusion enables the model to capture contextual information often overlooked by traditional methods. Experimental results show that MultiGen outperforms the baseline approach that uses only news content, achieving a BLEU score of 18.61 and a ROUGE-L score of 24.19, compared to baseline scores of 16.08 and 23.08, respectively. These findings underscore the importance of incorporating contextual features in headline generation for low-resource languages. By bridging linguistic and cultural gaps, this research advances natural language processing for Bengali and other underrepresented languages. To promote reproducibility and further exploration, the dataset and implementation code are publicly accessible at https://github.com/akabircs/BeliN.
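As a rough illustration of what multi-input fusion could look like at the text level, the sketch below builds a single model input from the article content plus the three contextual features named in the abstract. This is an assumption for illustration only: the abstract does not specify how MultiGen combines its inputs, and the `NewsItem` class, the `fuse_inputs` and `content_only` helpers, the field labels, and the separator are all hypothetical. The resulting string would then be tokenized and passed to a sequence-to-sequence model such as those listed above.

```python
from dataclasses import dataclass


@dataclass
class NewsItem:
    """One article with its contextual annotations (hypothetical schema)."""
    content: str
    category: str
    aspect: str
    sentiment: str


def content_only(item: NewsItem) -> str:
    """Baseline-style input: the article text alone."""
    return f"content: {item.content}"


def fuse_inputs(item: NewsItem, sep: str = " | ") -> str:
    """Prepend labelled contextual features to the article text.

    Assumed fusion scheme: plain string concatenation with field labels.
    The actual MultiGen mechanism may differ.
    """
    features = [
        f"category: {item.category}",
        f"aspect: {item.aspect}",
        f"sentiment: {item.sentiment}",
    ]
    return sep.join(features + [content_only(item)])


if __name__ == "__main__":
    # Placeholder values, not drawn from the BeliN corpus.
    item = NewsItem(
        content="Article text goes here.",
        category="festival",
        aspect="community",
        sentiment="positive",
    )
    print(content_only(item))
    print(fuse_inputs(item))
```

Keeping the baseline and fused inputs behind two small functions makes it easy to train the same model on either variant and compare the resulting headlines.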
