CV | Dec 12, 2023

Attention Based Encoder Decoder Model for Video Captioning in Nepali (2023)

arXiv:2312.07418v3 | h-index: 11

Originality: Synthesis-oriented

AI Analysis

This work addresses video captioning for Nepali speakers, but it is incremental: it applies existing methods to a new language and dataset.

The paper tackles video captioning in Nepali, a language with limited prior work, by developing an encoder-decoder model that combines CNN feature extraction with LSTM/GRU sequence models. Performance is evaluated with the BLEU, METEOR, and ROUGE metrics.
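To make the "attention based" part of the title concrete, the following is a minimal sketch of one attention step over per-frame features. It assumes plain dot-product scoring, which this summary does not confirm as the paper's choice; the function names (`softmax`, `attend`) and the toy feature values are hypothetical and only illustrate the general idea of weighting frames by their relevance to the decoder state.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(decoder_state, frame_features):
    """Dot-product attention (an assumed scoring function).

    Scores each frame feature against the current decoder state,
    normalizes the scores with softmax, and returns the weights
    together with the weighted sum of features (the context vector).
    """
    scores = [
        sum(q * k for q, k in zip(decoder_state, feat))
        for feat in frame_features
    ]
    weights = softmax(scores)
    dim = len(frame_features[0])
    context = [
        sum(w * feat[i] for w, feat in zip(weights, frame_features))
        for i in range(dim)
    ]
    return weights, context


# Hypothetical toy inputs: three frames with 2-dimensional CNN features.
frames = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
state = [2.0, 0.0]
weights, context = attend(state, frames)
```

In a full captioning model, the context vector would be fed to the LSTM or GRU decoder at each step together with the previously generated word, so that different frames can dominate at different points in the caption.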

Video captioning in Nepali, a language written in the Devanagari script, is challenging because little academic work exists in this domain. This work develops a novel encoder-decoder paradigm for Nepali video captioning to address this gap. The model uses LSTM and GRU sequence-to-sequence networks to generate textual descriptions from features extracted from video frames by CNNs. A Nepali video captioning dataset is built from the Microsoft Research Video Description Corpus (MSVD) by translating its captions with Google Translate and manually post-editing the results. BLEU, METEOR, and ROUGE scores are used to assess the model and demonstrate its effectiveness for Devanagari-script video captioning.
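As an illustration of how an n-gram overlap metric such as BLEU scores a generated Nepali caption, here is a simplified sketch. It is not the paper's evaluation code: it assumes whitespace tokenization, a single reference, no smoothing, and n-grams up to length 2 (standard BLEU typically uses up to 4). The helper names and the two example captions are hypothetical.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return all contiguous n-grams of a token list as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def modified_precision(candidate, reference, n):
    """Clipped n-gram precision: each candidate n-gram counts at most
    as often as it appears in the reference."""
    cand_counts = Counter(ngrams(candidate, n))
    ref_counts = Counter(ngrams(reference, n))
    if not cand_counts:
        return 0.0
    clipped = sum(min(c, ref_counts[g]) for g, c in cand_counts.items())
    return clipped / sum(cand_counts.values())


def simple_bleu(candidate, reference, max_n=2):
    """Geometric mean of clipped precisions times a brevity penalty.
    Simplified: one reference, no smoothing."""
    precisions = [
        modified_precision(candidate, reference, n)
        for n in range(1, max_n + 1)
    ]
    if min(precisions) == 0.0:
        return 0.0
    log_avg = sum(math.log(p) for p in precisions) / max_n
    if len(candidate) > len(reference):
        brevity = 1.0
    else:
        brevity = math.exp(1 - len(reference) / len(candidate))
    return brevity * math.exp(log_avg)


# Hypothetical captions, tokenized on whitespace (an assumption;
# real evaluations may tokenize Devanagari text differently).
reference = "एक मानिस गितार बजाउँदै छ".split()
candidate = "एक मानिस गितार बजाउँछ".split()
score = simple_bleu(candidate, reference)
```

Because the metric only compares token sequences, it works on Devanagari text without modification, but the scores depend heavily on how words are tokenized.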
