CV LG Jun 19, 2020

Hyperparameter Analysis for Image Captioning

arXiv:2006.10923v12.33 citations Has Code

Originality Synthesis-oriented

AI Analysis

This is an incremental analysis for researchers in computer vision and NLP, focusing on optimizing image captioning models.

The paper examines hyperparameter sensitivity in image captioning by analyzing CNN+LSTM and CNN+Transformer architectures on the Flickr8k dataset. Its main finding is that fine-tuning the CNN encoder outperforms the baseline and every other experiment, for both architectures.

In this paper, we perform a thorough sensitivity analysis on state-of-the-art image captioning approaches using two different architectures: CNN+LSTM and CNN+Transformer. Experiments were carried out using the Flickr8k dataset. The biggest takeaway from the experiments is that fine-tuning the CNN encoder outperforms the baseline and all other experiments carried out for both architectures.
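
As a rough illustration of what a sensitivity analysis over two architectures can look like, the sketch below enumerates a baseline run plus runs that change exactly one hyperparameter at a time. This one-at-a-time layout is an assumption made for illustration; the abstract does not say how the experiments were organized. All hyperparameter names and values (`fine_tune_encoder`, `learning_rate`, `embedding_dim`) are hypothetical and are not taken from the paper.

```python
from typing import Any, Dict, List

# Hypothetical baseline configuration. Names and values are illustrative
# assumptions, not settings reported in the paper.
BASELINE: Dict[str, Any] = {
    "architecture": "cnn_lstm",
    "fine_tune_encoder": False,  # encoder weights kept frozen in the baseline
    "learning_rate": 1e-3,
    "embedding_dim": 256,
}

# Hypothetical alternative values to try for each hyperparameter.
ALTERNATIVES: Dict[str, List[Any]] = {
    "fine_tune_encoder": [True],
    "learning_rate": [1e-4, 1e-2],
    "embedding_dim": [128, 512],
}

# The two architecture families named in the abstract.
ARCHITECTURES = ["cnn_lstm", "cnn_transformer"]


def one_at_a_time_runs(
    baseline: Dict[str, Any],
    alternatives: Dict[str, List[Any]],
    architectures: List[str],
) -> List[Dict[str, Any]]:
    """Return, per architecture, the baseline run plus every run that
    changes a single hyperparameter away from the baseline."""
    runs = []
    for arch in architectures:
        base = {**baseline, "architecture": arch}
        runs.append({"name": f"{arch}/baseline", "config": base})
        for key, values in alternatives.items():
            for value in values:
                # Copy the baseline and override exactly one setting.
                runs.append(
                    {"name": f"{arch}/{key}={value}", "config": {**base, key: value}}
                )
    return runs


RUNS = one_at_a_time_runs(BASELINE, ALTERNATIVES, ARCHITECTURES)

if __name__ == "__main__":
    for run in RUNS:
        print(run["name"])
```

Because each run differs from its baseline in a single setting, any change in caption quality can be attributed to that setting, which is how a claim such as "fine-tuning the encoder beats all other experiments" can be compared on equal footing across both architectures.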
