CV, LG | Jun 19, 2020

Hyperparameter Analysis for Image Captioning

arXiv:2006.10923v1 | 3 citations
Originality: Synthesis-oriented
AI Analysis

This is an incremental analysis for researchers in computer vision and NLP, focusing on optimizing image captioning models.

The paper examines hyperparameter sensitivity in image captioning by analyzing CNN+LSTM and CNN+Transformer architectures on the Flickr8k dataset. Its main finding is that fine-tuning the CNN encoder outperformed the baseline and all other experiments for both architectures.

In this paper, we perform a thorough sensitivity analysis on state-of-the-art image captioning approaches using two different architectures: CNN+LSTM and CNN+Transformer. Experiments were carried out using the Flickr8k dataset. The biggest takeaway from the experiments is that fine-tuning the CNN encoder outperforms the baseline and all other experiments carried out for both architectures.
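A sensitivity analysis of this kind is often organized as a baseline configuration plus variants that each change a single hyperparameter. The sketch below illustrates that layout for the two architectures named in the abstract. It assumes a one-at-a-time design, which the abstract does not confirm, and every hyperparameter name and value other than the architectures and the encoder fine-tuning toggle is hypothetical.

```python
from typing import Any, Dict, List

# Hypothetical baseline: names and values are illustrative, not taken from the paper.
BASELINE: Dict[str, Any] = {
    "architecture": "cnn_lstm",
    "fine_tune_encoder": False,  # encoder frozen in the assumed baseline
    "learning_rate": 1e-3,
    "embedding_dim": 256,
}

# Hypothetical alternatives, each tried while all other settings stay at baseline.
ALTERNATIVES: Dict[str, List[Any]] = {
    "fine_tune_encoder": [True],
    "learning_rate": [1e-4, 1e-2],
    "embedding_dim": [128, 512],
}


def one_at_a_time(baseline: Dict[str, Any],
                  alternatives: Dict[str, List[Any]]) -> List[Dict[str, Any]]:
    """Return the baseline plus one config per single-hyperparameter change."""
    configs = [dict(baseline)]
    for name, values in alternatives.items():
        for value in values:
            if value == baseline[name]:
                continue  # skip values identical to the baseline
            variant = dict(baseline)
            variant[name] = value
            configs.append(variant)
    return configs


def changed_keys(baseline: Dict[str, Any], config: Dict[str, Any]) -> List[str]:
    """List the hyperparameters whose value differs from the baseline."""
    return [key for key in baseline if config[key] != baseline[key]]


def build_grid() -> List[Dict[str, Any]]:
    """Build the experiment list for both architectures from the abstract."""
    grid: List[Dict[str, Any]] = []
    for arch in ("cnn_lstm", "cnn_transformer"):
        base = dict(BASELINE, architecture=arch)
        grid.extend(one_at_a_time(base, ALTERNATIVES))
    return grid


if __name__ == "__main__":
    for cfg in build_grid():
        print(cfg)
```

Because each variant differs from its baseline in exactly one setting, any change in a caption metric can be attributed to that setting alone. Interactions between hyperparameters are not captured by this design.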

