CV · AI · Dec 13, 2024

Automated Image Captioning with CNNs and Transformers

arXiv:2412.10511v1 · 2 citations · h-index: 2
Originality: Synthesis-oriented
AI Analysis

This is an incremental approach to generating descriptions for images, potentially benefiting accessibility or content analysis.

The project tackled automated image captioning by integrating computer vision and NLP techniques, but no concrete results or numbers were reported.

This project aims to create an automated image captioning system that generates natural language descriptions for input images by integrating techniques from computer vision and natural language processing. We employ a range of techniques, from CNN-RNN models to more advanced transformer-based approaches. Training is carried out on image datasets paired with descriptive captions, and model performance will be evaluated using established metrics such as BLEU, METEOR, and CIDEr. The project will also involve experimentation with advanced attention mechanisms, comparisons of different architectural choices, and hyperparameter optimization to refine captioning accuracy and overall system effectiveness.
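The abstract names BLEU as one of the evaluation metrics but this page includes no evaluation code. As an illustration only, the following is a minimal sketch of sentence-level BLEU (clipped n-gram precision with a brevity penalty, uniform weights, no smoothing). The function names `ngrams` and `bleu` are hypothetical, and this is not the authors' evaluation pipeline; published captioning results are normally computed at corpus level with a standard toolkit and tokenizer.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return a Counter of all n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def bleu(candidate, references, max_n=4):
    """Sentence-level BLEU with uniform weights and no smoothing (illustrative).

    candidate: list of tokens; references: non-empty list of token lists.
    """
    if not candidate:
        return 0.0
    log_precisions = []
    for n in range(1, max_n + 1):
        cand_counts = ngrams(candidate, n)
        # Clip each candidate n-gram count by its maximum count in any one reference.
        max_ref = Counter()
        for ref in references:
            for gram, count in ngrams(ref, n).items():
                max_ref[gram] = max(max_ref[gram], count)
        clipped = sum(min(count, max_ref[gram]) for gram, count in cand_counts.items())
        total = sum(cand_counts.values())
        if total == 0 or clipped == 0:
            # Without smoothing, one n-gram order with no match gives a score of zero.
            return 0.0
        log_precisions.append(math.log(clipped / total))
    # Brevity penalty: compare against the reference length closest to the candidate's.
    c = len(candidate)
    r = min((len(ref) for ref in references), key=lambda length: (abs(length - c), length))
    bp = 1.0 if c > r else math.exp(1 - r / c)
    # Geometric mean of the n-gram precisions, scaled by the brevity penalty.
    return bp * math.exp(sum(log_precisions) / max_n)


if __name__ == "__main__":
    generated = "the cat sat on the mat".split()
    human = ["the cat sat on a mat".split()]
    print(f"BLEU-4: {bleu(generated, human):.4f}")
```

Because the sketch applies no smoothing, short captions that share no 4-gram with any reference score zero, which is one reason captioning work usually reports corpus-level scores alongside METEOR and CIDEr.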

Code Implementations: 1 repo
Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.
