CV · AI · Apr 30, 2021

End-to-End Attention-based Image Captioning

arXiv:2104.14721v1 · 1 citation
Originality: Incremental advance
AI Analysis

This paper addresses the challenge of generating accurate chemical notations from noisy or feature-sparse molecular images. The contribution is incremental because it applies a known transformer approach to a specific domain.

The paper tackles the problem of image captioning for molecular translation to predict chemical notations in InChI format, proposing an end-to-end transformer model that outperforms attention-based techniques on molecular datasets.

In this paper, we address the problem of image captioning specifically for molecular translation, where the result is a predicted chemical notation in InChI format for a given molecular structure. Current approaches are mainly rule-based or CNN+RNN-based. However, they seem to underperform on noisy images and on images with a small number of distinguishable features. To overcome this, we propose an end-to-end transformer model. Compared to attention-based techniques, our proposed model performs better on molecular datasets.
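To make the captioning target concrete, here is a minimal sketch of how an InChI string could be turned into a token sequence for a sequence decoder. It assumes character-level tokenization with padding and start/end markers. The abstract does not describe the paper's vocabulary, so the helper names (`build_vocab`, `encode`, `decode`) and the special tokens are hypothetical, and this is not the authors' implementation. The two InChI strings are the standard identifiers for methane and ethanol.

```python
# Illustrative only: character-level tokenization is an assumption,
# not the vocabulary documented by the paper.
PAD, SOS, EOS = "<pad>", "<sos>", "<eos>"


def build_vocab(inchi_strings):
    """Map special tokens and every character seen in the corpus to an integer id."""
    chars = sorted({ch for s in inchi_strings for ch in s})
    tokens = [PAD, SOS, EOS] + chars
    return {tok: idx for idx, tok in enumerate(tokens)}


def encode(inchi, vocab, max_len):
    """Wrap the string in <sos>/<eos> and right-pad with <pad> to max_len."""
    ids = [vocab[SOS]] + [vocab[ch] for ch in inchi] + [vocab[EOS]]
    if len(ids) > max_len:
        raise ValueError("sequence longer than max_len")
    return ids + [vocab[PAD]] * (max_len - len(ids))


def decode(ids, vocab):
    """Invert encode: stop at <eos>, skip <sos> and <pad>."""
    inverse = {idx: tok for tok, idx in vocab.items()}
    out = []
    for i in ids:
        tok = inverse[i]
        if tok == EOS:
            break
        if tok not in (SOS, PAD):
            out.append(tok)
    return "".join(out)


examples = [
    "InChI=1S/CH4/h1H4",                  # methane
    "InChI=1S/C2H6O/c1-2-3/h3H,2H2,1H3",  # ethanol
]
vocab = build_vocab(examples)
ids = encode(examples[0], vocab, max_len=40)
```

In a captioning setup of this kind, the image encoder would supply the features and a decoder would be trained to emit `ids` one token at a time; `decode` recovers the predicted notation from the emitted ids.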

Code Implementations: 2 repos
