CV AI CL

Feb 7, 2024

Image captioning for Brazilian Portuguese using GRIT model

Rafael Silva de Alencar, William Alberto Cruz Castañeda, Marcellus Amadeus

arXiv:2402.05106v1

3.72 citations

h-index: 3

Originality: Synthesis-oriented

AI Analysis

This work addresses the need for image captioning tools in Brazilian Portuguese. Its contribution is incremental: it applies an existing method to a dataset in a new language.

The authors address caption generation in Brazilian Portuguese by adapting GRIT, a Transformer-based architecture, and training it on a Brazilian Portuguese dataset. The result is an early-stage captioning model for the language.

This work presents the early development of an image captioning model for the Brazilian Portuguese language. We use the GRIT (Grid- and Region-based Image captioning Transformer) model to accomplish this. GRIT is a Transformer-only neural architecture that uses two kinds of visual features, grid features and region features, to generate better captions. It was proposed as a more efficient approach to image captioning. In this work, we adapt the GRIT model to be trained on a Brazilian Portuguese dataset, yielding an image captioning method for Brazilian Portuguese.
