Image captioning for Brazilian Portuguese using the GRIT model
This work addresses the need for image captioning tools in Brazilian Portuguese. Its contribution is incremental, because it applies an existing method to a dataset in a new language.
The authors generate image captions in Brazilian Portuguese by adapting GRIT, a Transformer-based architecture, and training it on a Brazilian Portuguese dataset. The result is an early-stage captioning model for the language.
This work presents the early development of an image captioning model for Brazilian Portuguese. We built it on the GRIT (Grid- and Region-based Image captioning Transformer) model. GRIT is a Transformer-only neural architecture that uses two kinds of visual features, grid-based and region-based, to generate better captions. GRIT was proposed as a more efficient approach to image captioning. In this work, we adapt GRIT so that it can be trained on a Brazilian Portuguese dataset, which gives us an image captioning method for Brazilian Portuguese.
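To illustrate how a caption decoder can use both grid and region features, here is a minimal sketch of single-head scaled dot-product cross-attention. A decoder query attends over the concatenation of the two feature sets.

This is a simplified illustration under stated assumptions, not GRIT's actual implementation:
- It omits the learned query/key/value projections, multiple heads, and batching.
- Concatenating the two feature sets is only one possible fusion strategy. GRIT itself may combine them differently.
- The function names `cross_attention` and `attend_grid_and_region` are hypothetical.

```python
import math
from typing import List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax: subtract the max before exponentiating."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(query: Vector, keys: List[Vector], values: List[Vector]) -> Vector:
    """Single-head scaled dot-product attention for one query.

    Simplification (assumption): no learned projections, so the raw
    features act directly as keys and values.
    """
    d = len(query)
    scores = [dot(query, k) / math.sqrt(d) for k in keys]
    weights = softmax(scores)
    out = [0.0] * len(values[0])
    for w, v in zip(weights, values):
        for i, x in enumerate(v):
            out[i] += w * x
    return out


def attend_grid_and_region(query: Vector,
                           grid_feats: List[Vector],
                           region_feats: List[Vector]) -> Vector:
    """Hypothetical fusion: attend over grid cells and detected regions together.

    The two feature sets are concatenated into one memory, so a single
    softmax distributes attention across both kinds of visual features.
    """
    memory = grid_feats + region_feats
    return cross_attention(query, memory, memory)


if __name__ == "__main__":
    grid = [[1.0, 0.0], [0.0, 1.0]]  # two toy grid-cell features
    region = [[1.0, 1.0]]            # one toy region feature
    context = attend_grid_and_region([1.0, 0.0], grid, region)
    print(context)
```

In this toy run, the query `[1.0, 0.0]` scores highest against the first grid cell and the region feature. The resulting context vector therefore leans toward the first dimension. In a full model, this context vector would feed the rest of the decoder layer to predict the next caption token.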