CVJun 28, 2022Code
ZoDIAC: Zoneout Dropout Injection Attention CalculationZanyar Zohourianshahzadi, Terrance E. Boult, Jugal K. Kalita
In the past few years the transformer model has been utilized for a variety of tasks such as image captioning, image classification natural language generation, and natural language understanding. As a key component of the transformer model, self-attention calculates the attention values by mapping the relationships among the head elements of the source and target sequence, yet there is no explicit mechanism to refine and intensify the attention values with respect to the context of the input and target sequences. Based on this intuition, we introduce a novel refine and intensify attention mechanism that is called Zoneup Dropout Injection Attention Calculation (ZoDIAC), in which the intensities of attention values in the elements of the input source and target sequences are first refined using GELU and dropout and then intensified using a proposed zoneup process which includes the injection of a learned scalar factor. Our extensive experiments show that ZoDIAC achieves statistically significant higher scores under all image captioning metrics using various feature extractors in comparison to the conventional self-attention module in the transformer model on the MS-COCO dataset. Our proposed ZoDIAC attention modules can be used as a drop-in replacement for the attention components in all transformer models. The code for our experiments is publicly available at: https://github.com/zanyarz/zodiac
CVNov 29, 2021
Neural Attention for Image Captioning: Review of Outstanding MethodsZanyar Zohourianshahzadi, Jugal K. Kalita
Image captioning is the task of automatically generating sentences that describe an input image in the best way possible. The most successful techniques for automatically generating image captions have recently used attentive deep learning models. There are variations in the way deep learning models with attention are designed. In this survey, we provide a review of literature related to attentive deep learning models for image captioning. Instead of offering a comprehensive review of all prior work on deep image captioning models, we explain various types of attention mechanisms used for the task of image captioning in deep learning models. The most successful deep learning models used for image captioning follow the encoder-decoder architecture, although there are differences in the way these models employ attention mechanisms. Via analysis on performance results from different attentive deep models for image captioning, we aim at finding the most successful types of attention mechanisms in deep models for image captioning. Soft attention, bottom-up attention, and multi-head attention are the types of attention mechanism widely used in state-of-the-art attentive deep learning models for image captioning. At the current time, the best results are achieved from variants of multi-head attention with bottom-up attention.
CVAug 5, 2021
Neural Twins Talk & Alternative CalculationsZanyar Zohourianshahzadi, Jugal K. Kalita
Inspired by how the human brain employs a higher number of neural pathways when describing a highly focused subject, we show that deep attentive models used for the main vision-language task of image captioning, could be extended to achieve better performance. Image captioning bridges a gap between computer vision and natural language processing. Automated image captioning is used as a tool to eliminate the need for human agent for creating descriptive captions for unseen images.Automated image captioning is challenging and yet interesting. One reason is that AI based systems capable of generating sentences that describe an input image could be used in a wide variety of tasks beyond generating captions for unseen images found on web or uploaded to social media. For example, in biology and medical sciences, these systems could provide researchers and physicians with a brief linguistic description of relevant images, potentially expediting their work.
CLJul 27, 2018
A Survey of the Usages of Deep Learning in Natural Language ProcessingDaniel W. Otter, Julian R. Medina, Jugal K. Kalita
Over the last several years, the field of natural language processing has been propelled forward by an explosion in the use of deep learning models. This survey provides a brief introduction to the field and a quick overview of deep learning architectures and methods. It then sifts through the plethora of recent studies and summarizes a large assortment of relevant contributions. Analyzed research areas include several core linguistic processing issues in addition to a number of applications of computational linguistics. A discussion of the current state of the art is then provided along with recommendations for future research in the field.