CL · CV · LG · Nov 22, 2019

TPsgtR: Neural-Symbolic Tensor Product Scene-Graph-Triplet Representation for Image Captioning

arXiv:1911.10115v1 · 14 citations
Originality: Incremental advance
AI Analysis

This paper addresses the problem of generating better image captions for AI applications. The advance appears incremental, since it builds on existing scene-graph and neural-symbolic methods.

The paper tackles image captioning by introducing a neural-symbolic tensor product representation to embed scene-graph relationships, resulting in architectures that outperform other models and generate more comprehensive and natural captions.

Image captioning can be improved if the structure of the graphical representations can be formulated with conceptual positional binding. In this work, we introduce a novel technique for caption generation using a neural-symbolic encoding of scene graphs derived from regional visual information of the images, which we call Tensor Product Scene-Graph-Triplet Representation (TP$_{sgt}$R). While most previous works concentrated on identifying object features in images, we introduce a neuro-symbolic embedding that can embed identified relationships among different regions of the image into concrete forms, instead of relying on the model to compose any or all combinations. This neural-symbolic representation helps to better define the neural-symbolic space for neuro-symbolic attention and can be transformed into better captions. With this approach, we introduce two novel architectures (TP$_{sgt}$R-TDBU and TP$_{sgt}$R-sTDBU) for comparison, and experimental results demonstrate that our approaches outperform the other models and that the generated captions are more comprehensive and natural.
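
To make the idea of binding a scene-graph triplet into a tensor product representation more concrete, here is a minimal NumPy sketch of generic role-filler binding. It is only an illustration under stated assumptions, not the paper's TP$_{sgt}$R architecture: the function names (`orthonormal_roles`, `bind_triplet`, `unbind`), the embedding sizes, and the random filler vectors standing in for subject, predicate, and object features are all hypothetical.

```python
import numpy as np

def orthonormal_roles(n_roles, dim, seed=0):
    """Return n_roles orthonormal role vectors of size dim (assumes dim >= n_roles)."""
    rng = np.random.default_rng(seed)
    q, _ = np.linalg.qr(rng.standard_normal((dim, n_roles)))
    return q.T  # shape (n_roles, dim); rows are orthonormal

def bind_triplet(fillers, roles):
    """Bind each filler to its role and superpose: T = sum_i outer(f_i, r_i)."""
    return sum(np.outer(f, r) for f, r in zip(fillers, roles))

def unbind(tpr, role):
    """Recover the filler bound to `role`; exact when roles are orthonormal."""
    return tpr @ role

# Hypothetical embeddings for one triplet, e.g. (subject, predicate, object).
rng = np.random.default_rng(1)
fillers = rng.standard_normal((3, 8))      # three filler vectors of size 8
roles = orthonormal_roles(n_roles=3, dim=4)

tpr = bind_triplet(fillers, roles)         # shape (8, 4)
recovered_subject = unbind(tpr, roles[0])  # should match fillers[0]
```

Because the role vectors are orthonormal, multiplying the bound tensor by a role vector cancels the other two terms and returns the filler stored in that position. That positional binding is what lets a triplet's subject, predicate, and object be kept distinct inside a single representation.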

