AI · CV · LG · SC · Jun 26, 2023

PhD Thesis: Exploring the role of (self-)attention in cognitive and computer vision architecture

arXiv:2306.14650v2 · 1 citation · h-index: 3
Originality: Incremental advance
AI Analysis

This work addresses visual reasoning problems for AI and cognitive science, but it appears incremental as it builds on existing Transformer and ResNet methods.

The research examines the role of attention and memory in visual reasoning. It extends Transformer-based self-attention with memory and integrates self-attention with ResNet50 to solve challenging visual reasoning tasks efficiently. It also proposes GAMR, which outperforms other architectures in sample efficiency, robustness, and compositionality, and generalizes zero-shot to new reasoning tasks.

We investigate the role of attention and memory in complex reasoning tasks. We analyze Transformer-based self-attention as a model and extend it with memory. By studying a synthetic visual reasoning test (SVRT), we refine the taxonomy of reasoning tasks. By incorporating self-attention into ResNet50, we enhance its feature maps with feature-based and spatial attention, which lets the network solve challenging visual reasoning tasks efficiently. Our findings contribute to understanding the attentional needs of SVRT tasks. Additionally, we propose GAMR, a cognitive architecture that combines attention and memory and is inspired by active vision theory. GAMR outperforms other architectures in sample efficiency, robustness, and compositionality, and shows zero-shot generalization on new reasoning tasks.
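To make the idea of enhancing a feature map with spatial self-attention concrete, here is a minimal sketch in plain Python. It is not the thesis's implementation: the function names, the toy 2x2 feature map, the identity query/key/value projections, and the residual addition are all assumptions chosen for illustration, and a real model would apply learned projections to ResNet50 feature maps.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def spatial_self_attention(feature_map):
    """Scaled dot-product self-attention across spatial positions.

    feature_map: H x W x C nested lists, a stand-in for a CNN feature map.
    Assumption: queries, keys, and values are the raw features (identity
    projections); a trained model would learn these projections.
    Returns (attended_map, weights), where weights[i][j] is how strongly
    flattened position i attends to flattened position j.
    """
    height, width = len(feature_map), len(feature_map[0])
    # Flatten the H x W grid into a sequence of C-dimensional tokens.
    tokens = [feature_map[r][c] for r in range(height) for c in range(width)]
    channels = len(tokens[0])
    scale = math.sqrt(channels)

    weights, attended = [], []
    for query in tokens:
        # Similarity of this position to every position, scaled by sqrt(C).
        scores = [sum(q * k for q, k in zip(query, key)) / scale
                  for key in tokens]
        row = softmax(scores)
        weights.append(row)
        # Attention-weighted average of all position features.
        mixed = [sum(row[j] * tokens[j][ch] for j in range(len(tokens)))
                 for ch in range(channels)]
        # Residual connection: add the attended features to the original.
        attended.append([x + m for x, m in zip(query, mixed)])

    # Restore the H x W layout.
    attended_map = [attended[r * width:(r + 1) * width]
                    for r in range(height)]
    return attended_map, weights


# Hypothetical toy input: a 2x2 map with 2 channels per position.
toy = [[[1.0, 0.0], [0.0, 1.0]],
       [[1.0, 0.0], [0.0, 1.0]]]
out, attn = spatial_self_attention(toy)
```

In this toy input, positions with identical features receive equal and larger attention weights than positions with orthogonal features, which is the sense in which spatial attention routes information between related image locations.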

