LG NE NC | Apr 27, 2022

Can deep learning match the efficiency of human visual long-term memory in storing object details?

arXiv:2204.13061v3 | h-index: 5

Originality: Incremental advance

AI Analysis

This work addresses a fundamental limitation of deep learning that matters to both cognitive science and machine learning, and it offers incremental insight into the memory efficiency gap between models and humans.

The paper asks whether deep learning via gradient descent can match the efficiency of human visual long-term memory in storing object details after a single exposure. It finds that models require approximately 10 exposures to reach the recognition performance humans achieve after only one.

Humans have a remarkably large capacity to store detailed visual information in long-term memory even after a single exposure, as demonstrated by classic experiments in psychology. For example, Standing (1973) showed that humans could recognize with high accuracy thousands of pictures that they had seen only once a few days prior to a recognition test. In deep learning, the primary mode of incorporating new information into a model is through gradient descent in the model's parameter space. This paper asks whether deep learning via gradient descent can match the efficiency of human visual long-term memory to incorporate new information in a rigorous, head-to-head, quantitative comparison. We answer this in the negative: even in the best case, models learning via gradient descent require approximately 10 exposures to the same visual materials in order to reach a recognition memory performance humans achieve after only a single exposure. Prior knowledge induced via pretraining and bigger model sizes improve performance, but these improvements are not very visible after a single exposure (it takes a few exposures for the improvements to become apparent), suggesting that simply scaling up the pretraining data size or model size might not be a feasible strategy to reach human-level memory efficiency.
