CV · LG · NE · Sep 23, 2021

How much human-like visual experience do current self-supervised learning algorithms need in order to achieve human-level object recognition?

arXiv:2109.11523v3 · 8.05 citations · Has Code

Originality: Incremental advance

AI Analysis

The results point to a large gap between the data efficiency of current self-supervised visual learning algorithms and that of humans, a key bottleneck for achieving human-like object recognition in AI.

The paper estimates how much human-like visual experience current self-supervised learning algorithms would need to reach human-level object recognition on ImageNet. It finds that this would take roughly a million to a billion years of natural visual experience, far longer than a human lifetime.

This paper addresses a fundamental question: how good are our current self-supervised visual representation learning algorithms relative to humans? More concretely, how much "human-like" natural visual experience would these algorithms need in order to reach human-level performance in a complex, realistic visual object recognition task such as ImageNet? Using a scaling experiment, here we estimate that the answer is several orders of magnitude longer than a human lifetime: typically on the order of a million to a billion years of natural visual experience (depending on the algorithm used). We obtain even larger estimates for achieving human-level performance in ImageNet-derived robustness benchmarks. The exact values of these estimates are sensitive to some underlying assumptions; however, even in the most optimistic scenarios, they remain orders of magnitude larger than a human lifetime. We discuss the main caveats surrounding our estimates and the implications of these surprising results.
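The kind of scaling extrapolation described in the abstract can be sketched as follows. This is a minimal illustration, not the paper's code: it assumes that recognition accuracy grows linearly in the logarithm of the amount of training video (a common way to extrapolate scaling curves; the paper's exact functional form and units may differ), fits that line by least squares, solves for the amount of video needed to hit a target accuracy, and converts hours of video into years of experience. The function names and the `waking_hours_per_day` parameter are hypothetical, and the default of 16 waking hours per day is an assumption chosen only for illustration.

```python
import math
from typing import Sequence, Tuple


def fit_log_linear(hours: Sequence[float], accuracy: Sequence[float]) -> Tuple[float, float]:
    """Least-squares fit of accuracy = a + b * log10(hours); returns (a, b)."""
    if len(hours) != len(accuracy) or len(hours) < 2:
        raise ValueError("need at least two (hours, accuracy) pairs of equal length")
    xs = [math.log10(h) for h in hours]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(accuracy) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("hours must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, accuracy))
    b = sxy / sxx          # accuracy gained per tenfold increase in data
    a = mean_y - b * mean_x
    return a, b


def years_to_reach(
    target_accuracy: float,
    hours: Sequence[float],
    accuracy: Sequence[float],
    waking_hours_per_day: float = 16.0,  # hypothetical assumption, not from the paper
) -> float:
    """Extrapolate the log-linear fit to target_accuracy and convert to years."""
    a, b = fit_log_linear(hours, accuracy)
    if b <= 0:
        raise ValueError("accuracy does not increase with data; cannot extrapolate")
    # Invert accuracy = a + b * log10(hours) for the required hours of video.
    hours_needed = 10 ** ((target_accuracy - a) / b)
    # Assume experience accrues only during waking hours, every day of the year.
    return hours_needed / (waking_hours_per_day * 365.0)
```

Because the required amount of data appears in an exponent, even small changes in the fitted slope `b` or in the target accuracy shift the estimate by orders of magnitude. This is consistent with the abstract's note that the exact values are sensitive to the underlying assumptions.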

