LG, CV · Jul 3, 2025

Adopting a human developmental visual diet yields robust, shape-based AI vision

arXiv:2507.03168v1 · 5 citations · h-index: 4
Originality: Highly original
AI Analysis

This work addresses the problem of achieving robust, human-like AI vision for applications that require safety and efficiency. It offers a resource-efficient alternative to scaling up data and model parameters.

The paper tackles the misalignment between artificial and human vision by introducing a human-inspired developmental visual diet (DVD) for AI. The resulting models show the strongest reported reliance on shape information. They also outperform high-parameter foundation models in abstract shape recognition, robustness to image corruptions, and resilience to adversarial attacks.

Despite years of research and the dramatic scaling of artificial intelligence (AI) systems, a striking misalignment between artificial and human vision persists. Unlike humans, AI relies heavily on texture features rather than shape information. It also lacks robustness to image distortions, remains highly vulnerable to adversarial attacks, and struggles to recognise simple abstract shapes within complex backgrounds. To close this gap, we introduce a solution from a previously underexplored direction: rather than scaling up, we take inspiration from how human vision develops from early infancy into adulthood. We quantified this visual maturation by synthesising decades of psychophysical and neurophysiological research into a novel developmental visual diet (DVD) for AI vision. We show that guiding AI systems through this human-inspired curriculum produces models that closely align with human behaviour on every hallmark of robust vision tested. These models show the strongest reported reliance on shape information to date, abstract shape recognition beyond the state of the art, higher robustness to image corruptions, and stronger resilience to adversarial attacks. They outperform high-parameter AI foundation models trained on orders of magnitude more data. This provides evidence that robust AI vision can be achieved by guiding how a model learns, not merely how much it learns, and offers a resource-efficient route toward safer and more human-like artificial visual systems.
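The abstract describes the developmental visual diet only at a high level. To make the idea of a developmental curriculum more concrete, here is a minimal sketch in which training progress is mapped to a simulated age, and images are blurred less as that "age" increases. This loosely mimics the maturation of visual acuity. Everything in it is an illustrative assumption and is not the paper's DVD: the single dimension (acuity approximated by Gaussian blur), the linear progress-to-age mapping, the 25-year endpoint, the 6-month half-life, the 4-pixel newborn blur, and the hypothetical names `age_for_progress`, `blur_sigma`, and `developmental_view`.

```python
import numpy as np

# Hypothetical parameters for illustration only; NOT taken from the paper.
MAX_AGE_MONTHS = 300.0  # assumed curriculum endpoint ("adulthood"), 25 years
NEWBORN_SIGMA = 4.0     # assumed blur (in pixels) standing in for newborn acuity
ADULT_SIGMA = 0.0       # assumed: no extra blur once adult acuity is reached


def age_for_progress(progress: float) -> float:
    """Map training progress in [0, 1] to a simulated age in months (assumed linear)."""
    progress = min(max(progress, 0.0), 1.0)
    return progress * MAX_AGE_MONTHS


def blur_sigma(age_months: float, half_life: float = 6.0) -> float:
    """Assumed exponential decay of blur with age, a stand-in for maturing acuity."""
    return ADULT_SIGMA + (NEWBORN_SIGMA - ADULT_SIGMA) * 0.5 ** (age_months / half_life)


def gaussian_kernel(sigma: float) -> np.ndarray:
    """Normalised 1-D Gaussian kernel truncated at three standard deviations."""
    radius = max(1, int(np.ceil(3 * sigma)))
    x = np.arange(-radius, radius + 1, dtype=float)
    k = np.exp(-0.5 * (x / sigma) ** 2)
    return k / k.sum()


def apply_blur(image: np.ndarray, sigma: float) -> np.ndarray:
    """Separable Gaussian blur of a 2-D grayscale image, using edge padding."""
    if sigma < 1e-3:
        return image.copy()  # blur is negligible, so return the image unchanged
    k = gaussian_kernel(sigma)
    r = len(k) // 2
    padded = np.pad(image, r, mode="edge")
    # Convolve each row, then each column; "valid" mode restores the original size.
    rows = np.apply_along_axis(lambda v: np.convolve(v, k, mode="valid"), 1, padded)
    return np.apply_along_axis(lambda v: np.convolve(v, k, mode="valid"), 0, rows)


def developmental_view(image: np.ndarray, progress: float) -> np.ndarray:
    """Return the image as the hypothetical curriculum would present it at this progress."""
    return apply_blur(image, blur_sigma(age_for_progress(progress)))
```

In a training loop, each batch would be passed through something like `developmental_view(img, step / total_steps)`. Early batches would then be strongly blurred and late batches nearly unchanged. A curriculum built on actual developmental data would presumably use measured trajectories rather than the made-up exponential decay used here, and it could cover more than one visual property.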

