CV Feb 17, 2025

Alignment and Adversarial Robustness: Are More Human-Like Models More Secure?

Blaine Hoak, Kunyang Li, Patrick McDaniel

arXiv:2502.12377v28.42 citations h-index: 4

Originality: Incremental advance

AI Analysis

This work addresses adversarial robustness in vision models, a core AI security concern, and offers incremental insight into whether human-like perception is associated with more secure models.

The paper investigates whether machine learning models that align more closely with human vision are more robust to adversarial attacks, finding that while overall correlation is weak, specific alignment benchmarks, particularly those measuring texture or shape selectivity, strongly predict adversarial robustness.

A small but growing body of work has shown that machine learning models which better align with human vision have also exhibited higher robustness to adversarial examples, raising the question: can human-like perception make models more secure? If true generally, such mechanisms would offer new avenues toward robustness. In this work, we conduct a large-scale empirical analysis to systematically investigate the relationship between representational alignment and adversarial robustness. We evaluate 114 models spanning diverse architectures and training paradigms, measuring their neural and behavioral alignment and engineering task performance across 105 benchmarks as well as their adversarial robustness via AutoAttack. Our findings reveal that while average alignment and robustness exhibit a weak overall correlation, specific alignment benchmarks serve as strong predictors of adversarial robustness, particularly those that measure selectivity toward texture or shape. These results suggest that different forms of alignment play distinct roles in model robustness, motivating further investigation into how alignment-driven approaches can be leveraged to build more secure and perceptually-grounded vision models.
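The kind of analysis the abstract describes, relating per-benchmark alignment scores to robustness across many models, can be sketched roughly as follows. This is a minimal illustration, not the authors' pipeline: the abstract does not say which correlation statistic was used, so Pearson correlation is an assumption here, and the benchmark names, the helper functions `pearson` and `rank_benchmarks`, and all numbers are hypothetical, synthetic stand-ins rather than data from the paper.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation between two equal-length sequences.

    Assumes neither sequence is constant (otherwise the denominator is zero).
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def rank_benchmarks(scores, robustness):
    """Rank benchmarks by |correlation| with robustness, strongest first.

    scores: dict mapping benchmark name -> list of per-model scores.
    robustness: list of per-model robust accuracies, in the same model order.
    """
    corrs = [(name, pearson(vals, robustness)) for name, vals in scores.items()]
    return sorted(corrs, key=lambda pair: abs(pair[1]), reverse=True)


# Synthetic toy data (NOT from the paper): 20 hypothetical models.
rng = random.Random(0)
robustness = [rng.random() for _ in range(20)]
scores = {
    # Constructed to track robustness closely, plus a little noise.
    "shape_selectivity_toy": [r + 0.05 * rng.gauss(0, 1) for r in robustness],
    # Constructed to be independent of robustness.
    "unrelated_toy": [rng.random() for _ in range(20)],
}

# Average alignment per model, i.e. the mean over all benchmarks.
average_alignment = [sum(col) / len(col) for col in zip(*scores.values())]

ranking = rank_benchmarks(scores, robustness)
avg_corr = pearson(average_alignment, robustness)
```

In this toy setup, averaging over benchmarks mixes an informative score with an uninformative one, which is one way an average can correlate less strongly with robustness than a single well-chosen benchmark does.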
