LG | Sep 24, 2025

Efficiently Attacking Memorization Scores

arXiv:2509.20463v2 | h-index: 2 | Has Code
Originality: Highly original
AI Analysis

This work addresses a critical security problem for users of influence estimation tools in data valuation and responsible machine learning, revealing inherent fragilities in these methods.

The paper tackles the vulnerability of memorization-based influence estimators to adversarial manipulation, showing that even state-of-the-art proxies can be attacked with modest computational overhead, as validated across image classification tasks.

Influence estimation tools -- such as memorization scores -- are widely used to understand model behavior, attribute training data, and inform dataset curation. However, recent applications in data valuation and responsible machine learning raise the question: can these scores themselves be adversarially manipulated? In this work, we present a systematic study of the feasibility of attacking memorization-based influence estimators. We characterize attacks for producing highly memorized samples as highly sensitive queries in the regime where a trained algorithm is accurate. Our attack (calculating the pseudoinverse of the input) is practical, requiring only black-box access to model outputs and incurring modest computational overhead. We empirically validate our attack across a wide suite of image classification tasks, showing that even state-of-the-art proxies are vulnerable to targeted score manipulations. In addition, we provide a theoretical analysis of the stability of memorization scores under adversarial perturbations, revealing conditions under which influence estimates are inherently fragile. Our findings highlight critical vulnerabilities in influence-based attribution and suggest the need for robust defenses. All code can be found at https://github.com/tuedo2/MemAttack
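
The abstract does not spell out how a memorization score is computed. As a rough orientation, the sketch below uses the common label-memorization definition from the literature (Feldman, 2020): the accuracy on an example when it is in the training set minus the accuracy when it is held out, estimated over many models trained on random subsets. This is an assumption about the quantity under attack, not the paper's own code; the function name `memorization_scores` and the arrays `in_mask` and `correct` are hypothetical, and the proxies evaluated in the paper may differ.

```python
import numpy as np

def memorization_scores(in_mask, correct):
    """Subsampling estimate of per-example memorization scores.

    in_mask[t, i] -- True if example i was in the training subset of trial t.
    correct[t, i] -- True if the model from trial t labels example i correctly.
    Returns an array of length n_examples with values in [-1, 1].
    (Both inputs are hypothetical; producing them requires training many models.)
    """
    in_mask = np.asarray(in_mask, dtype=bool)
    correct = np.asarray(correct, dtype=float)
    if in_mask.shape != correct.shape:
        raise ValueError("in_mask and correct must have the same shape")

    n_in = in_mask.sum(axis=0)       # trials that trained on each example
    n_out = (~in_mask).sum(axis=0)   # trials that held each example out
    if (n_in == 0).any() or (n_out == 0).any():
        raise ValueError("every example needs at least one 'in' and one 'out' trial")

    acc_in = (correct * in_mask).sum(axis=0) / n_in
    acc_out = (correct * ~in_mask).sum(axis=0) / n_out
    return acc_in - acc_out
```

A score near 1 means the model only gets the example right when it has seen it during training, which is the sense in which a sample is "highly memorized"; an attack on the score aims to push this value up or down for chosen samples.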

