CV · AI · May 19

Can Vision Models Truly Forget? Mirage: Representation-Level Certification of Visual Unlearning

Zhenyu Yu, Yangchen Zeng, Chunlei Meng, Guangzhen Yao, Shuigeng Zhou

arXiv:2605.20282

AI Analysis

For researchers in federated unlearning, this work exposes the inadequacy of current certification standards and calls for representation-aware evaluation.

Existing Vertical Federated Learning (VFL) unlearning methods certify forgetting with output-level metrics alone, but Mirage shows that they retain class structure in their representations: Linear Probe Recovery (LPR) exceeds the retrained baseline by up to 15.4 points. This exposes a forgetting gap and an unlearning trilemma, in which no method simultaneously achieves high utility, output-level forgetting, and representation-level forgetting.

Machine unlearning in Vertical Federated Learning (VFL) has attracted growing interest, yet existing methods certify forgetting solely using output-level metrics. We challenge these claims by introducing Mirage, a representation-level auditing framework comprising four complementary diagnostics: Linear Probe Recovery (LPR), Centered Kernel Alignment (CKA), Feature Separability Scoring, and Layer-Wise Recovery Analysis. Through experiments across seven datasets and seven baseline methods following recent VFL unlearning protocols, Mirage reveals three key findings: (i) Forgetting gap: methods that pass output-level certification still retain substantial class structure in their representations, with LPR exceeding the retrained baseline by up to 15.4 points; CKA shows these models remain structurally closer to the original than to the retrained reference, while separability scores indicate persistent geometric discrimination. (ii) Unlearning trilemma: no existing method simultaneously achieves high utility, output-level forgetting, and representation-level forgetting. (iii) Class-sample asymmetry: class-level forgetting leaves strong representational traces (LPR up to 97%), whereas sample-level forgetting is indistinguishable from chance (LPR approx. 50%); layer-wise analysis further shows residual class information persists across network depths. These findings call for representation-aware evaluation standards in federated unlearning research.
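To make two of the diagnostics named in the abstract concrete, here is a minimal sketch of a linear-probe check and a CKA comparison on synthetic features. It is not the authors' implementation. The abstract does not give Mirage's exact definitions, so the sketch assumes the standard linear form of CKA and a simple least-squares probe, and it treats the "LPR gap" as probe accuracy on the unlearned model's features minus probe accuracy on the retrained model's features. The function names, the synthetic data, and the even/odd train/test split are all hypothetical.

```python
import numpy as np

def linear_cka(X, Y):
    """Linear CKA between two feature matrices of shape (n_samples, dim).

    Assumption: the standard linear-kernel form; the paper may use another kernel.
    """
    X = X - X.mean(axis=0, keepdims=True)  # center each feature dimension
    Y = Y - Y.mean(axis=0, keepdims=True)
    cross = np.linalg.norm(Y.T @ X, "fro") ** 2
    norm_x = np.linalg.norm(X.T @ X, "fro")
    norm_y = np.linalg.norm(Y.T @ Y, "fro")
    return float(cross / (norm_x * norm_y))

def probe_accuracy(train_feats, train_labels, test_feats, test_labels):
    """Accuracy of a least-squares linear probe fitted on frozen features.

    Assumption: a least-squares classifier stands in for whatever probe
    the paper trains; labels are integers 0..n_classes-1.
    """
    n_classes = int(train_labels.max()) + 1
    targets = np.eye(n_classes)[train_labels]               # one-hot targets
    A = np.hstack([train_feats, np.ones((len(train_feats), 1))])  # add bias column
    W, *_ = np.linalg.lstsq(A, targets, rcond=None)
    B = np.hstack([test_feats, np.ones((len(test_feats), 1))])
    preds = (B @ W).argmax(axis=1)
    return float((preds == test_labels).mean())

# Synthetic stand-ins for features from three models (hypothetical data).
rng = np.random.default_rng(0)
labels = np.repeat([0, 1], 200)                             # 1 = "forgotten" class
separated = rng.normal(size=(400, 4)) + np.where(labels[:, None] == 1, 2.0, -2.0)
original = separated                                        # original model: classes separable
unlearned = separated + 0.1 * rng.normal(size=(400, 4))     # structure largely retained
retrained = rng.normal(size=(400, 4))                       # reference: no class structure

# Probe on even-indexed samples, evaluate on odd-indexed samples.
acc_unlearned = probe_accuracy(unlearned[::2], labels[::2], unlearned[1::2], labels[1::2])
acc_retrained = probe_accuracy(retrained[::2], labels[::2], retrained[1::2], labels[1::2])
lpr_gap = acc_unlearned - acc_retrained                     # positive gap = residual class info

cka_to_original = linear_cka(unlearned, original)
cka_to_retrained = linear_cka(unlearned, retrained)
print(f"LPR gap: {lpr_gap:.3f}")
print(f"CKA to original: {cka_to_original:.3f}, to retrained: {cka_to_retrained:.3f}")
```

In this toy setup, a positive `lpr_gap` together with `cka_to_original` exceeding `cka_to_retrained` mirrors the pattern the abstract describes as the forgetting gap: a model can look forgotten at the output while its features still separate the removed class.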
