LG | AI | CR | Jan 29, 2025

Topological Signatures of Adversaries in Multimodal Alignments

arXiv:2501.18006v1 | 2 citations | h-index: 25 | ICML
Originality: Incremental advance
AI Analysis

The paper addresses the underexplored issue of adversarial robustness in multimodal machine learning systems. The advance is rated incremental because it builds on existing unimodal defenses.

The paper tackles the problem of adversarial attacks on multimodal systems like CLIP/BLIP by investigating topological signatures in image-text alignments, showing that adversarial perturbations disrupt these alignments and introducing novel topological-contrastive losses and an algorithm for improved detection.

Multimodal Machine Learning systems, particularly those aligning text and image data like CLIP/BLIP models, have become increasingly prevalent, yet remain susceptible to adversarial attacks. While substantial research has addressed adversarial robustness in unimodal contexts, defense strategies for multimodal systems are underexplored. This work investigates the topological signatures that arise between image and text embeddings and shows how adversarial attacks disrupt their alignment, introducing distinctive signatures. We specifically leverage persistent homology and introduce two novel Topological-Contrastive losses based on Total Persistence and Multi-scale kernel methods to analyze the topological signatures introduced by adversarial perturbations. We observe a pattern of monotonic changes in the proposed topological losses emerging in a wide range of attacks on image-text alignments, as more adversarial samples are introduced in the data. By designing an algorithm to back-propagate these signatures to input samples, we are able to integrate these signatures into Maximum Mean Discrepancy tests, creating a novel class of tests that leverage topological signatures for better adversarial detection.
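To make the Total Persistence idea concrete, here is a minimal sketch of the 0-dimensional case for a point cloud of embeddings. It is an illustration only, not the paper's Topological-Contrastive loss, whose exact definition is not given in this summary. It relies on a standard fact: in a Vietoris-Rips filtration every connected component is born at scale 0, and the scales at which components merge are exactly the edge lengths of a minimum spanning tree. The function name `h0_total_persistence` and the `power` parameter are hypothetical, and the toy points stand in for real image or text embeddings.

```python
import math


def h0_total_persistence(points, power=1.0):
    """Sum of (death - birth)**power over the finite H0 bars of a
    Vietoris-Rips filtration built on `points` (Euclidean distance).

    Births are all 0, and deaths equal the minimum-spanning-tree edge
    lengths, so the sum runs over MST edges (Prim's algorithm, O(n^2)).
    Hypothetical helper for illustration; not the paper's loss.
    """
    n = len(points)
    if n < 2:
        return 0.0

    in_tree = [False] * n
    in_tree[0] = True  # start the tree from the first point
    # best[v] = shortest distance from v to any vertex already in the tree
    best = [math.dist(points[0], p) for p in points]

    total = 0.0
    for _ in range(n - 1):
        # Closest vertex that is not yet in the tree.
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        total += best[u] ** power  # one H0 bar dies at this scale
        for v in range(n):
            if not in_tree[v]:
                d = math.dist(points[u], points[v])
                if d < best[v]:
                    best[v] = d
    return total


# Toy usage: one far-away point lengthens one H0 bar and raises the total.
clean = [(0.0, 0.0), (1.0, 0.0), (3.0, 0.0)]
shifted = clean + [(3.0, 4.0)]
print(h0_total_persistence(clean))    # MST edges 1 and 2
print(h0_total_persistence(shifted))  # MST edges 1, 2 and 4
```

A summary statistic of this kind changes as points move away from the rest of the cloud, which is the sort of signal the abstract describes tracking as more adversarial samples enter the data. Higher-dimensional features such as loops would need a persistent homology library rather than this shortcut.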
