CRMay 14, 2024
UnMarker: A Universal Attack on Defensive Image WatermarkingAndre Kassis, Urs Hengartner
Reports regarding the misuse of Generative AI (GenAI) to create deepfakes are frequent. Defensive watermarking enables GenAI providers to hide fingerprints in their images and use them later for deepfake detection. Yet, its potential has not been fully explored. We present UnMarker -- the first practical universal attack on defensive watermarking. Unlike existing attacks, UnMarker requires no detector feedback, no unrealistic knowledge of the watermarking scheme or similar models, and no advanced denoising pipelines that may not be available. Instead, being the product of an in-depth analysis of the watermarking paradigm revealing that robust schemes must construct their watermarks in the spectral amplitudes, UnMarker employs two novel adversarial optimizations to disrupt the spectra of watermarked images, erasing the watermarks. Evaluations against SOTA schemes prove UnMarker's effectiveness. It not only defeats traditional schemes while retaining superior quality compared to existing attacks but also breaks semantic watermarks that alter an image's structure, reducing the best detection rate to $43\%$ and rendering them useless. To our knowledge, UnMarker is the first practical attack on semantic watermarks, which have been deemed the future of defensive watermarking. Our findings show that defensive watermarking is not a viable defense against deepfakes, and we urge the community to explore alternatives.
CRNov 25, 2024
DiffBreak: Is Diffusion-Based Purification Robust?Andre Kassis, Urs Hengartner, Yaoliang Yu
Diffusion-based purification (DBP) has become a cornerstone defense against adversarial examples (AEs), regarded as robust due to its use of diffusion models (DMs) that project AEs onto the natural data manifold. We refute this core claim, theoretically proving that gradient-based attacks effectively target the DM rather than the classifier, causing DBP's outputs to align with adversarial distributions. This prompts a reassessment of DBP's robustness, attributing it to two critical flaws: incorrect gradients and inappropriate evaluation protocols that test only a single random purification of the AE. We show that with proper accounting for stochasticity and resubmission risk, DBP collapses. To support this, we introduce DiffBreak, the first reliable toolkit for differentiation through DBP, eliminating gradient flaws that previously further inflated robustness estimates. We also analyze the current defense scheme used for DBP where classification relies on a single purification, pinpointing its inherent invalidity. We provide a statistically grounded majority-vote (MV) alternative that aggregates predictions across multiple purified copies, showing partial but meaningful robustness gain. We then propose a novel adaptation of an optimization method against deepfake watermarking, crafting systemic perturbations that defeat DBP even under MV, challenging DBP's viability.
CRJul 30, 2021
Practical Attacks on Voice Spoofing CountermeasuresAndre Kassis, Urs Hengartner
Voice authentication has become an integral part in security-critical operations, such as bank transactions and call center conversations. The vulnerability of automatic speaker verification systems (ASVs) to spoofing attacks instigated the development of countermeasures (CMs), whose task is to tell apart bonafide and spoofed speech. Together, ASVs and CMs form today's voice authentication platforms, advertised as an impregnable access control mechanism. We develop the first practical attack on CMs, and show how a malicious actor may efficiently craft audio samples to bypass voice authentication in its strictest form. Previous works have primarily focused on non-proactive attacks or adversarial strategies against ASVs that do not produce speech in the victim's voice. The repercussions of our attacks are far more severe, as the samples we generate sound like the victim, eliminating any chance of plausible deniability. Moreover, the few existing adversarial attacks against CMs mistakenly optimize spoofed speech in the feature space and do not take into account the existence of ASVs, resulting in inferior synthetic audio that fails in realistic settings. We eliminate these obstacles through our key technical contribution: a novel joint loss function that enables mounting advanced adversarial attacks against combined ASV/CM deployments directly in the time domain. Our adversarials achieve concerning black-box success rates against state-of-the-art authentication platforms (up to 93.57\%). Finally, we perform the first targeted, over-telephony-network attack on CMs, bypassing several challenges and enabling various potential threats, given the increased use of voice biometrics in call centers. Our results call into question the security of modern voice authentication systems in light of the real threat of attackers bypassing these measures to gain access to users' most valuable resources.
CRJul 16, 2020
Deep ahead-of-threat virtual patchingFady Copty, Andre Kassis, Sharon Keidar-Barner et al.
Many applications have security vulnerabilities that can be exploited. It is practically impossible to find all of them due to the NP-complete nature of the testing problem. Security solutions provide defenses against these attacks through continuous application testing, fast-patching of vulnerabilities, automatic deployment of patches, and virtual patching detection techniques deployed in network and endpoint security tools. These techniques are limited by the need to find vulnerabilities before the black-hats. We propose an innovative technique to virtually patch vulnerabilities before they are found. We leverage testing techniques for supervised-learning data generation, and show how artificial intelligence techniques can use this data to create predictive deep neural-network models that read an application's input and predict in real time whether it is a potential malicious input. We set up an ahead-of-threat experiment in which we generated data on old versions of an application, and then evaluated the predictive model accuracy on vulnerabilities found years later. Our experiments show ahead-of-threat detection on LibXML2 and LibTIFF vulnerabilities with 91.3% and 93.7% accuracy, respectively. We expect to continue work on this field of research and provide ahead-of-threat virtual patching for more libraries. Success in this research can change the current state of endless racing after application vulnerabilities and put the defenders one step ahead of the attackers