CR, CV | Nov 20, 2024

Bounding-box Watermarking: Defense against Model Extraction Attacks on Object Detectors

arXiv:2411.13047v22.31 citations | h-index: 5 | ECML/PKDD

Originality: Incremental advance

AI Analysis

This paper addresses the problem of protecting proprietary object detection models, as offered by cloud service providers, from unauthorized duplication. It represents an incremental improvement in watermarking techniques.

The paper tackles model extraction attacks on object detectors by proposing a bounding-box watermarking method that modifies bounding boxes in API responses to embed a backdoor, achieving 100% accuracy in identifying extracted models across three datasets.

Deep neural networks (DNNs) deployed in the cloud often allow users to query models via APIs. However, these APIs expose the models to model extraction attacks (MEAs), in which an attacker attempts to duplicate the target model by abusing the API's responses. Backdoor-based DNN watermarking is a promising defense against MEAs: the defender injects a backdoor into extracted models via the API responses. The backdoor serves as a watermark of the model; if a suspicious model carries the watermark (i.e., the backdoor), it is verified as an extracted model. This work focuses on object detection (OD) models. Existing backdoor attacks on OD models are not applicable to model watermarking as a defense against MEAs under a realistic threat model. Our proposed approach inserts a backdoor into extracted models via APIs by stealthily modifying the bounding boxes (BBs) of objects detected in queries while preserving the OD capability. In our experiments on three OD datasets, the proposed approach identified the extracted models with 100% accuracy across a wide variety of experimental scenarios.
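The idea of watermarking through bounding-box responses can be illustrated with a minimal Python sketch. The abstract does not say how boxes are modified or how trigger queries are chosen, so everything here is an assumption made for illustration: the `Box` type, the center-preserving enlargement in `distort`, the `trigger_present` flag, and the IoU-based `watermark_match_rate` check are hypothetical and are not the paper's actual method.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Box:
    """Axis-aligned bounding box (hypothetical type for this sketch)."""
    x1: float
    y1: float
    x2: float
    y2: float


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two boxes."""
    iw = max(0.0, min(a.x2, b.x2) - max(a.x1, b.x1))
    ih = max(0.0, min(a.y2, b.y2) - max(a.y1, b.y1))
    inter = iw * ih
    area_a = (a.x2 - a.x1) * (a.y2 - a.y1)
    area_b = (b.x2 - b.x1) * (b.y2 - b.y1)
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def distort(box: Box, scale: float = 1.5) -> Box:
    """Assumed watermark: enlarge the box about its center by `scale`."""
    cx, cy = (box.x1 + box.x2) / 2, (box.y1 + box.y2) / 2
    hw = scale * (box.x2 - box.x1) / 2
    hh = scale * (box.y2 - box.y1) / 2
    return Box(cx - hw, cy - hh, cx + hw, cy + hh)


def api_response(boxes: List[Box], trigger_present: bool,
                 scale: float = 1.5) -> List[Box]:
    """Return distorted boxes for trigger queries, clean boxes otherwise.

    `trigger_present` stands in for whatever secret condition the
    defender uses; the abstract does not specify it.
    """
    return [distort(b, scale) for b in boxes] if trigger_present else list(boxes)


def watermark_match_rate(suspect: List[Box], clean: List[Box],
                         scale: float = 1.5) -> float:
    """Fraction of trigger samples where the suspect model's box is
    closer (by IoU) to the distorted box than to the clean one."""
    if not clean:
        return 0.0
    hits = sum(iou(s, distort(c, scale)) > iou(s, c)
               for s, c in zip(suspect, clean))
    return hits / len(clean)
```

Under these assumptions, a model trained on the API's responses would tend to reproduce the distorted boxes on trigger inputs and score a high match rate, while an independently trained model would score near zero. A defender would compare the rate against a threshold of their choosing.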

