Decouple-and-Sample: Protecting sensitive information in task agnostic data release
This work addresses privacy issues in data sharing for computer vision applications, offering a task-agnostic solution that builds incrementally on existing sanitization techniques.
The paper addresses privacy concerns in dataset release by proposing a two-stage framework that decouples and privately synthesizes sensitive information, achieving a better privacy-utility trade-off and outperforming state-of-the-art baselines on benchmark tasks.
We propose sanitizer, a framework for secure and task-agnostic data release. While releasing datasets continues to have a large impact on various applications of computer vision, that impact is mostly realized when data sharing is not inhibited by privacy concerns. We alleviate these concerns by sanitizing datasets in a two-stage process. First, we introduce a global decoupling stage that decomposes raw data into sensitive and non-sensitive latent representations. Second, we design a local sampling stage that synthetically generates sensitive information with differential privacy and merges it with the non-sensitive latent features to create a useful representation while preserving privacy. The resulting latent representation is a task-agnostic representation of the original dataset with anonymized sensitive information. While most algorithms sanitize data in a task-dependent manner, a few task-agnostic sanitization techniques sanitize data by censoring sensitive information. In this work, we show that a better privacy-utility trade-off is achieved if sensitive information is instead synthesized privately. We validate the effectiveness of the sanitizer by outperforming state-of-the-art baselines on existing benchmark tasks and by demonstrating tasks that are not possible with existing techniques.
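The two-stage idea can be illustrated with a minimal sketch. It is not the paper's implementation. Every name here (decouple, randomized_response, sample_and_merge, class_means) is hypothetical, and the sketch makes several assumptions: a trained encoder (not shown) has already placed the sensitive factors in the first coordinates of the latent vector, the sensitive attribute is a categorical label, the differentially private step is plain k-ary randomized response, and the synthetic sensitive latent is drawn from a per-class Gaussian prior.

```python
import math
import random


def decouple(latent, num_sensitive_dims):
    """Stage 1 stand-in: split an already-encoded latent vector.

    Assumption: a trained encoder (not shown) has placed the sensitive
    factors in the first `num_sensitive_dims` coordinates.
    """
    return list(latent[:num_sensitive_dims]), list(latent[num_sensitive_dims:])


def randomized_response(label, num_classes, epsilon, rng):
    """k-ary randomized response on a categorical label.

    Keeps the true label with probability e^eps / (e^eps + k - 1) and
    otherwise returns one of the other labels uniformly at random, which
    gives epsilon-differential privacy for the label itself.
    """
    p_keep = math.exp(epsilon) / (math.exp(epsilon) + num_classes - 1)
    if rng.random() < p_keep:
        return label
    others = [c for c in range(num_classes) if c != label]
    return rng.choice(others)


def sample_and_merge(label, z_nonsensitive, class_means, epsilon, rng, sigma=1.0):
    """Stage 2 stand-in: synthesize a sensitive latent and merge it.

    The synthetic sensitive latent depends only on the privatized label
    (post-processing), then is concatenated with the non-sensitive part.
    `class_means` is a hypothetical per-class Gaussian prior mean.
    """
    private_label = randomized_response(label, len(class_means), epsilon, rng)
    z_sensitive = [rng.gauss(mu, sigma) for mu in class_means[private_label]]
    return z_sensitive + list(z_nonsensitive)


# Example usage with made-up numbers.
rng = random.Random(0)
_, z_n = decouple([0.9, 1.1, 0.3, 0.4], num_sensitive_dims=2)
released = sample_and_merge(1, z_n, [[0.0, 0.0], [1.0, 1.0]], epsilon=1.0, rng=rng)
```

In this sketch the privacy guarantee covers only the sensitive label, and it is meaningful only to the extent that the non-sensitive part of the latent carries no information about that label, which is what the decoupling stage is meant to achieve.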