Securing Biomedical Images from Unauthorized Training with Anti-Learning Perturbation
This work addresses the risk of data exploitation in healthcare and could encourage more institutions to share data, though it is an incremental application of existing anti-learning perturbation methods to a specific domain.
The paper tackles the unauthorized use of biomedical images for training AI models by proposing an 'unlearnable biomedical image' approach that injects imperceptible noise to make the data unexploitable, achieving protection with minimal impact on human perception.
The volume of open-source biomedical data has been essential to progress across the healthcare community, since more `free' data gives individual researchers more opportunities to contribute. However, institutions often hesitate to share their data publicly because unauthorized third parties may exploit it for commercial purposes (e.g., training AI models). This reluctance might hinder the development of the whole healthcare research community. To address this concern, we propose a novel approach termed `unlearnable biomedical image', which protects biomedical data by injecting imperceptible but delusive noise into the data, making it unexploitable by AI models. We formulate the problem as a bi-level optimization and propose three kinds of anti-learning perturbation generation approaches to solve it. Our method is an important step toward encouraging more institutions to contribute their data for the long-term development of the research community.
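For readers unfamiliar with the bi-level formulation, the following is a sketch of one form commonly used in the unlearnable-example literature (the error-minimizing noise objective). It is an assumption for illustration, not necessarily the exact objective used here, and every symbol is introduced for this sketch only: $f_\theta$ is a model with parameters $\theta$, $(x_i, y_i)$ are the $n$ training images and labels, $\delta_i$ is the perturbation added to image $x_i$, $\mathcal{L}$ is the training loss, and $\epsilon$ bounds the perturbation norm so that it stays imperceptible.

```latex
% Assumed illustrative objective (error-minimizing noise); symbols are not from the abstract.
% Outer problem: train the model parameters \theta on the perturbed data.
% Inner problem: for each image, pick the bounded perturbation \delta_i that minimizes the loss.
\begin{equation}
  \min_{\theta} \; \frac{1}{n} \sum_{i=1}^{n}
  \min_{\|\delta_i\|_{p} \le \epsilon}
  \mathcal{L}\bigl(f_{\theta}(x_i + \delta_i),\, y_i\bigr)
\end{equation}
```

Under this assumed objective, the perturbation drives the training loss toward zero on the protected images, so a model trained on them has little left to learn from the underlying image content, while the norm bound $\epsilon$ keeps the change small enough to be hard for a human viewer to notice.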