Single-round Self-supervised Distributed Learning using Vision Transformer
This work addresses privacy and efficiency issues in distributed medical imaging and offers a task-agnostic foundation model, though it appears incremental because it builds on existing self-supervised and transformer techniques.
The paper tackles the challenges of data scarcity, privacy, and communication overhead in medical deep learning by proposing a self-supervised masked sampling distillation method for vision transformers, achieving superior performance compared to existing distributed learning strategies and fine-tuning baselines.
Despite the recent success of deep learning in medicine, data scarcity is exacerbated by concerns about privacy and data ownership. Distributed learning approaches, including federated learning, have been investigated to address these issues. However, they are hindered by heavy communication overhead and by weaknesses in privacy protection. To tackle these challenges, we propose a self-supervised masked sampling distillation method for the vision transformer. This method can be implemented without continuous communication and can enhance privacy by utilizing a vision transformer-specific encryption technique. We conducted extensive experiments on two different tasks, which demonstrated the effectiveness of our method: it achieved superior performance compared to existing distributed learning strategies as well as the fine-tuning-only baseline. Furthermore, since the self-supervised model created using our proposed method can achieve a general semantic understanding of the image, we demonstrate its potential as a task-agnostic self-supervised foundation model for various downstream tasks, thereby expanding its applicability in the medical domain.
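The abstract does not spell out how masked sampling operates, so the following is only a minimal sketch of one generic reading: a vision transformer splits an image into a grid of patches, and a random subset of patch indices is kept visible while the rest are masked. The function name `sample_visible_patches`, the parameters `keep_ratio` and `seed`, and the 224-pixel image with 16-pixel patches are all assumptions made for illustration; they are not taken from the paper and do not represent the authors' implementation or their distillation objective.

```python
import random


def sample_visible_patches(image_size, patch_size, keep_ratio, seed=None):
    """Randomly split the patch indices of a square image into visible and masked sets.

    Hypothetical helper for illustration only; not the paper's method.
    """
    if image_size % patch_size != 0:
        raise ValueError("image_size must be divisible by patch_size")
    if not 0.0 < keep_ratio <= 1.0:
        raise ValueError("keep_ratio must be in (0, 1]")

    per_side = image_size // patch_size      # patches along one image side
    num_patches = per_side * per_side        # total patches in the grid
    num_keep = max(1, int(num_patches * keep_ratio))

    rng = random.Random(seed)                # local RNG so the seed is reproducible
    visible = sorted(rng.sample(range(num_patches), num_keep))
    visible_set = set(visible)
    masked = [i for i in range(num_patches) if i not in visible_set]
    return visible, masked


# Assumed example: a 224x224 image with 16x16 patches gives a 14x14 grid (196 patches).
visible, masked = sample_visible_patches(224, 16, keep_ratio=0.5, seed=0)
print(len(visible), len(masked))  # prints "98 98"
```

In a distillation setup of the kind the abstract names, one network would typically see only the visible patches while another provides the target; that pairing is likewise an assumption here and is left out of the sketch.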