Domain Adaptation for Efficiently Fine-tuning Vision Transformer with Encrypted Images
This work addresses privacy-preserving learning for computer vision applications, but its contribution is incremental because it builds on existing domain adaptation and ViT methods.
The paper tackles the problem of performance degradation when fine-tuning vision transformers with transformed (encrypted) images, proposing a domain adaptation method that prevents accuracy loss on CIFAR-10 and CIFAR-100 datasets.
In recent years, deep neural networks (DNNs) trained with transformed data have been used in applications such as privacy-preserving learning, access control, and adversarial defense. However, training on transformed data degrades model performance. In this paper, we therefore propose a novel method for fine-tuning the vision transformer (ViT) with transformed images. The proposed domain adaptation method exploits the embedding structure of ViT and avoids this accuracy degradation. In experiments on the CIFAR-10 and CIFAR-100 datasets, we confirmed that the proposed method prevents accuracy degradation even when encrypted images are used.
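The abstract does not specify the encryption scheme or the adaptation step, so the following is only a minimal sketch under stated assumptions: the transformation is a block-wise pixel shuffle whose block size equals the ViT patch size, and the patch embedding is a plain linear map. Under those assumptions, permuting the input columns of the embedding matrix with the same secret key yields the same embeddings for encrypted patches as the original matrix yields for plain patches. The names `shuffle_patch`, `adapt_embedding`, and `embed`, and the toy sizes, are hypothetical and are not taken from the paper.

```python
import math
import random


def shuffle_patch(patch, perm):
    """Encrypt one flattened patch by reordering its pixels with a secret key."""
    return [patch[i] for i in perm]


def adapt_embedding(weights, perm):
    """Reorder the input columns of a patch-embedding matrix with the same key."""
    return [[row[i] for i in perm] for row in weights]


def embed(weights, patch):
    """Linear patch embedding: one dot product per output dimension."""
    return [sum(w * x for w, x in zip(row, patch)) for row in weights]


rng = random.Random(0)
patch_dim, embed_dim = 12, 4  # toy sizes, chosen only for illustration

# A flattened plain patch and a stand-in for pretrained embedding weights.
patch = [rng.random() for _ in range(patch_dim)]
weights = [[rng.uniform(-1, 1) for _ in range(patch_dim)] for _ in range(embed_dim)]

# Secret key: a random permutation of the pixel positions inside a patch.
perm = list(range(patch_dim))
rng.shuffle(perm)

encrypted = shuffle_patch(patch, perm)

plain_out = embed(weights, patch)                             # reference embedding
naive_out = embed(weights, encrypted)                         # generally differs from plain_out
adapted_out = embed(adapt_embedding(weights, perm), encrypted)

# The adapted embedding of the encrypted patch matches the plain embedding
# up to floating-point rounding, because both sums contain the same terms.
matches = all(math.isclose(a, b, abs_tol=1e-9) for a, b in zip(plain_out, adapted_out))
```

In this sketch only the first linear layer is touched, which illustrates why an adaptation tied to the embedding structure could leave the rest of a pretrained model usable for fine-tuning on encrypted images. Whether the paper's method takes exactly this form is not stated in the abstract.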