Privacy-Preserving Self-Taught Federated Learning for Heterogeneous Data
This work addresses privacy and efficiency challenges in federated learning for scenarios with heterogeneous data across multiple participants, representing an incremental improvement over existing methods.
The paper targets limitations of existing vertical federated learning methods, such as restrictive neural network structures and slow training. It proposes a self-taught federated learning method that applies unsupervised feature extraction to distributed supervised tasks, with the aim of improving both efficiency and privacy preservation.
Many application scenarios call for training a machine learning model among multiple participants. Federated learning (FL) was proposed to enable joint training of a deep learning model on the local data held by each party without revealing that data to the others. Among the various types of FL methods, vertical FL handles data sources that share the same ID space but have different feature spaces. However, existing vertical FL methods suffer from limitations such as restrictive neural network structures and slow training speed, and they often cannot take advantage of data with unmatched IDs. In this work, we propose an FL method called self-taught federated learning to address these issues; it uses unsupervised feature extraction techniques for distributed supervised deep learning tasks. In this method, only latent variables are transmitted to other parties for model training, while privacy is preserved by keeping each party's data, activations, weights, and biases local. Extensive experiments are performed to evaluate and demonstrate the validity and efficiency of the proposed method.
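The data flow described in the abstract, in which each party extracts features without supervision and transmits only latent variables, can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the paper: PCA stands in for the unsupervised feature extractor, logistic regression stands in for the supervised deep model, the data are synthetic, and the names `fit_local_encoder`, `encode`, and `make_party_data` are hypothetical. The sketch also assumes that rows with unmatched IDs can contribute to fitting the local encoder, which is one plausible reading of how such data might be used.

```python
import numpy as np

rng = np.random.default_rng(0)


def fit_local_encoder(x, latent_dim):
    """Fit a linear encoder on one party's local features.

    Assumption: PCA via SVD stands in for the unsupervised extractor.
    """
    mean = x.mean(axis=0)
    _, _, vt = np.linalg.svd(x - mean, full_matrices=False)
    return mean, vt[:latent_dim].T  # top principal directions as columns


def encode(x, encoder):
    """Map raw local features to the latent variables that get transmitted."""
    mean, components = encoder
    return (x - mean) @ components


def make_party_data(n_rows, n_features, n_factors=2, noise=0.05):
    """Synthetic local features driven by a few hidden factors (illustrative)."""
    factors = rng.normal(size=(n_rows, n_factors))
    mixing = rng.normal(size=(n_factors, n_features))
    x = factors @ mixing + noise * rng.normal(size=(n_rows, n_features))
    return factors, x


# Two parties with different feature spaces. Only the first n_matched rows
# share IDs across parties; the remaining rows are unmatched.
n_total, n_matched = 300, 200
factors_a, x_a = make_party_data(n_total, n_features=6)
factors_b, x_b = make_party_data(n_total, n_features=4)

# Labels exist only for matched IDs and depend on both parties' hidden factors.
y = (factors_a[:n_matched, 0] + factors_b[:n_matched, 0] > 0).astype(float)

# Step 1: each party fits its encoder locally on ALL of its rows,
# including those whose IDs have no match at the other party.
enc_a = fit_local_encoder(x_a, latent_dim=2)
enc_b = fit_local_encoder(x_b, latent_dim=2)

# Step 2: only the latent variables of matched rows leave each party.
z = np.hstack([encode(x_a[:n_matched], enc_a), encode(x_b[:n_matched], enc_b)])

# Step 3: the label holder trains a supervised model on the latents
# (logistic regression fitted by plain gradient descent).
w, b = np.zeros(z.shape[1]), 0.0
for _ in range(500):
    p = 1.0 / (1.0 + np.exp(-(z @ w + b)))
    w -= 0.1 * (z.T @ (p - y)) / n_matched
    b -= 0.1 * np.mean(p - y)

accuracy = np.mean(((z @ w + b) > 0) == (y == 1))
print(f"training accuracy on matched rows: {accuracy:.2f}")
```

In this sketch the raw features `x_a` and `x_b` and the encoder parameters never leave their owners; the only shared quantity is the latent matrix `z`. Whether latent variables alone give adequate privacy depends on the extractor and the threat model, which this toy example does not address.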