Privacy-Preserving Methods for Vertically Partitioned Incomplete Data
This work provides a privacy-preserving solution for analyzing incomplete, vertically partitioned health data, which is crucial for fostering collaboration among institutions in distributed health data networks.
This paper addresses the challenge of missing data in distributed health networks where vertically partitioned data cannot be centrally pooled due to privacy concerns. The authors propose a privacy-preserving distributed analysis framework in which each institution computes aggregated statistics from its local data; these statistics are then shared to construct a global model for handling missing data. Their simulation studies show that the proposed methods perform as well as pooled-data methods and outperform several naïve approaches.
Distributed health data networks that use information from multiple sources have drawn substantial interest in recent years. However, missing data are prevalent in such networks and present significant analytical challenges. Current state-of-the-art methods for handling missing data require pooling the data into a central repository before analysis, which may not be possible in a distributed health data network. In this paper, we propose a privacy-preserving distributed analysis framework for handling missing data when the data are vertically partitioned. In this framework, each institution holding a particular data source uses its local private data to calculate the necessary intermediate aggregated statistics, which are then shared to build a global model for handling missing data. To evaluate the proposed methods, we conduct simulation studies, which demonstrate that the privacy-preserving methods perform as well as methods using the pooled data and outperform several naïve methods. We further illustrate the proposed methods through the analysis of a real dataset. Because no individual-level data are shared, the proposed framework for handling vertically partitioned incomplete data is substantially more privacy-preserving than methods that require pooling the data; this can lower hurdles for collaboration across multiple institutions and build stronger public trust.
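
To make the idea of sharing intermediate statistics instead of individual-level covariates concrete, here is a minimal sketch. It is not the authors' algorithm. It assumes one common construction for vertically partitioned data: each site shares the linear-kernel Gram matrix of its own columns, and a coordinator sums these matrices and fits a ridge regression in dual form to impute a partially observed variable. The function names (`local_gram`, `impute_from_grams`), the ridge penalty `lam`, the two hypothetical sites, and the simulated data are all illustrative assumptions. Rows are assumed to be aligned across sites, meaning the same individuals in the same order.

```python
import numpy as np

def local_gram(X_local):
    """Run at each site: Gram matrix (n x n) of the locally held columns.

    Only this matrix leaves the site, not the covariate values themselves.
    """
    return X_local @ X_local.T

def impute_from_grams(grams, y, lam=1.0):
    """Run at the coordinator: dual-form ridge imputation of missing y.

    The sum of the local Gram matrices equals X @ X.T for the pooled
    design matrix X, so the pooled ridge fit is recoverable without X.
    No intercept is fitted in this sketch.
    """
    K = sum(grams)
    obs = ~np.isnan(y)   # rows where y is observed
    mis = ~obs           # rows where y is missing
    # Dual coefficients from the observed rows only.
    alpha = np.linalg.solve(K[np.ix_(obs, obs)] + lam * np.eye(obs.sum()), y[obs])
    y_filled = y.copy()
    # Predicted values for the missing rows.
    y_filled[mis] = K[np.ix_(mis, obs)] @ alpha
    return y_filled

# Simulated example with two hypothetical sites holding different columns.
rng = np.random.default_rng(0)
n = 200
X_a = rng.normal(size=(n, 2))   # columns held by site A
X_b = rng.normal(size=(n, 3))   # columns held by site B
y = (X_a @ np.array([1.0, -2.0])
     + X_b @ np.array([0.5, 0.0, 1.5])
     + rng.normal(scale=0.1, size=n))
y[rng.random(n) < 0.2] = np.nan  # about 20% of y set to missing

y_distributed = impute_from_grams([local_gram(X_a), local_gram(X_b)], y)

# Pooled-data reference: primal ridge regression on the combined columns.
X = np.hstack([X_a, X_b])
obs = ~np.isnan(y)
beta = np.linalg.solve(X[obs].T @ X[obs] + 1.0 * np.eye(X.shape[1]),
                       X[obs].T @ y[obs])
y_pooled = np.where(obs, y, X @ beta)
```

In this sketch the distributed and pooled imputations agree up to floating-point error, because the dual and primal ridge solutions give identical fitted values. This mirrors, in a toy setting, the abstract's claim that the distributed methods perform as well as methods using the pooled data. A practical analysis would typically draw imputations with added noise, as in multiple imputation, instead of filling in a single predicted value.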