Randomized Controlled Trials without Data Retention
This work matters to researchers and organizations that conduct RCTs under strict privacy regulations and data minimization policies: it offers a way to maintain inferential accuracy without retaining sensitive individual-level data.
This paper addresses the challenge of conducting Randomized Controlled Trials (RCTs) while adhering to data minimization principles, where individual records are deleted or anonymized shortly after collection. It proposes recursive algorithms to construct running estimates of treatment effects and combines them with bootstrap and federated strategies to draw robust inferences, even with non-i.i.d. data.
Amidst rising appreciation for privacy and data usage rights, researchers have increasingly acknowledged the principle of data minimization, which holds that the accessibility, collection, and retention of subjects' data should be kept to the minimum needed to answer focused research questions. Applying this principle to randomized controlled trials (RCTs), this paper presents algorithms for making accurate inferences from RCTs under stringent data retention and anonymization policies. In particular, we show how to use recursive algorithms to construct running estimates of treatment effects in RCTs, which allow individualized records to be deleted or anonymized shortly after collection. Devoting special attention to non-i.i.d. data, we further show how to draw robust inferences from RCTs by combining recursive algorithms with bootstrap and federated strategies.
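To make the idea of a running estimate concrete, the following is a minimal sketch, not the paper's own algorithm. It assumes the simplest case: a difference-in-means estimator with i.i.d. outcomes, updated one record at a time with Welford's recursion, so that each outcome can be discarded once it has been folded in. The names `RunningArm`, `update`, `merge`, and `effect_estimate` are hypothetical, chosen for illustration only. The `merge` method uses the standard pairwise formula for combining running moments, as a rough stand-in for a federated setting in which sites share summaries rather than raw records; the paper's treatment of non-i.i.d. data and its bootstrap procedures are not reproduced here.

```python
import math
from dataclasses import dataclass
from typing import Tuple


@dataclass
class RunningArm:
    """Running count, mean, and sum of squared deviations for one trial arm."""
    n: int = 0
    mean: float = 0.0
    m2: float = 0.0  # sum of squared deviations from the current mean

    def update(self, y: float) -> None:
        """Fold in one outcome (Welford's recursion); y need not be kept."""
        self.n += 1
        delta = y - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (y - self.mean)

    def merge(self, other: "RunningArm") -> "RunningArm":
        """Combine two summaries (e.g. from two sites) without raw records."""
        n = self.n + other.n
        if n == 0:
            return RunningArm()
        delta = other.mean - self.mean
        mean = self.mean + delta * other.n / n
        m2 = self.m2 + other.m2 + delta * delta * self.n * other.n / n
        return RunningArm(n, mean, m2)

    @property
    def variance(self) -> float:
        """Unbiased sample variance; requires at least two observations."""
        return self.m2 / (self.n - 1)


def effect_estimate(treated: RunningArm, control: RunningArm) -> Tuple[float, float]:
    """Difference in means and its standard error (unequal variances)."""
    diff = treated.mean - control.mean
    se = math.sqrt(treated.variance / treated.n + control.variance / control.n)
    return diff, se


# Illustrative stream of (arm, outcome) pairs; the values are made up.
treated, control = RunningArm(), RunningArm()
for arm, y in [(1, 5.0), (0, 3.0), (1, 7.0), (0, 4.0), (1, 6.0), (0, 2.0)]:
    (treated if arm == 1 else control).update(y)  # the record is not stored

diff, se = effect_estimate(treated, control)
```

Only three numbers per arm persist between updates, which is what allows the individual record to be deleted as soon as `update` returns. The trade-off is that any analysis not anticipated in the running summaries cannot be recovered later.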