Confidentiality and linked data
This article addresses the privacy risks that data providers such as government agencies face when sharing linked datasets; its contribution is incremental, as it reviews existing principles and methods.
This article examines the challenge of balancing information publication with privacy protection when linking identified administrative datasets across sources and time, focusing on confidentiality risks from data outputs and micro-data release.
Data providers such as government statistical agencies perform a balancing act: maximising the information published to inform decision-making and research, while simultaneously protecting privacy. The emergence of identified administrative datasets that can be shared (and thus linked) offers huge potential benefits but also significant additional risks. This article introduces the principles and methods of linking data across different sources and points in time, focusing on potential areas of risk. We then consider confidentiality risk, focusing in particular on the "intruder" problem central to the area, and looking at risks both from data producer outputs and from the release of micro-data for further analysis. Finally, we briefly consider potential solutions to the problem of micro-data release, both the statistical solutions considered in other contributed articles and non-statistical solutions.