Emily Jefferson

LGNov 3, 2022

GRAIMATTER Green Paper: Recommendations for disclosure control of trained Machine Learning (ML) models from Trusted Research Environments (TREs)

Emily Jefferson, James Liley, Maeve Malone et al.

TREs are widely, and increasingly used to support statistical analysis of sensitive data across a range of sectors (e.g., health, police, tax and education) as they enable secure and transparent research whilst protecting data confidentiality. There is an increasing desire from academia and industry to train AI models in TREs. The field of AI is developing quickly with applications including spotting human errors, streamlining processes, task automation and decision support. These complex AI models require more information to describe and reproduce, increasing the possibility that sensitive personal data can be inferred from such descriptions. TREs do not have mature processes and controls against these risks. This is a complex topic, and it is unreasonable to expect all TREs to be aware of all risks or that TRE researchers have addressed these risks in AI-specific training. GRAIMATTER has developed a draft set of usable recommendations for TREs to guard against the additional risks when disclosing trained AI models from TREs. The development of these recommendations has been funded by the GRAIMATTER UKRI DARE UK sprint research project. This version of our recommendations was published at the end of the project in September 2022. During the course of the project, we have identified many areas for future investigations to expand and test these recommendations in practice. Therefore, we expect that this document will evolve over time.

CRNov 10, 2021

Machine Learning Models Disclosure from Trusted Research Environments (TRE), Challenges and Opportunities

Esma Mansouri-Benssassi, Simon Rogers, Jim Smith et al.

Artificial intelligence (AI) applications in healthcare and medicine have increased in recent years. To enable access to personal data, Trusted Research environments (TREs) provide safe and secure environments in which researchers can access sensitive personal data and develop Artificial Intelligence (AI) and Machine Learning models. However currently few TREs support the use of automated AI-based modelling using Machine Learning. Early attempts have been made in the literature to present and introduce privacy preserving machine learning from the design point of view [1]. However, there exists a gap in the practical decision-making guidance for TREs in handling models disclosure. Specifically, the use of machine learning creates a need to disclose new types of outputs from TREs, such as trained machine learning models. Although TREs have clear policies for the disclosure of statistical outputs, the extent to which trained models can leak personal training data once released is not well understood and guidelines do not exist within TREs for the safe disclosure of these models. In this paper we introduce the challenge of disclosing trained machine learning models from TREs. We first give an overview of machine learning models in general and describe some of their applications in healthcare and medicine. We define the main vulnerabilities of trained machine learning models in general. We also describe the main factors affecting the vulnerabilities of disclosing machine learning models. This paper also provides insights and analyses methods that could be introduced within TREs to mitigate the risk of privacy breaches when disclosing trained models.

Emily Jefferson

2 Papers