CL · AI · Aug 30, 2023

Grandma Karl is 27 years old -- research agenda for pseudonymization of research data

Elena Volodina, Simon Dobnik, Therese Lindström Tiedemann, Xuan-Son Vu

arXiv:2308.16109v1 · 0.54 citations · h-index: 19

Originality: Synthesis-oriented

AI Analysis

This paper addresses the need for secure open access to research data in fields that handle personal information, but it is incremental: it builds on existing GDPR suggestions and calls for further studies rather than presenting new results.

The paper tackles the problem of sharing sensitive textual research data by proposing pseudonymization as a solution under GDPR, and outlines a research agenda to study its effects on unstructured data, readability, language assessment, and identity protection, with a focus on developing context-sensitive algorithms.

Accessibility of research data is critical for advances in many research fields, but textual data often cannot be shared because of the personal and sensitive information it contains, e.g. names or political opinions. The General Data Protection Regulation (GDPR) suggests pseudonymization as a way to secure open access to research data, but we need to learn more about pseudonymization as an approach before adopting it for manipulating research data. This paper outlines a research agenda for pseudonymization, namely the need for studies of the effects of pseudonymization on unstructured data, in relation to e.g. readability and language assessment, and of the effectiveness of pseudonymization as a way of protecting writer identity, while also exploring different ways of developing context-sensitive algorithms for the detection, labelling and replacement of personal information in unstructured data. The recently granted pseudonymization project "Grandma Karl is 27 years old" addresses exactly these challenges.
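The abstract names three steps for handling personal information in unstructured text: detection, labelling and replacement. The following is a minimal toy sketch of those steps, not the project's method. The name lexicon, the category tags and the pseudonym pools are hypothetical, and a fixed word list stands in for the context-sensitive detection the agenda calls for. The sketch replaces each name consistently and keeps its gender category, presumably the kind of context the title's example ("Grandma Karl") shows a naive replacement can lose.

```python
import re
from dataclasses import dataclass, field

# Hypothetical toy lexicon mapping known first names to a gender category.
# A fixed list stands in for a real, context-sensitive detector (e.g. an NER model).
NAME_LEXICON = {"Karl": "male", "Erik": "male", "Anna": "female", "Maria": "female"}

# Hypothetical replacement pools, keyed by the same categories as the lexicon.
PSEUDONYM_POOLS = {"male": ["Johan", "Nils", "Olof"], "female": ["Sara", "Ingrid", "Lena"]}

# Detection: whole-word matches only, so "Karlsson" or "Annabel" are not touched.
NAME_RE = re.compile(r"\b(?:" + "|".join(map(re.escape, NAME_LEXICON)) + r")\b")


@dataclass
class Pseudonymizer:
    # Original name -> pseudonym, reused so the same person keeps one pseudonym.
    mapping: dict = field(default_factory=dict)
    # Number of pseudonyms already handed out per category.
    used: dict = field(default_factory=dict)

    def label_text(self, text: str) -> str:
        """Labelling: replace each detected name with a category tag."""
        return NAME_RE.sub(lambda m: f"[NAME_{NAME_LEXICON[m.group(0)].upper()}]", text)

    def pseudonym_for(self, name: str) -> str:
        """Replacement: pick a pseudonym from the same category as the original."""
        if name not in self.mapping:
            category = NAME_LEXICON[name]
            pool = PSEUDONYM_POOLS[category]
            index = self.used.get(category, 0)
            if index >= len(pool):
                # Refuse rather than reuse a pseudonym, which would merge two people.
                raise ValueError(f"pseudonym pool for {category!r} is exhausted")
            self.mapping[name] = pool[index]
            self.used[category] = index + 1
        return self.mapping[name]

    def pseudonymize(self, text: str) -> str:
        """Detect names and replace them with consistent, category-preserving pseudonyms."""
        return NAME_RE.sub(lambda m: self.pseudonym_for(m.group(0)), text)


if __name__ == "__main__":
    p = Pseudonymizer()
    text = "Anna met Karl. Later Karl called Anna."
    print(p.label_text(text))    # [NAME_FEMALE] met [NAME_MALE]. Later [NAME_MALE] called [NAME_FEMALE].
    print(p.pseudonymize(text))  # Sara met Johan. Later Johan called Sara.
```

Keeping one mapping per `Pseudonymizer` instance means a person keeps the same pseudonym throughout a text, so references between sentences stay readable. A name list cannot capture other kinds of personal information the abstract mentions, such as political opinions; those would need the context-sensitive methods the agenda proposes to study.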
