CY | AI | CR | LG | Jul 7, 2022

Privacy-Preserving Synthetic Educational Data Generation

arXiv:2207.03202v1
13 citations
h-index: 16
Originality: Incremental advance
AI Analysis

This work addresses privacy concerns for educational institutions and researchers by enabling safe data sharing, though it is incremental as it builds on existing synthetic data generation methods.

The paper tackles the problem of generating synthetic educational data that preserves participant privacy, addressing re-identification risks from naive pseudonymization, and presents an evaluation framework for comparing synthetic data generators, with evaluation on existing massive educational open datasets.

Institutions collect massive learning traces but may not disclose them because of privacy concerns. Synthetic data generation opens new opportunities for research in education. In this paper, we present a generative model for educational data that can preserve the privacy of participants, and an evaluation framework for comparing synthetic data generators. We show how naive pseudonymization can lead to re-identification threats and suggest techniques to guarantee privacy. We evaluate our method on existing massive educational open datasets.

Code Implementations: 1 repo
