LG · Dec 3, 2024

Learning from Concealed Labels

arXiv:2412.02230v1 · 1 citation · h-index: 5
Originality: Incremental advance
AI Analysis

This paper addresses privacy concerns that arise when annotating sensitive information such as disease or smoking status. It offers a method that classifies instances from both sensitive and insensitive labels without exposing sensitive labels during annotation, though it appears incremental because it builds on existing privacy-preserving techniques.

The paper tackles the problem of protecting privacy when annotating sensitive labels by proposing a learning-from-concealed-labels setting for multi-class classification. It establishes an unbiased estimator, bounds the estimation error to show that the classifier achieves the optimal parametric convergence rate, and supports the method with experiments on synthetic and real-world datasets.

Annotating data with sensitive labels (e.g., disease, smoking) poses a potential threat to individual privacy in many real-world scenarios. To cope with this problem, we propose a novel setting that protects the privacy of each instance, namely learning from concealed labels for multi-class classification. Concealed labels prevent sensitive labels from appearing in the label set during the label collection stage: the concealed label set used to annotate sensitive data consists of "none" and some randomly sampled insensitive labels. We show that an unbiased estimator can be established from concealed data under mild assumptions, and that the learned multi-class classifier can not only accurately classify instances from insensitive labels but also recognize instances from sensitive labels. Moreover, we bound the estimation error and show that the multi-class classifier achieves the optimal parametric convergence rate. Experiments on synthetic and real-world datasets demonstrate the effectiveness of the proposed method for concealed labels.
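
The label collection stage described in the abstract can be illustrated with a small simulation. This is only a sketch under stated assumptions, not the authors' code: it assumes each instance is shown a candidate set made of randomly sampled insensitive labels plus "none", and that the annotator answers with the true label only if it appears among the candidates, otherwise "none". The names `NONE`, `concealed_label`, `annotate`, and `num_candidates` are hypothetical and do not come from the paper.

```python
import random

NONE = "none"  # placeholder answer; it never reveals a sensitive label


def concealed_label(true_label, insensitive_labels, num_candidates, rng):
    """Return (candidate_set, answer) for one instance.

    Assumed protocol (not confirmed by the abstract): candidates are
    sampled only from insensitive labels, so a sensitive true label can
    never be among them and the answer is always NONE for such instances.
    """
    candidates = rng.sample(sorted(insensitive_labels), num_candidates)
    answer = true_label if true_label in candidates else NONE
    return candidates + [NONE], answer


def annotate(true_labels, insensitive_labels, num_candidates, seed=0):
    """Annotate a list of instances with concealed labels."""
    rng = random.Random(seed)
    return [
        concealed_label(y, insensitive_labels, num_candidates, rng)
        for y in true_labels
    ]


if __name__ == "__main__":
    insensitive = {"cat", "dog", "bird"}
    truth = ["cat", "disease", "dog", "smoking"]
    for y, (cands, ans) in zip(truth, annotate(truth, insensitive, 2)):
        print(f"true={y!r:10} candidates={cands} answer={ans!r}")
```

Under this assumed protocol, an answer of "none" is ambiguous between a sensitive instance and an insensitive instance whose label was simply not sampled, which is why the collected labels alone do not expose sensitive information.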

Code Implementations: 1 repo
