ML, May 3, 2017

Linear Regression with Shuffled Labels

arXiv:1705.01342v2, 68 citations
Originality: Incremental advance
AI Analysis

The paper addresses a practical problem in fields such as flow cytometry, where the ordering of labels may be lost, and offers a framework for robust inference. The advance is incremental: it extends standard linear regression to the setting where labels are shuffled.

The paper tackles linear regression when the labels are shuffled relative to the inputs, proposing estimators that recover the weights of a noisy linear model under an unknown permutation. It shows that the analog of the classical least-squares estimator is inconsistent in this setting and introduces an estimator based on the self-moments of the input features and labels, which recovers approximate weights on synthetic and standard datasets.

Is it possible to perform linear regression on datasets whose labels are shuffled with respect to the inputs? We explore this question by proposing several estimators that recover the weights of a noisy linear model from labels that are shuffled by an unknown permutation. We show that the analog of the classical least-squares estimator produces inconsistent estimates in this setting, and introduce an estimator based on the self-moments of the input features and labels. We study the regimes in which each estimator excels, and generalize the estimators to the setting where partial ordering information is available in the form of experiments replicated independently. The result is a framework that enables robust inference, as we demonstrate by experiments on both synthetic and standard datasets, where we are able to recover approximate weights using only shuffled labels. Our work demonstrates that linear regression in the absence of complete ordering information is possible and can be of practical interest, particularly in experiments that characterize populations of particles, such as flow cytometry.
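
The moment-matching idea in the abstract can be illustrated with a minimal one-dimensional sketch. It is a simplified illustration under stated assumptions, not the paper's full estimator: the model is assumed to be y = w·x + noise with zero-mean noise and inputs whose mean is nonzero, and the names `first_moment_estimate`, `w_true`, and `w_naive` are hypothetical. Because the mean of the labels does not change under any permutation, matching the first moment of the labels to that of the inputs gives an estimate of w without knowing the ordering.

```python
import random
from statistics import fmean


def first_moment_estimate(x, y_shuffled):
    """Estimate a scalar weight by matching first moments.

    Assumes y = w * x + zero-mean noise and mean(x) != 0.
    The sample mean of y is invariant to shuffling, so no
    ordering information is needed.
    """
    return fmean(y_shuffled) / fmean(x)


rng = random.Random(0)  # fixed seed so the sketch is reproducible
n = 10_000
w_true = 2.5  # hypothetical ground-truth weight

# Synthetic inputs with nonzero mean, and noisy linear labels.
x = [rng.gauss(1.0, 1.0) for _ in range(n)]
y = [w_true * xi + rng.gauss(0.0, 0.5) for xi in x]

# Destroy the pairing between inputs and labels.
y_shuffled = y[:]
rng.shuffle(y_shuffled)

# Moment-based estimate: uses only permutation-invariant statistics.
w_hat = first_moment_estimate(x, y_shuffled)

# Naive baseline: ordinary least squares that pretends the shuffled
# pairing is the true one (this is NOT the paper's least-squares analog,
# which also optimizes over permutations).
w_naive = sum(a * b for a, b in zip(x, y_shuffled)) / sum(a * a for a in x)

print(f"true weight:            {w_true}")
print(f"first-moment estimate:  {w_hat:.3f}")
print(f"naive OLS on shuffled:  {w_naive:.3f}")
```

In this setup the first-moment estimate should land close to the true weight, while the naive fit is biased toward E[x]E[y]/E[x²], which is about 1.25 for the distributions assumed here. In more than one dimension a single moment is not enough to identify the weights, which is where higher-order self-moments come in.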

