Linear Regression with Sparsely Permuted Data
This work addresses a practical issue in regression analysis for applications with data mismatches, but it is incremental: it builds on prior work on permuted data by restricting attention to sparse permutations.
The paper tackles the problem of linear regression when some response-predictor pairs are mismatched (sparsely permuted data), which causes standard estimators such as least squares to be inconsistent. It proposes treating the mismatched pairs as outliers and using robust regression to estimate the regression parameter; the resulting estimate is then used to recover the permutation, offering a computationally simple solution.
In regression analysis of multivariate data, it is tacitly assumed that response and predictor variables in each observed response-predictor pair correspond to the same entity or unit. In this paper, we consider the situation of "permuted data" in which this basic correspondence has been lost. Several recent papers have considered this situation without further assumptions on the underlying permutation. In applications, the latter is often known to have additional structure that can be leveraged. Specifically, we herein consider the common scenario of "sparsely permuted data" in which only a small fraction of the data is affected by a mismatch between response and predictors. However, an adverse effect already observed for sparsely permuted data is that the least squares estimator, as well as other estimators not accounting for such partial mismatch, are inconsistent. One approach studied in detail herein is to treat permuted data as outliers, which motivates the use of robust regression formulations to estimate the regression parameter. The resulting estimate can subsequently be used to recover the permutation. A notable benefit of the proposed approach is its computational simplicity, given the general lack of procedures for the above problem that are both statistically sound and computationally appealing.
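The two-stage idea described above (robust estimation first, permutation recovery second) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration rather than the paper's actual procedure: a single predictor with no intercept, a Huber M-estimator fitted by iteratively reweighted least squares, a median-absolute-residual scale, a 3-scale flagging threshold, and a re-matching step that sorts flagged responses against flagged fitted values. The function names (`least_squares_slope`, `huber_slope`, `rematch_flagged`) and the simulation settings are hypothetical.

```python
import random
import statistics


def least_squares_slope(x, y, w=None):
    """(Weighted) least squares slope for the no-intercept model y = b * x."""
    if w is None:
        w = [1.0] * len(x)
    num = sum(wi * xi * yi for wi, xi, yi in zip(w, x, y))
    den = sum(wi * xi * xi for wi, xi in zip(w, x))
    return num / den


def huber_slope(x, y, c=1.345, n_iter=50):
    """Huber M-estimate of the slope via iteratively reweighted least squares."""
    b = least_squares_slope(x, y)
    scale = 1.0
    for _ in range(n_iter):
        resid = [yi - b * xi for xi, yi in zip(x, y)]
        # Robust residual scale: normalized median absolute residual.
        scale = max(1.4826 * statistics.median([abs(r) for r in resid]), 1e-12)
        # Huber weights: 1 for small residuals, c * scale / |r| for large ones.
        w = [min(1.0, c * scale / max(abs(r), 1e-12)) for r in resid]
        b = least_squares_slope(x, y, w)
    return b, scale


def rematch_flagged(x, y, b, scale, threshold=3.0):
    """Re-pair flagged rows by sorting fitted values against responses."""
    fitted = [b * xi for xi in x]
    flagged = [i for i in range(len(x))
               if abs(y[i] - fitted[i]) > threshold * scale]
    by_fitted = sorted(flagged, key=lambda i: fitted[i])
    by_response = sorted(flagged, key=lambda i: y[i])
    assign = list(range(len(x)))  # unflagged rows keep their own response
    for i, j in zip(by_fitted, by_response):
        assign[i] = j  # predictor row i is paired with response y[j]
    return assign, flagged


# Simulated data (assumed settings): 10% of responses are cyclically shifted.
rng = random.Random(0)
n, k, beta, sigma = 1000, 100, 2.0, 0.1
x = [rng.gauss(0.0, 1.0) for _ in range(n)]
y_true = [beta * xi + rng.gauss(0.0, sigma) for xi in x]
y = y_true[:]
for i in range(k):
    y[i] = y_true[(i + 1) % k]  # mismatch among the first k rows only

b_ls = least_squares_slope(x, y)
b_hub, scale = huber_slope(x, y)
assign, flagged = rematch_flagged(x, y, b_hub, scale)

# Residual sum of squares before and after re-matching the flagged rows.
rss_before = sum((y[i] - b_hub * x[i]) ** 2 for i in range(n))
rss_after = sum((y[assign[i]] - b_hub * x[i]) ** 2 for i in range(n))

print("least squares slope:", b_ls)
print("Huber slope:", b_hub)
print("flagged rows:", len(flagged))
print("RSS before / after re-matching:", rss_before, rss_after)
```

Sorting is used for the re-matching step because fitted values and responses are both scalars, so pairing them in sorted order minimizes the sum of squared differences over all re-pairings of the flagged rows. Exact recovery of every mismatched pair is not guaranteed by this sketch and depends on the noise level relative to the spacing of the fitted values.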