Price of Quality: Sufficient Conditions for Sparse Recovery using Mixed-Quality Data
For researchers in compressed sensing and sparse recovery, this work provides the first theoretical conditions for mixed-quality data, revealing a fundamental gap between information-theoretic and algorithmic thresholds.
This paper establishes sample-size conditions for sparse recovery when data comes from mixed-quality sources (high- and low-variance noise). It finds that information-theoretic recovery requires a linear trade-off where one high-quality sample is worth at most two low-quality samples in the agnostic setting, while the LASSO algorithm's recovery threshold depends only on the average noise level, showing robustness to heterogeneity.
We study sparse recovery when observations come from mixed-quality sources: a small collection of high-quality measurements with small noise variance and a larger collection of lower-quality measurements with higher variance. For this heterogeneous-noise setting, we establish sample-size conditions for information-theoretic and algorithmic recovery. On the information-theoretic side, we show that it is sufficient for $(n_1, n_2)$ to satisfy a linear trade-off defining the Price of Quality: the number of low-quality samples needed to replace one high-quality sample. In the agnostic setting, where the decoder is completely agnostic to the quality of the data, it is uniformly bounded, and in particular one high-quality sample is never worth more than two low-quality samples for this sufficient condition to hold. In the informed setting, where the decoder is informed of per-sample variances, the price of quality can grow arbitrarily large. On the algorithmic side, we analyze the LASSO in the agnostic setting and show that the recovery threshold matches the homogeneous-noise case and only depends on the average noise level, revealing a striking robustness of computational recovery to data heterogeneity. Together, these results give the first conditions for sparse recovery with mixed-quality data and expose a fundamental difference between how the information-theoretic and algorithmic thresholds adapt to changes in data quality.