Computational Implications of Reducing Data to Sufficient Statistics
This addresses a fundamental issue in computational statistics for researchers and practitioners, revealing a counterintuitive computational trade-off.
The paper tackles the problem of data pre-processing by reducing to sufficient statistics, showing that this can transform a computationally tractable estimation problem into an intractable one, with implications for graphical model estimation techniques.
Given a large dataset and an estimation task, it is common to pre-process the data by reducing them to a set of sufficient statistics. This step is often regarded as straightforward and advantageous (in that it simplifies statistical analysis). I show that -on the contrary- reducing data to sufficient statistics can change a computationally tractable estimation problem into an intractable one. I discuss connections with recent work in theoretical computer science, and implications for some techniques to estimate graphical models.