An Equal-Probability Partition of the Sample Space: A Non-parametric Inference from Finite Samples
The paper provides a non-parametric framework for robust inference from finite samples, applicable in any domain where the shape of the distribution is unknown.
The paper tackles the problem of inferring properties of an arbitrary continuous probability distribution from a finite sample, showing that $N$ sorted sample points partition the real line into $N+1$ segments, each with an expected probability mass of exactly $1/(N+1)$. This yields a discrete entropy of $\log_2(N+1)$ bits, quantifying the information gained from the sample.
This paper investigates what can be inferred about an arbitrary continuous probability distribution from a finite sample of $N$ observations drawn from it. The central finding is that the $N$ sorted sample points partition the real line into $N+1$ segments, each carrying an expected probability mass of exactly $1/(N+1)$. This non-parametric result, which follows from fundamental properties of order statistics, holds regardless of the underlying distribution's shape. This equal-probability partition yields a discrete entropy of $\log_2(N+1)$ bits, which quantifies the information gained from the sample and contrasts with Shannon's results for continuous variables. I compare this partition-based framework to the conventional ECDF and discuss its implications for robust non-parametric inference, particularly in density and tail estimation.
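The equal-mass claim can be checked numerically. The following is a minimal sketch, not the paper's own code: it assumes an exponential distribution with rate 1 as the test case, and the helper names (`segment_masses`, `mean_segment_masses`, `exp_draw`, `exp_cdf`) are hypothetical. It draws many samples of size $N$, measures the true probability mass of each of the $N+1$ segments via the known CDF, and averages over trials. The entropy line computes $\log_2(N+1)$ under the assumption that it refers to a uniform distribution over the $N+1$ segments.

```python
import math
import random


def segment_masses(sample, cdf):
    """True probability mass of each of the N+1 segments cut by the sorted sample."""
    xs = sorted(sample)
    # CDF values at the cut points, padded with 0 and 1 for the two open-ended tails.
    u = [0.0] + [cdf(x) for x in xs] + [1.0]
    return [b - a for a, b in zip(u, u[1:])]


def mean_segment_masses(n, trials, draw, cdf, seed=0):
    """Average the segment masses over many independent samples of size n."""
    rng = random.Random(seed)
    totals = [0.0] * (n + 1)
    for _ in range(trials):
        sample = [draw(rng) for _ in range(n)]
        for i, mass in enumerate(segment_masses(sample, cdf)):
            totals[i] += mass
    return [t / trials for t in totals]


# Assumed test distribution: exponential with rate 1 (any continuous law should work).
def exp_draw(rng):
    return rng.expovariate(1.0)


def exp_cdf(x):
    return 1.0 - math.exp(-x)


n = 4
means = mean_segment_masses(n, trials=20000, draw=exp_draw, cdf=exp_cdf)
entropy_bits = math.log2(n + 1)  # entropy of a uniform law over the n+1 segments

print([round(m, 3) for m in means])  # each entry should be close to 1/(n+1)
print(entropy_bits)
```

Swapping in a different continuous distribution (with its matching CDF) should leave the averages near $1/(N+1)$, since the CDF maps any such sample to uniform order statistics on $[0, 1]$. Note that only the expected mass is equal: in any single sample the segment masses vary.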