Gap-Measure Tests with Applications to Data Integrity Verification
This work addresses data integrity verification, particularly in Big Data analytics, by offering a test that can replace or complement chi-square testing and that is more sensitive to some deviations from uniformity. The contribution appears incremental, since it builds on existing statistical testing methods.
The paper tackles the problem of assessing uniform distribution hypotheses by proposing gap statistics, specifically a max-gap test. In examples relevant to data integrity verification, the max-gap test is more sensitive than the chi-square test, so it can detect a larger class of deviations from uniformity.
In this paper we propose and examine gap statistics for assessing uniform distribution hypotheses. We provide examples relevant to data integrity testing for which max-gap statistics provide greater sensitivity than chi-square ($χ^2$), thus allowing the new test to be used in place of or as a complement to $χ^2$ testing for purposes of distinguishing a larger class of deviations from uniformity. We establish that the proposed max-gap test has the same sequential and parallel computational complexity as $χ^2$ and thus is applicable for Big Data analytics and integrity verification.
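To make the sensitivity claim concrete, the following is a minimal sketch, not the paper's construction. It assumes samples in $[0, 1]$ and takes the max-gap statistic to be the largest of the $n + 1$ spacings between consecutive sorted points (including both endpoints). It uses the classical result for uniform spacings, $P(M \ge x) = \sum_{k \ge 1,\, kx < 1} (-1)^{k+1} \binom{n+1}{k} (1 - kx)^n$, to obtain a p-value. The function names, the bin count, and the hand-built sample are all illustrative assumptions. The sketch sorts the data for simplicity and makes no attempt to reproduce the complexity result stated above.

```python
from fractions import Fraction
from math import comb


def max_gap(samples):
    """Largest spacing between consecutive sorted points, endpoints 0 and 1 included."""
    pts = [0.0] + sorted(samples) + [1.0]
    return max(b - a for a, b in zip(pts, pts[1:]))


def max_gap_p_value(n, g):
    """P(max spacing >= g) for n i.i.d. Uniform(0, 1) points (classical spacings formula).

    Exact rational arithmetic avoids cancellation in the alternating sum;
    this is only practical for moderate n.
    """
    x = Fraction(g)
    if x <= 0:
        return 1.0
    total = Fraction(0)
    k = 1
    while k <= n + 1 and k * x < 1:
        total += (-1) ** (k + 1) * comb(n + 1, k) * (1 - k * x) ** n
        k += 1
    return float(total)


def chi_square_stat(samples, bins):
    """Pearson chi-square statistic against equal expected counts per bin."""
    counts = [0] * bins
    for s in samples:
        counts[min(int(s * bins), bins - 1)] += 1
    expected = len(samples) / bins
    return sum((c - expected) ** 2 / expected for c in counts)


def holed_sample(bins=10, per_bin=20):
    """Hypothetical data: equal counts in every bin, but nothing in [0.45, 0.55]."""
    pts = []
    for b in range(bins):
        lo, hi = b / bins, (b + 1) / bins
        if b == 4:
            hi = 0.45  # squeeze bin 4 into its left half
        if b == 5:
            lo = 0.55  # squeeze bin 5 into its right half
        pts += [lo + (hi - lo) * (j + 0.5) / per_bin for j in range(per_bin)]
    return pts


data = holed_sample()
chi2 = chi_square_stat(data, bins=10)      # bin counts are perfectly balanced
gap = max_gap(data)                        # the hole straddles a bin boundary
p_gap = max_gap_p_value(len(data), gap)    # tiny under the uniform hypothesis
print(f"chi-square = {chi2}, max gap = {gap:.4f}, max-gap p-value = {p_gap:.3g}")
```

Because the missing interval straddles a bin boundary while every bin keeps the same count, the binned chi-square statistic cannot see it. The largest spacing is far longer than a uniform sample of this size would plausibly produce. The sample is deliberately artificial and only illustrates one kind of deviation that a gap-based statistic can flag.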