ML, LG, ME · Jul 24, 2020

CD-split and HPD-split: efficient conformal regions in high dimensions

arXiv:2007.12778v3 · 86 citations
Originality: Incremental advance
AI Analysis

This work addresses how to represent uncertainty for high-dimensional data and offers improved methods for practitioners, although it is an incremental advance because it builds on existing split-conformal methods.

The paper tackles the construction of efficient conformal prediction regions in high dimensions. It establishes theoretical properties of the existing CD-split method and introduces a variant, HPD-split. Both converge asymptotically to the oracle highest predictive density sets and, in simulations, achieve better conditional coverage and smaller regions than other methods.
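To make "oracle highest predictive density set" concrete: when the conditional density f(y|x) is known, the level-(1 - α) HPD set is the superlevel set {y : f(y|x) ≥ t} whose probability is 1 - α, which is also the smallest region with that coverage. The sketch below approximates it on a grid for a hypothetical bimodal density (an equal mixture of N(-2, 0.5²) and N(2, 0.5²), chosen only for illustration). It shows why such a set is naturally a union of two intervals rather than one wide interval. The function names are illustrative, not from the paper.

```python
import math

def normal_pdf(y, mu, sigma):
    """Density of N(mu, sigma^2) at y."""
    return math.exp(-0.5 * ((y - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

def bimodal_pdf(y):
    # Hypothetical conditional density f(y | x) for one fixed x:
    # an equal mixture of N(-2, 0.5^2) and N(2, 0.5^2).
    return 0.5 * normal_pdf(y, -2.0, 0.5) + 0.5 * normal_pdf(y, 2.0, 0.5)

def hpd_set(pdf, lo, hi, alpha=0.1, n_grid=4001):
    """Approximate the level-(1 - alpha) highest density set of pdf on [lo, hi].

    Returns a list of (start, end) intervals whose union is the set."""
    step = (hi - lo) / (n_grid - 1)
    grid = [lo + i * step for i in range(n_grid)]
    dens = [pdf(y) for y in grid]
    # Add grid cells in order of decreasing density until they hold 1 - alpha of the mass.
    keep, mass = [False] * n_grid, 0.0
    for i in sorted(range(n_grid), key=lambda j: dens[j], reverse=True):
        if mass >= 1 - alpha:
            break
        keep[i] = True
        mass += dens[i] * step
    # Merge runs of consecutive kept grid points into intervals.
    intervals, start = [], None
    for i, k in enumerate(keep):
        if k and start is None:
            start = grid[i]
        elif not k and start is not None:
            intervals.append((start, grid[i - 1]))
            start = None
    if start is not None:
        intervals.append((start, grid[-1]))
    return intervals

region = hpd_set(bimodal_pdf, -6.0, 6.0, alpha=0.1)
print(region)  # two disjoint intervals, roughly -2 ± 0.82 and 2 ± 0.82
```

A single interval covering 90% of this density would have to span the low-density gap around 0, so it would be far wider than the two-interval HPD set.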

Conformal methods create prediction bands that control average coverage assuming only i.i.d. data. Although the literature has mostly focused on prediction intervals, more general regions can often better represent uncertainty. For instance, a bimodal target is better represented by the union of two intervals. Such prediction regions are obtained by CD-split, which combines the split method and a data-driven partition of the feature space that scales to high dimensions. However, CD-split contains many tuning parameters, and their role is not clear. In this paper, we provide new insights on CD-split by exploring its theoretical properties. In particular, we show that CD-split converges asymptotically to the oracle highest predictive density set and satisfies local and asymptotic conditional validity. We also present simulations that show how to tune CD-split. Finally, we introduce HPD-split, a variation of CD-split that requires less tuning, and show that it shares the same theoretical guarantees as CD-split. In a wide variety of our simulations, CD-split and HPD-split have better conditional coverage and yield smaller prediction regions than other methods.
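The abstract describes HPD-split only at a high level. The following is a minimal split-conformal sketch, assuming a conformity score of the form H(f̂(y|x) | x), the estimated mass of the region where the conditional density estimate is no larger than at y. Consult the paper for the exact definition. Everything else here is hypothetical: `f_hat` stands in for a conditional density estimator that would normally be fit on the training half of the data, the data-generating process is invented, and integrals are approximated on a fixed grid. CD-split's feature-space partition is not implemented.

```python
import math
import random

def normal_pdf(y, mu, sigma):
    return math.exp(-0.5 * ((y - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

def sample(n, rng):
    # Hypothetical data: X ~ U(-1, 1); Y | X = x is bimodal with modes x - 2 and x + 2.
    data = []
    for _ in range(n):
        x = rng.uniform(-1.0, 1.0)
        mode = x - 2.0 if rng.random() < 0.5 else x + 2.0
        data.append((x, rng.gauss(mode, 0.5)))
    return data

def f_hat(y, x):
    # Hypothetical stand-in for a fitted conditional density estimator. It is
    # deliberately too wide (sd 0.8 instead of 0.5); calibration absorbs the misfit.
    return 0.5 * normal_pdf(y, x - 2.0, 0.8) + 0.5 * normal_pdf(y, x + 2.0, 0.8)

STEP = 0.02
GRID = [-7.0 + i * STEP for i in range(701)]  # integration grid over y

def hpd_score(x, y):
    """Estimated mass of {y' : f_hat(y'|x) <= f_hat(y|x)}; large = high-density y."""
    z = f_hat(y, x)
    return sum(d * STEP for d in (f_hat(g, x) for g in GRID) if d <= z)

def hpd_split_threshold(calibration, alpha):
    """Split-conformal cutoff: the floor(alpha * (n + 1))-th smallest calibration score."""
    scores = sorted(hpd_score(x, y) for x, y in calibration)
    k = math.floor(alpha * (len(scores) + 1))
    return scores[k - 1] if k >= 1 else float("-inf")

def in_region(x, y, t):
    """y is in the prediction region at x iff its score reaches the cutoff t."""
    return hpd_score(x, y) >= t

def hpd_split_region(x, t):
    """Grid points y with hpd_score(x, y) >= t, using one sort per x."""
    dens = [f_hat(g, x) for g in GRID]
    keep, cum = [False] * len(GRID), 0.0
    for i in sorted(range(len(GRID)), key=lambda j: dens[j]):
        cum += dens[i] * STEP  # mass of cells with density <= dens[i] (ties aside)
        keep[i] = cum >= t
    return [g for g, k in zip(GRID, keep) if k]

rng = random.Random(0)
calibration, test = sample(300, rng), sample(1000, rng)
t = hpd_split_threshold(calibration, alpha=0.1)
coverage = sum(in_region(x, y, t) for x, y in test) / len(test)
region = hpd_split_region(0.0, t)  # grid points in two bands around y = -2 and y = 2
print(round(t, 3), coverage)
```

Unlike CD-split, this sketch has no partition to tune: the only choices are the density estimator, α and the integration grid. Because `f_hat` is too wide here, the calibrated cutoff t should come out well above α, shrinking the region toward the two modes while keeping marginal coverage near 1 - α.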
