Segmentation of high dimensional means over multi-dimensional change points and connections to regression trees
This work provides a novel analytical foundation for regression trees, potentially benefiting statistical modeling in fields like astronomy and image processing, though it appears incremental as it extends existing change point methods to multi-dimensional settings.
The paper tackles the problem of characterizing and implementing regression trees with multivariate responses by proposing a new frequentist framework for high-dimensional mean segmentation over multi-dimensional change points, achieving an optimal convergence rate and enabling asymptotically valid confidence intervals for change point locations under high-dimensional scaling conditions.
This article is motivated by the objective of providing a new, analytically tractable, and fully frequentist framework to characterize and implement regression trees while also allowing a multivariate (potentially high dimensional) response. The connection to regression trees is made through a high dimensional model with dynamic mean vectors over multi-dimensional change axes. Our theoretical analysis is carried out under a single two dimensional change point setting. An optimal rate of convergence of the proposed estimator is obtained, which in turn allows for the existence of limiting distributions. The distributional behavior of the change point estimates is split into two distinct regimes, and the limiting distribution under each regime is then characterized, in turn allowing construction of asymptotically valid confidence intervals for the $2d$-location of change. All results are obtained under the high dimensional scaling $s\log^2 p=o(T_wT_h)$, where $p$ is the response dimension, $s$ is a sparsity parameter, and $T_w,T_h$ are sampling periods along the change axes. We characterize full regression trees by defining a multiple multi-dimensional change point model. Natural extensions of the single $2d$-change point estimation methodology are provided. Two applications are provided: the first on segmentation of {\it Infrared Astronomical Satellite (IRAS)} data and the second on segmentation of digital images. The methodology and theoretical results are supported with Monte Carlo simulations.
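To make the single $2d$-change point setting concrete, the following is a minimal sketch of a brute-force least squares search on simulated data. It is not the estimator proposed in the article: it assumes, purely for illustration, that a change point $(\tau_w,\tau_h)$ splits a $T_w\times T_h$ sampling grid into four quadrants, each with its own $p$-dimensional mean, and it ignores the sparsity parameter $s$ and any regularization. The function names `quadrant_rss` and `estimate_change_point`, the array layout, and all simulation settings are hypothetical.

```python
import numpy as np


def quadrant_rss(X, tw, th):
    """Residual sum of squares when each of the four quadrants
    induced by the split (tw, th) is fitted with its own mean vector.

    X has shape (T_w, T_h, p): one p-dimensional response per grid cell.
    """
    p = X.shape[-1]
    rss = 0.0
    for block in (X[:tw, :th], X[:tw, th:], X[tw:, :th], X[tw:, th:]):
        flat = block.reshape(-1, p)
        rss += ((flat - flat.mean(axis=0)) ** 2).sum()
    return rss


def estimate_change_point(X):
    """Exhaustive search over interior splits (assumed model, see lead-in).

    Restricting tw and th to 1..T-1 keeps every quadrant non-empty.
    """
    T_w, T_h, _ = X.shape
    candidates = [(tw, th) for tw in range(1, T_w) for th in range(1, T_h)]
    return min(candidates, key=lambda c: quadrant_rss(X, *c))


# Simulated example (all values below are illustrative assumptions).
rng = np.random.default_rng(0)
T_w, T_h, p = 20, 30, 50
true_tw, true_th = 8, 12
means = rng.normal(size=(2, 2, p))  # hypothetical quadrant mean vectors

X = rng.normal(scale=0.5, size=(T_w, T_h, p))  # noise
X[:true_tw, :true_th] += means[0, 0]
X[:true_tw, true_th:] += means[0, 1]
X[true_tw:, :true_th] += means[1, 0]
X[true_tw:, true_th:] += means[1, 1]

print("true split:", (true_tw, true_th))
print("estimated split:", estimate_change_point(X))
```

The exhaustive search costs one pass over the data per candidate split, which is adequate for small grids; it says nothing about the convergence rate or limiting distributions discussed above, which depend on the article's own estimator and assumptions.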