ME · Jul 19, 2021
Inference for Change Points in High Dimensional Mean Shift Models
Abhishek Kaul, George Michailidis
We consider the problem of constructing confidence intervals for the locations of change points in a high-dimensional mean shift model. To that end, we develop a locally refitted least squares estimator and obtain component-wise and simultaneous rates of estimation of the underlying change points. The simultaneous rate is the sharpest available in the literature by at least a factor of $\log p,$ while the component-wise one is optimal. These rates, in turn, allow for the existence of limiting distributions. Component-wise distributions are characterized under both vanishing and non-vanishing jump size regimes, while joint distributions for any finite subset of change point estimates are characterized under the latter regime, which also yields asymptotic independence of these estimates. The combined results are used to construct asymptotically valid component-wise and simultaneous confidence intervals for the change point parameters. The results are established under a high dimensional scaling that allows for diminishing jump sizes, a diverging number of change points and subexponential errors. They are illustrated on synthetic data and on smartphone sensor measurements for activity recognition.
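The idea of locally refitting change points by least squares can be illustrated with a minimal Python sketch. It assumes preliminary change point estimates are already available from some other method, estimates the mean on either side of each preliminary estimate by plain sample means from the preliminary partition (the paper's plug-in estimates may be regularized differently), and then re-searches each change point only within the window bounded by its neighbouring preliminary estimates. All function names, including the simulation helper, are hypothetical; this is not the authors' implementation.

```python
import random


def simulate_mean_shift(T, p, change_points, jump, n_active, seed=0):
    """Hypothetical example data: T observations in R^p whose mean alternates
    between 0 and `jump` (in the first n_active coordinates) at each change point."""
    rng = random.Random(seed)
    data = []
    for t in range(T):
        level = sum(t >= c for c in change_points) % 2  # 0, 1, 0, 1, ...
        data.append([rng.gauss(0.0, 1.0) + (jump * level if k < n_active else 0.0)
                     for k in range(p)])
    return data


def segment_mean(data, start, end):
    """Coordinate-wise sample mean of data[start:end]."""
    n = end - start
    return [sum(row[k] for row in data[start:end]) / n for k in range(len(data[0]))]


def sq_dist(x, m):
    return sum((xi - mi) ** 2 for xi, mi in zip(x, m))


def refit_change_points(data, prelim):
    """Refit each preliminary change point by least squares inside the window
    delimited by its neighbouring preliminary estimates.

    A change point tau means data[tau] is the first post-change observation.
    """
    bounds = [0] + sorted(prelim) + [len(data)]
    refined = []
    for j in range(1, len(bounds) - 1):
        lo, mid, hi = bounds[j - 1], bounds[j], bounds[j + 1]
        # Plug-in means taken from the preliminary partition.
        left, right = segment_mean(data, lo, mid), segment_mean(data, mid, hi)
        # Moving tau one step to the right reassigns data[tau - 1] from `right`
        # to `left`, changing the loss by ||x - left||^2 - ||x - right||^2.
        best_tau, best_loss, running = lo + 1, None, 0.0
        for tau in range(lo + 1, hi):
            x = data[tau - 1]
            running += sq_dist(x, left) - sq_dist(x, right)
            if best_loss is None or running < best_loss:
                best_tau, best_loss = tau, running
        refined.append(best_tau)
    return refined
```

For example, `refit_change_points(simulate_mean_shift(200, 20, [60, 140], 3.0, 3), [55, 150])` should return estimates close to `[60, 140]`. Accumulating the loss increments keeps each window search linear in the window length.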
ME · May 20, 2021
Segmentation of high dimensional means over multi-dimensional change points and connections to regression trees
Abhishek Kaul
This article is motivated by the objective of providing a new, analytically tractable and fully frequentist framework to characterize and implement regression trees while also allowing a multivariate (potentially high dimensional) response. The connection to regression trees is made through a high dimensional model with dynamic mean vectors over multi-dimensional change axes. Our theoretical analysis is carried out in a setting with a single two dimensional change point. An optimal rate of convergence of the proposed estimator is obtained, which in turn allows for the existence of limiting distributions. The distributional behavior of the change point estimates splits into two distinct regimes; the limiting distribution under each regime is characterized, in turn allowing the construction of asymptotically valid confidence intervals for the $2d$-location of the change. All results are obtained under the high dimensional scaling $s\log^2 p=o(T_wT_h),$ where $p$ is the response dimension, $s$ is a sparsity parameter, and $T_w,T_h$ are the sampling periods along the change axes. We characterize full regression trees by defining a multiple multi-dimensional change point model, and provide natural extensions of the single $2d$-change point estimation methodology. Two applications are provided: segmentation of Infra-red astronomy satellite (IRAS) data and segmentation of digital images. The methodology and theoretical results are supported by Monte Carlo simulations.
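To make the connection with regression-tree splits concrete, here is a minimal Python sketch of least squares estimation of a single $2d$ change point. It assumes (a simplification chosen for illustration, not necessarily the paper's exact model) that observations $y_{ij}\in\mathbb{R}^p$ on a $T_w\times T_h$ grid have a mean vector that is constant on each of the four rectangles cut out by a split $(a,b)$, and it searches all splits exhaustively. It uses the identity $\mathrm{RSS}=\sum\|y\|^2-\sum_q\|S_q\|^2/n_q$, where $S_q$ and $n_q$ are the sum vector and cell count of block $q$. Function names and the simulation helper are hypothetical.

```python
import random


def simulate_grid(Tw, Th, split, p, jump, seed=0):
    """Hypothetical example: p-dim observations whose mean shifts by `jump` in
    coordinate 0 for columns j >= b and in coordinate 1 for rows i >= a, so the
    four blocks defined by split = (a, b) all have different means."""
    a, b = split
    rng = random.Random(seed)
    grid = []
    for i in range(Tw):
        row = []
        for j in range(Th):
            mean = [0.0] * p
            if j >= b:
                mean[0] += jump
            if i >= a:
                mean[1] += jump
            row.append([m + rng.gauss(0.0, 1.0) for m in mean])
        grid.append(row)
    return grid


def block_sum(grid, r0, r1, c0, c1):
    """Sum vector and cell count of the block of rows r0:r1 and columns c0:c1."""
    s = [0.0] * len(grid[0][0])
    for i in range(r0, r1):
        for j in range(c0, c1):
            for k, v in enumerate(grid[i][j]):
                s[k] += v
    return s, (r1 - r0) * (c1 - c0)


def fit_2d_change_point(grid):
    """Least squares estimate (a, b) of a single 2d change point.

    Minimising the four-block RSS is equivalent to maximising
    sum_q ||S_q||^2 / n_q over all splits with non-empty blocks."""
    Tw, Th = len(grid), len(grid[0])
    best, best_score = None, float("-inf")
    for a in range(1, Tw):
        for b in range(1, Th):
            score = 0.0
            for r0, r1, c0, c1 in ((0, a, 0, b), (0, a, b, Th),
                                   (a, Tw, 0, b), (a, Tw, b, Th)):
                s, n = block_sum(grid, r0, r1, c0, c1)
                score += sum(v * v for v in s) / n
            if score > best_score:
                best, best_score = (a, b), score
    return best
```

For instance, `fit_2d_change_point(simulate_grid(16, 16, (6, 10), 3, 2.0))` should return a split at or near `(6, 10)`. The brute-force block sums keep the sketch readable; two-dimensional prefix sums would make each candidate split cost $O(p)$ instead.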
ME · Jul 3, 2020
Inference on the change point in high dimensional time series models via plug in least squares
Abhishek Kaul, Stergios B. Fotopoulos, Venkata K. Jandhyala et al.
We study a plug in least squares estimator for the change point parameter, where the change is in the mean of a high dimensional random vector under subgaussian or subexponential distributions. We obtain sufficient conditions under which this estimator is sufficiently adaptive against plug in estimates of the mean parameters to yield an optimal rate of convergence $O_p(\xi^{-2})$ in the integer scale. This rate is preserved while allowing high dimensionality as well as a potentially diminishing jump size $\xi,$ provided $s\log (p\vee T)=o(\sqrt{Tl_T})$ or $s\log^{3/2}(p\vee T)=o(\sqrt{Tl_T})$ in the subgaussian and subexponential cases, respectively. Here $s,p,T$ and $l_T$ represent a sparsity parameter, the model dimension, the sampling period and the separation of the change point from its parametric boundary. Moreover, since the rate of convergence is free of $s,p$ and logarithmic terms of $T,$ it allows for the existence of limiting distributions. These distributions are derived as the argmax of a two sided negative drift Brownian motion or a two sided negative drift random walk under the vanishing and non-vanishing jump size regimes, respectively, thereby allowing inference on the change point parameter in the high dimensional setting. Feasible algorithms for implementing the proposed methodology are provided. Theoretical results are supported by Monte Carlo simulations.
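A minimal Python sketch of a plug in least squares estimator for a single mean shift follows. Purely for illustration, it assumes the plug in mean estimates are soft-thresholded sample means computed on either side of a preliminary change point `tau_init`, with a user-chosen threshold `lam`; the mean estimators and tuning analysed in the paper may differ. All names are hypothetical. Because moving the candidate $\tau$ one step only reassigns one observation, the whole least squares profile is evaluated in a single $O(Tp)$ pass.

```python
import math
import random


def soft_threshold(v, lam):
    """Coordinate-wise soft-thresholding: sign(x) * max(|x| - lam, 0)."""
    return [math.copysign(max(abs(x) - lam, 0.0), x) for x in v]


def mean_vector(data, start, end):
    """Coordinate-wise sample mean of data[start:end]."""
    n = end - start
    return [sum(row[k] for row in data[start:end]) / n for k in range(len(data[0]))]


def plug_in_ls(data, tau_init, lam):
    """Return the tau in [1, T-1] minimising
    sum_{t<tau} ||x_t - theta1||^2 + sum_{t>=tau} ||x_t - theta2||^2,
    where theta1, theta2 are soft-thresholded means split at tau_init."""
    T = len(data)
    theta1 = soft_threshold(mean_vector(data, 0, tau_init), lam)
    theta2 = soft_threshold(mean_vector(data, tau_init, T), lam)
    best_tau, best_loss, running = 1, None, 0.0
    for tau in range(1, T):
        x = data[tau - 1]
        # Loss change from moving x_{tau-1} into the pre-change sum.
        running += sum((xi - a) ** 2 - (xi - b) ** 2
                       for xi, a, b in zip(x, theta1, theta2))
        if best_loss is None or running < best_loss:
            best_tau, best_loss = tau, running
    return best_tau


def simulate_single_shift(T, p, tau, jump, n_active, seed=0):
    """Hypothetical example data: mean 0 before tau, mean `jump` in the
    first n_active coordinates from tau onwards, N(0, 1) noise."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) + (jump if t >= tau and k < n_active else 0.0)
             for k in range(p)] for t in range(T)]
```

For example, with `data = simulate_single_shift(200, 50, 80, 2.0, 5)`, the call `plug_in_ls(data, tau_init=100, lam=0.3)` should land near 80 even though the preliminary split is 20 steps off, which is the kind of adaptivity to imperfect plug in means described above.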
ST · May 19, 2020
Inference on the Change Point for High Dimensional Dynamic Graphical Models
Abhishek Kaul, Hongjin Zhang, Konstantinos Tsampourakis et al.
We develop an estimator for the change point parameter of a dynamically evolving graphical model and obtain its asymptotic distribution under high dimensional scaling. To procure the latter result, we establish that the proposed estimator exhibits an $O_p(\psi^{-2})$ rate of convergence, where $\psi$ represents the jump size between the graphical model parameters before and after the change point. Further, it retains sufficient adaptivity against plug-in estimates of the graphical model parameters. We characterize the form of the asymptotic distribution under both a vanishing and a non-vanishing regime for the magnitude of the jump size. Specifically, in the former case it corresponds to the argmax of a negative drift asymmetric two sided Brownian motion, while in the latter case it corresponds to the argmax of a negative drift asymmetric two sided random walk whose increments depend on the distribution of the graphical model. Easy to implement algorithms are provided for estimating the change point, and their performance is assessed on synthetic data. The proposed methodology is further illustrated on RNA-sequenced microbiome data and their changes between young and older individuals.
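Quantiles of an argmax limit of this kind can be approximated by Monte Carlo simulation. The Python sketch below is a generic illustration, not the paper's procedure: it discretizes $Z(0)=0$, $Z(r)=-c_2 r+\sigma_2W_2(r)$ for $r>0$ and $Z(r)=-c_1|r|+\sigma_1W_1(|r|)$ for $r<0$, with independent standard Brownian motions $W_1,W_2$, on the window $[-\text{horizon},\text{horizon}]$ with mesh `step`, and records where the maximum occurs. The drifts $c_1,c_2$ and scales $\sigma_1,\sigma_2$ are hypothetical user inputs; their expressions in terms of the graphical model parameters are not reproduced here. Both the finite horizon and the discretization introduce approximation error.

```python
import random


def simulate_argmax(c1, c2, s1, s2, horizon=40.0, step=0.1, rng=None):
    """One draw of the argmax of a discretised two sided negative drift
    Brownian motion with (possibly) different drifts and scales per side."""
    if rng is None:
        rng = random.Random()
    n = int(round(horizon / step))
    sd = step ** 0.5  # standard deviation of a Brownian increment over `step`
    best_r, best_val = 0.0, 0.0  # Z(0) = 0
    # Right side uses (c2, s2); left side uses (c1, s1) with fresh noise.
    for sign, c, s in ((1, c2, s2), (-1, c1, s1)):
        val = 0.0
        for i in range(1, n + 1):
            val += -c * step + s * rng.gauss(0.0, sd)
            if val > best_val:
                best_r, best_val = sign * i * step, val
    return best_r


def argmax_quantiles(c1, c2, s1, s2, probs=(0.025, 0.5, 0.975),
                     n_sim=1000, seed=0, **kwargs):
    """Empirical quantiles of simulated argmax draws; kwargs go to simulate_argmax."""
    rng = random.Random(seed)
    draws = sorted(simulate_argmax(c1, c2, s1, s2, rng=rng, **kwargs)
                   for _ in range(n_sim))
    return [draws[min(int(q * n_sim), n_sim - 1)] for q in probs]
```

For example, `argmax_quantiles(0.5, 0.5, 1.0, 1.0)` approximates the 2.5%, 50% and 97.5% quantiles in the symmetric case. If a limit is stated in the form $\psi^2(\hat\tau-\tau^0)\Rightarrow\arg\max_r Z(r)$, these quantiles $q_{0.025},q_{0.975}$ would give the approximate interval $[\hat\tau-q_{0.975}/\hat\psi^2,\ \hat\tau-q_{0.025}/\hat\psi^2]$; the exact normalization and plug-in constants should be taken from the paper.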