A log-linear time algorithm for constrained changepoint detection
This work addresses the need for efficient and accurate peak detection in genomic data analysis, representing an incremental improvement by adapting existing techniques to handle constraints.
The paper tackles the problem of constrained changepoint detection in time series and genomic data, such as ChIP-seq, by adapting a functional pruning technique to handle arbitrary affine constraints on segment means, resulting in a log-linear time algorithm that achieves state-of-the-art accuracy and is orders of magnitude faster than existing methods.
Changepoint detection is a central problem in time series and genomic data. For some applications, it is natural to impose constraints on the directions of changes. One example is ChIP-seq data, for which adding an up-down constraint improves peak detection accuracy, but makes the optimization problem more complicated. We show how a recently proposed functional pruning technique can be adapted to solve such constrained changepoint detection problems. This leads to a new algorithm which can solve problems with arbitrary affine constraints on adjacent segment means, and which has empirical time complexity that is log-linear in the amount of data. This algorithm achieves state-of-the-art accuracy in a benchmark of several genomic data sets, and is orders of magnitude faster than existing algorithms that have similar accuracy. Our implementation is available as the PeakSegPDPA function in the coseg R package, https://github.com/tdhock/coseg