ML, ST, ME | Sep 17, 2014

Distance Shrinkage and Euclidean Embedding via Regularized Kernel Estimation

arXiv:1409.5009v1 | 17 citations
Originality: Incremental advance
AI Analysis

This work addresses a common practical problem in data analysis, with visualizing protein sequence diversity as one example, but it is an incremental advance because it builds upon existing distance estimation methods.

The paper tackles the problem of recovering Euclidean distance matrices from noisy observations by proposing a regularized kernel estimate that applies constant shrinkage to all observed pairwise distances, achieving consistent estimation of true distances as the number of objects increases.

Although recovering a Euclidean distance matrix from noisy observations is a common problem in practice, how well this could be done remains largely unknown. To fill this void, we study a simple distance matrix estimate based upon the so-called regularized kernel estimate. We show that such an estimate can be characterized as simply applying a constant amount of shrinkage to all observed pairwise distances. This fact allows us to establish risk bounds for the estimate, implying that the true distances can be estimated consistently in an average sense as the number of objects increases. In addition, such a characterization suggests an efficient algorithm to compute the distance matrix estimator, as an alternative to the usual second-order cone programming, which is known not to scale well for large problems. Numerical experiments and an application in visualizing the diversity of Vpu protein sequences from a recent HIV-1 study further demonstrate the practical merits of the proposed method.
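
The shrinkage idea in the abstract can be illustrated with a minimal sketch. This is not the paper's algorithm: it subtracts a constant from every observed off-diagonal squared distance and then uses classical multidimensional scaling (double centering followed by clipping negative eigenvalues) as a stand-in for the projection step. The function name `shrink_and_embed` and the parameters `shrink` and `dim` are hypothetical, and the input is assumed to be a symmetric matrix of squared distances with a zero diagonal.

```python
import numpy as np


def shrink_and_embed(sq_dist, shrink=0.0, dim=2):
    """Illustrative sketch only (not the paper's estimator).

    sq_dist : (n, n) symmetric array of observed squared distances (assumed).
    shrink  : hypothetical constant subtracted from every off-diagonal entry.
    dim     : number of embedding coordinates to return.
    """
    x = np.asarray(sq_dist, dtype=float)
    n = x.shape[0]

    # Apply the same amount of shrinkage to all pairwise (off-diagonal) entries.
    off_diag = ~np.eye(n, dtype=bool)
    shrunk = x.copy()
    shrunk[off_diag] -= shrink

    # Double centering turns squared distances into a Gram (kernel) matrix.
    centering = np.eye(n) - np.ones((n, n)) / n
    gram = -0.5 * centering @ shrunk @ centering
    gram = (gram + gram.T) / 2.0  # remove round-off asymmetry

    # Clip negative eigenvalues so the kernel is positive semidefinite.
    eigvals, eigvecs = np.linalg.eigh(gram)
    eigvals = np.clip(eigvals, 0.0, None)
    gram_psd = (eigvecs * eigvals) @ eigvecs.T

    # Squared distances implied by the kernel: K_ii + K_jj - 2 K_ij.
    diag = np.diag(gram_psd)
    est_sq_dist = diag[:, None] + diag[None, :] - 2.0 * gram_psd

    # Embedding coordinates from the leading eigenpairs.
    top = np.argsort(eigvals)[::-1][:dim]
    coords = eigvecs[:, top] * np.sqrt(eigvals[top])
    return est_sq_dist, coords


# Usage on synthetic data: noiseless planar points, no shrinkage.
rng = np.random.default_rng(0)
points = rng.normal(size=(6, 2))
diff = points[:, None, :] - points[None, :, :]
true_sq_dist = (diff ** 2).sum(axis=-1)
est_sq_dist, coords = shrink_and_embed(true_sq_dist, shrink=0.0, dim=2)
```

With noiseless input and `shrink=0.0`, the estimate should reproduce the input distances up to floating-point error. For noisy input, a positive `shrink` lowers every eigenvalue of the centered kernel before clipping, which is one way to see why a uniform shrinkage of distances acts as a regularizer.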

