Resampling and averaging coordinates on data
This work addresses the challenge of robust coordinate computation for point cloud data, particularly in genomics, but appears incremental, as it builds on existing embedding and clustering techniques.
The paper tackles the problem of robustly computing intrinsic coordinates on point clouds by introducing an algorithm that generates candidate coordinates through subsampling and hyperparameter variation, then averages representative embeddings using clustering and generalized Procrustes analysis, demonstrating robustness to noise and outliers on synthetic and genomics data.
We introduce algorithms for robustly computing intrinsic coordinates on point clouds. Our approach generates many candidate coordinates by subsampling the data and varying the hyperparameters of the embedding algorithm (e.g., a manifold learning method). We then identify a subset of representative embeddings by clustering the candidate coordinates and applying shape descriptors from topological data analysis. The final output is the average of the representative embeddings, computed with generalized Procrustes analysis. We validate our algorithm on both synthetic data and experimental measurements from genomics, demonstrating robustness to noise and outliers.
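The averaging step relies on generalized Procrustes analysis (GPA), which removes translation, scale, and rotation/reflection differences between embeddings before averaging them. The sketch below is a minimal textbook version of GPA, not the authors' implementation. It assumes every representative embedding is two-dimensional and lists coordinates for the same points in the same order. The function names (`center_and_scale`, `align`, `generalized_procrustes`) are hypothetical. It is written in pure Python so it runs without dependencies.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]
Config = List[Point]


def center_and_scale(cfg: Config) -> Config:
    """Translate a configuration to the origin and scale it to unit Frobenius norm."""
    n = len(cfg)
    mx = sum(x for x, _ in cfg) / n
    my = sum(y for _, y in cfg) / n
    centered = [(x - mx, y - my) for x, y in cfg]
    norm = math.sqrt(sum(x * x + y * y for x, y in centered))
    return [(x / norm, y / norm) for x, y in centered]


def rotate(cfg: Config, theta: float) -> Config:
    """Rotate every point counterclockwise by theta radians."""
    c, s = math.cos(theta), math.sin(theta)
    return [(x * c - y * s, x * s + y * c) for x, y in cfg]


def align(cfg: Config, ref: Config) -> Config:
    """Apply the orthogonal map (rotation, possibly after a reflection) that best fits cfg to ref.

    Both inputs are assumed centered. In 2D the optimal rotation angle has a closed form.
    """
    best, best_score = cfg, -math.inf
    for cand in (cfg, [(x, -y) for x, y in cfg]):  # without and with a reflection
        a = sum(x * u + y * v for (x, y), (u, v) in zip(cand, ref))
        b = sum(x * v - y * u for (x, y), (u, v) in zip(cand, ref))
        score = math.hypot(a, b)  # largest achievable inner product <rotated cand, ref>
        if score > best_score:
            best, best_score = rotate(cand, math.atan2(b, a)), score
    return best


def generalized_procrustes(configs: List[Config], tol: float = 1e-12, max_iter: int = 100):
    """Return (mean shape, configurations aligned to it) via iterative GPA."""
    shapes = [center_and_scale(c) for c in configs]
    mean = shapes[0]  # initial reference: the first embedding
    for _ in range(max_iter):
        aligned = [align(s, mean) for s in shapes]
        k, n = len(aligned), len(mean)
        new_mean = center_and_scale(
            [(sum(a[i][0] for a in aligned) / k, sum(a[i][1] for a in aligned) / k)
             for i in range(n)]
        )
        change = sum((p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2 for p, q in zip(new_mean, mean))
        mean = new_mean
        if change < tol:  # stop once the mean shape no longer moves
            break
    return mean, [align(s, mean) for s in shapes]
```

For higher-dimensional embeddings, the closed-form 2D rotation in `align` would be replaced by an SVD-based orthogonal Procrustes solve, for example `scipy.linalg.orthogonal_procrustes`. How embeddings computed on different subsamples are put into point-to-point correspondence is not specified here.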