ML | DATA-AN | GEO-PH | Feb 18, 2021

Joint Characterization of Multiscale Information in High Dimensional Data

arXiv:2102.09669v1 | 16 citations
Originality: Synthesis-oriented
AI Analysis

This is an incremental contribution aimed at researchers analyzing geospatial or other high-dimensional data: it integrates existing techniques rather than introducing new algorithms.

The authors address the problem of analyzing high-dimensional data with multiple scales of variance. They propose a joint characterization approach that combines PCA for global structure with t-SNE for local structure, and show on synthetic and real-world data that it detects signals not evident from either method alone.

High-dimensional data can contain multiple scales of variance. Analysis tools that preferentially operate at one scale can be ineffective at capturing all the information present in this cross-scale complexity. We propose a multiscale joint characterization approach designed to exploit synergies between global and local approaches to dimensionality reduction. We illustrate this approach using Principal Components Analysis (PCA) to characterize global variance structure and t-distributed stochastic neighbor embedding (t-SNE) to characterize local variance structure. Using both synthetic images and real-world imaging spectroscopy data, we show that joint characterization is capable of detecting and isolating signals that are not evident from either PCA or t-SNE alone. Broadly, t-SNE is effective at rendering a randomly oriented low-dimensional map of local clusters, and PCA renders this map interpretable by providing global, physically meaningful structure. This approach is illustrated using imaging spectroscopy data, and may prove particularly useful for other geospatial data, given the robust local variance structure that arises from spatial autocorrelation and the physical interpretability of global variance structure that arises from the spectral properties of Earth surface materials. However, the fundamental premise could easily be extended to other high-dimensional datasets, including image time series and non-image data.
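The joint characterization idea in the abstract can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: it assumes scikit-learn's `PCA` and `TSNE` as stand-ins for whatever tools the paper used, and the synthetic two-endmember spectra, the narrow absorption feature, the function name `joint_characterization`, and all parameter values are hypothetical choices made for the example.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE


def joint_characterization(X, n_pcs=3, perplexity=30.0, seed=0):
    """Compute global PCA scores and a local 2-D t-SNE embedding for the same rows of X.

    Hypothetical helper for illustration; names and defaults are assumptions.
    """
    # Global structure: low-order principal component scores.
    pca = PCA(n_components=n_pcs, random_state=seed)
    pc_scores = pca.fit_transform(X)

    # Local structure: t-SNE embedding of the same samples.
    tsne = TSNE(n_components=2, perplexity=perplexity, init="pca", random_state=seed)
    embedding = tsne.fit_transform(X)

    return pc_scores, embedding, pca.explained_variance_ratio_


# Synthetic "spectra" (assumed for this sketch): linear mixtures of two
# endmembers give a high-variance global continuum.
rng = np.random.default_rng(0)
n_samples, n_bands = 200, 50
bands = np.linspace(0.0, 1.0, n_bands)
endmember_a = 0.2 + 0.5 * bands
endmember_b = 0.6 - 0.3 * bands
fractions = rng.uniform(0.0, 1.0, size=(n_samples, 1))
X = fractions * endmember_a + (1.0 - fractions) * endmember_b
X += rng.normal(scale=0.005, size=X.shape)

# A small subset also carries a narrow, low-variance absorption feature.
has_feature = np.zeros(n_samples, dtype=bool)
has_feature[:30] = True
X[has_feature, 18:22] -= 0.05

pc_scores, embedding, explained = joint_characterization(X)

# One row per sample: PC scores next to t-SNE coordinates, so that clusters
# found in the t-SNE map can be located in PC space and vice versa.
joint = np.column_stack([pc_scores, embedding])
print(joint.shape, explained.round(3))
```

One way to read the two views together is to plot the t-SNE columns colored by each PC score, or to select a cluster in the t-SNE map and highlight the same rows in a PC scatter plot. Whether the subtle feature separates cleanly depends on the noise level, perplexity, and random seed chosen here.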

