Sparse eigenbasis approximation: multiple feature extraction across spatiotemporal scales with application to coherent set identification
For researchers using spectral clustering or transfer operator methods, SEBA provides an automated, robust way to extract coherent sets from eigenvectors, especially when eigengaps are unclear.
The paper introduces SEBA, a sparse eigenbasis approximation method for extracting coherent sets from the eigenvectors of transfer operators, streamlining the final stage of spectral clustering. It demonstrates its efficacy on geophysical datasets with many coherent sets.
The output of spectral clustering is a collection of eigenvalues and eigenvectors that encode important connectivity information about a graph or a manifold. This connectivity information is often not cleanly represented in the eigenvectors and must be disentangled by some secondary procedure. We propose the use of an approximate sparse basis for the space spanned by the leading eigenvectors as a natural, robust, and efficient means of performing this separation. The use of sparsity yields a natural cutoff in this disentanglement procedure and is particularly useful in practical situations where there is no clear eigengap. To select a suitable collection of vectors, we develop a new Weyl-inspired eigengap heuristic and heuristics based on the sparse basis vectors. We develop an automated eigenvector separation procedure and illustrate its efficacy on examples from time-dependent dynamics on manifolds. In this context, transfer operator approaches are extensively used to find dynamically disconnected regions of phase space, known as almost-invariant sets or coherent sets. The dominant eigenvectors of transfer operators or related operators, such as the dynamic Laplacian, encode dynamic connectivity information. Our sparse eigenbasis approximation (SEBA) methodology streamlines the final stage of transfer operator methods, namely the extraction of almost-invariant or coherent sets from the eigenvectors. It is particularly useful when applied to domains with large numbers of coherent sets, and when the coherent sets do not exhaust the phase space, such as in large geophysical datasets.
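To make the idea of "an approximate sparse basis for the space spanned by the leading eigenvectors" concrete, here is a minimal NumPy sketch of one way to compute it. It assumes an alternating scheme: rotate the eigenvector basis by an orthogonal matrix `R`, soft-threshold each rotated vector with a sparsity parameter `mu` (the "natural cutoff"), then update `R` as the polar factor of `S.T @ V` via an SVD. The default `mu = 0.99 / sqrt(p)`, the identity starting rotation, and the final sign-flip, clip, and rescale step are illustrative assumptions, not details stated in this abstract. The function names `soft_threshold` and `seba` are hypothetical.

```python
import numpy as np


def soft_threshold(z, mu):
    """Elementwise soft thresholding: shrink every entry toward zero by mu."""
    return np.sign(z) * np.maximum(np.abs(z) - mu, 0.0)


def seba(V, mu=None, tol=1e-10, max_iter=5000):
    """Hypothetical sketch of a sparse eigenbasis approximation of span(V).

    V : (p, r) array whose columns are the leading eigenvectors.
    Returns a (p, r) array of nonnegative sparse vectors, each scaled to max 1.
    """
    p, r = V.shape
    if mu is None:
        mu = 0.99 / np.sqrt(p)  # assumed default sparsity parameter
    V, _ = np.linalg.qr(V)      # orthonormal basis for the same subspace
    R = np.eye(r)               # assumed initial rotation: the identity
    S = np.zeros_like(V)
    for _ in range(max_iter):
        # Rotate the basis, then sparsify and renormalise each rotated vector.
        Z = V @ R.T
        for i in range(r):
            s = soft_threshold(Z[:, i], mu)
            norm = np.linalg.norm(s)
            S[:, i] = s / norm if norm > 0 else s
        # Orthogonal R minimising ||V - S R||_F: polar factor U @ Wt of S^T V.
        U, _, Wt = np.linalg.svd(S.T @ V)
        R_new = U @ Wt
        converged = np.linalg.norm(R_new - R) < tol
        R = R_new
        if converged:
            break
    # Assumed post-processing: make each vector's dominant entry positive,
    # drop negative values, and rescale so the maximum entry is 1.
    for i in range(r):
        col = S[:, i]
        if col[np.argmax(np.abs(col))] < 0:
            col = -col
        col = np.maximum(col, 0.0)
        peak = col.max()
        S[:, i] = col / peak if peak > 0 else col
    return S
```

As a usage example, if `V` is a mixed (rotated) basis for the indicator vectors of three disjoint blocks of points, `seba(V)` should return vectors close to those indicators, with exact zeros outside each block. A candidate set can then be read off a column by thresholding, for example `np.flatnonzero(S[:, i] > 0.5)`; that threshold is a hypothetical choice here, distinct from the heuristics the paper develops.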