Inferring relevant features: from QFT to PCA
This work addresses feature extraction for machine learning practitioners by proposing a data-adaptive approach that improves classification performance. The contribution is incremental in that it builds on existing kernel PCA methods.
The paper tackles the problem of identifying important features in unlabeled datasets by adapting renormalization techniques from quantum field theory. The resulting method is similar to kernel PCA with a learned kernel, and on handwritten digits its features give significantly better classification accuracy than those from a simple Gaussian kernel.
In many-body physics, renormalization techniques are used to extract the aspects of a statistical or quantum state that are relevant at large scales, or for low-energy experiments. Recent works have proposed that these features can be formally identified as those perturbations of the state whose distinguishability most resists coarse-graining. Here, we examine whether this same strategy can be used to identify important features of an unlabeled dataset. This approach indeed results in a technique very similar to kernel PCA (principal component analysis), but with a kernel function that is automatically adapted to the data, or "learned". We test this approach on handwritten digits, and find that the most relevant features are significantly better for classification than those obtained from a simple Gaussian kernel.
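For readers unfamiliar with the baseline mentioned above, the following is a minimal NumPy sketch of standard kernel PCA with a fixed Gaussian kernel. It is not the learned kernel derived from renormalization, which the abstract does not specify. The function names, the bandwidth `sigma`, and the toy two-cluster data (a stand-in for the handwritten digits) are all illustrative assumptions.

```python
import numpy as np


def gaussian_kernel(X, sigma):
    """Gaussian kernel matrix K[i, j] = exp(-|x_i - x_j|^2 / (2 sigma^2))."""
    sq_norms = np.sum(X**2, axis=1)
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * (X @ X.T)
    sq_dists = np.maximum(sq_dists, 0.0)  # clip tiny negatives caused by rounding
    return np.exp(-sq_dists / (2.0 * sigma**2))


def kernel_pca(K, n_components):
    """Return training-point features and eigenvalues for the leading components."""
    n = K.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n      # centering matrix
    Kc = H @ K @ H                           # kernel centered in feature space
    eigvals, eigvecs = np.linalg.eigh(Kc)    # eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1][:n_components]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    # Projection of training point i onto component k is sqrt(lambda_k) * v_k[i].
    features = eigvecs * np.sqrt(np.maximum(eigvals, 0.0))
    return features, eigvals


# Toy data (assumption): two well-separated 2D clusters, 20 points each.
rng = np.random.default_rng(0)
X = np.vstack([
    rng.normal(-2.0, 0.3, size=(20, 2)),
    rng.normal(2.0, 0.3, size=(20, 2)),
])

K = gaussian_kernel(X, sigma=1.0)            # sigma chosen by hand, not learned
features, eigvals = kernel_pca(K, n_components=2)
print(features.shape, eigvals)
```

The columns of `features` could then be passed to any classifier. In this fixed-kernel baseline the only tunable quantity is `sigma`, whereas the approach described in the abstract adapts the kernel function itself to the data.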