Kernel Subspace and Feature Extraction
This work provides a theoretical foundation for kernel methods in machine learning that may help practitioners improve model performance, though its contribution appears incremental, since it mainly links existing concepts.
The paper connects kernel methods with feature extraction by establishing a correspondence between feature subspaces and kernels, proposing an information-theoretic measure for kernels, and constructing a maximal correlation kernel. It shows that the kernel SVM with this kernel achieves minimum prediction error, and it interprets the Fisher kernel as a special case of the maximal correlation kernel, establishing its optimality.
We study kernel methods in machine learning from the perspective of feature subspaces. We establish a one-to-one correspondence between feature subspaces and kernels and propose an information-theoretic measure for kernels. In particular, we construct a kernel from Hirschfeld–Gebelein–Rényi maximal correlation functions, coined the maximal correlation kernel, and demonstrate its information-theoretic optimality. We use the support vector machine (SVM) as an example to illustrate a connection between kernel methods and feature extraction approaches. We show that the kernel SVM on the maximal correlation kernel achieves minimum prediction error. Finally, we interpret the Fisher kernel as a special case of the maximal correlation kernel and establish its optimality.
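The abstract does not say how the maximal correlation functions are computed or give the exact form of the kernel. As an illustration only, the sketch below assumes X and Y take finitely many values with a known joint pmf, estimates the top-d Hirschfeld–Gebelein–Rényi (HGR) pairs (f_i, g_i) with alternating conditional expectations (the ACE iteration, swapped in here as a standard technique, not taken from the paper), and assumes the kernel has the form k(x, x') = Σ_i f_i(x) f_i(x'), the inner product of the feature vectors spanning the chosen subspace. The helpers `hgr_features` and `maximal_correlation_kernel` are hypothetical names.

```python
import math
import random


def marginals(pxy):
    """Return the marginals P_X and P_Y of a joint pmf stored as pxy[x][y]."""
    px = [sum(row) for row in pxy]
    py = [sum(row[y] for row in pxy) for y in range(len(pxy[0]))]
    return px, py


def _standardize(h, p, previous):
    """Center h under p, remove its components along earlier features
    (assumed zero-mean and orthonormal under p), then rescale to unit variance."""
    mean = sum(pi * hi for pi, hi in zip(p, h))
    h = [hi - mean for hi in h]
    for q in previous:
        proj = sum(pi * hi * qi for pi, hi, qi in zip(p, h, q))
        h = [hi - proj * qi for hi, qi in zip(h, q)]
    norm = math.sqrt(sum(pi * hi * hi for pi, hi in zip(p, h)))
    return [hi / norm for hi in h]


def hgr_features(pxy, d, iters=500, seed=0):
    """Hypothetical helper: top-d HGR pairs (f_i, g_i) of a discrete joint pmf
    via alternating conditional expectations with deflation.
    Requires d <= min(|X|, |Y|) - 1. Returns (fs, gs, rhos)."""
    px, py = marginals(pxy)
    nx, ny = len(px), len(py)
    rng = random.Random(seed)
    fs, gs, rhos = [], [], []
    for _ in range(d):
        g = _standardize([rng.gauss(0.0, 1.0) for _ in range(ny)], py, gs)
        for _ in range(iters):
            # f(x) <- E[g(Y) | X = x], then standardize under P_X
            f = [sum(pxy[x][y] * g[y] for y in range(ny)) / px[x] for x in range(nx)]
            f = _standardize(f, px, fs)
            # g(y) <- E[f(X) | Y = y], then standardize under P_Y
            g = [sum(pxy[x][y] * f[x] for x in range(nx)) / py[y] for y in range(ny)]
            g = _standardize(g, py, gs)
        # Correlation E[f(X) g(Y)] of the converged pair
        rho = sum(pxy[x][y] * f[x] * g[y] for x in range(nx) for y in range(ny))
        fs.append(f)
        gs.append(g)
        rhos.append(rho)
    return fs, gs, rhos


def maximal_correlation_kernel(fs):
    """Assumed form k(x, x') = sum_i f_i(x) f_i(x') over the HGR features fs."""
    def k(x1, x2):
        return sum(f[x1] * f[x2] for f in fs)
    return k


if __name__ == "__main__":
    # Toy joint pmf over X, Y in {0, 1, 2}; rows index x, columns index y.
    pxy = [[0.20, 0.05, 0.05],
           [0.05, 0.20, 0.05],
           [0.05, 0.05, 0.30]]
    fs, gs, rhos = hgr_features(pxy, d=2)
    k = maximal_correlation_kernel(fs)
    print("maximal correlations:", rhos)
    print("Gram matrix:", [[round(k(a, b), 3) for b in range(3)] for a in range(3)])
```

For this symmetric toy pmf the two nontrivial maximal correlations work out to 7/12 and 1/2, and the features are orthonormal under P_X, so the kernel's trace weighted by P_X equals d. Feeding the resulting Gram matrix to an SVM that accepts precomputed kernels is one way to try the SVM connection described above; that step is not shown.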