ML LG Aug 21, 2025

Interpretable Kernels

Patrick J. F. Groenen, Michael Greenacre

arXiv:2508.15932v1 h-index: 10

Originality: Incremental advance

AI Analysis

This paper addresses the need for interpretability in machine learning, particularly for practitioners who use kernel methods. The advance is incremental because it builds on existing kernel frameworks.

The paper tackles the loss of interpretability in kernel methods. It shows that a kernel solution can be re-expressed as a weighted linear combination of the original features, so the fit can be interpreted in the usual way. For wide feature matrices (more features than observations) the re-expression is exact and yields the same predicted values; for narrow ones it relies on a least-squares approximation of the kernel matrix.

The use of kernels for nonlinear prediction is widespread in machine learning. They have been popularized in support vector machines and used in kernel ridge regression, amongst others. Kernel methods share three aspects. First, instead of the original matrix of predictor variables or features, each observation is mapped into an enlarged feature space. Second, a ridge penalty term is used to shrink the coefficients on the features in the enlarged feature space. Third, the solution is not obtained in this enlarged feature space, but through solving a dual problem in the observation space. A major drawback in the present use of kernels is that the interpretation in terms of the original features is lost. In this paper, we argue that in the case of a wide matrix of features, where there are more features than observations, the kernel solution can be re-expressed in terms of a linear combination of the original matrix of features and a ridge penalty that involves a special metric. Consequently, the exact same predicted values can be obtained as a weighted linear combination of the features in the usual manner and thus can be interpreted. In the case where the number of features is less than the number of observations, we discuss a least-squares approximation of the kernel matrix that still allows the interpretation in terms of a linear combination. It is shown that these results hold for any function of a linear combination that minimizes the coefficients and has a ridge penalty on these coefficients, such as in kernel logistic regression and kernel Poisson regression. This work makes a contribution to interpretable artificial intelligence.
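The wide-matrix claim can be checked numerically. The sketch below is an illustration under stated assumptions, not the authors' construction: it uses an RBF kernel, kernel ridge regression without an intercept or centering, and the Moore-Penrose pseudoinverse to write the kernel matrix as K = X M Xᵀ. The names `rbf_kernel`, `kernel_ridge_as_linear`, `lam`, and `gamma` are hypothetical, and the paper's ridge penalty with a special metric is not reproduced here. When X has more columns than rows and full row rank, X X⁺ is the identity, so the dual fitted values K α coincide with X w for w = M Xᵀ α.

```python
import numpy as np

def rbf_kernel(X, gamma):
    """Gaussian (RBF) kernel matrix between the rows of X."""
    sq = np.sum(X**2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    return np.exp(-gamma * np.maximum(d2, 0.0))

def kernel_ridge_as_linear(X, y, lam=1.0, gamma=0.01):
    """Fit kernel ridge regression in the dual, then re-express the
    fitted values as a linear combination X @ w of the original features.

    Assumption (illustrative): M = pinv(X) @ K @ pinv(X).T, which is the
    least-squares solution of K ~ X M X^T and is exact when X has full
    row rank (the wide case).
    """
    n = X.shape[0]
    K = rbf_kernel(X, gamma)
    alpha = np.linalg.solve(K + lam * np.eye(n), y)  # dual coefficients
    fitted_dual = K @ alpha                          # usual kernel fit

    X_pinv = np.linalg.pinv(X)                       # shape (p, n)
    M = X_pinv @ K @ X_pinv.T                        # shape (p, p)
    w = M @ X.T @ alpha                              # weights on original features
    fitted_linear = X @ w
    return fitted_dual, fitted_linear, w

rng = np.random.default_rng(0)
X_wide = rng.standard_normal((20, 50))               # 20 observations, 50 features
y = rng.standard_normal(20)
dual, linear, w = kernel_ridge_as_linear(X_wide, y)
print(np.allclose(dual, linear))                     # agreement of the two fits
```

With a narrow matrix (more observations than features) the same code returns X P-projected fitted values, where P = X X⁺ is no longer the identity, so `fitted_linear` is only an approximation of `fitted_dual`.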
