ML · Aug 21, 2025
Interpretable Kernels
Patrick J. F. Groenen, Michael Greenacre
The use of kernels for nonlinear prediction is widespread in machine learning. They have been popularized in support vector machines and are also used in kernel ridge regression, among other methods. Kernel methods share three aspects. First, instead of using the original matrix of predictor variables or features, each observation is mapped into an enlarged feature space. Second, a ridge penalty term is used to shrink the coefficients on the features in the enlarged feature space. Third, the solution is not obtained in this enlarged feature space, but by solving a dual problem in the observation space. A major drawback of the present use of kernels is that the interpretation in terms of the original features is lost. In this paper, we argue that in the case of a wide matrix of features, where there are more features than observations, the kernel solution can be re-expressed in terms of a linear combination of the original matrix of features and a ridge penalty that involves a special metric. Consequently, exactly the same predicted values can be obtained as a weighted linear combination of the features in the usual manner and can thus be interpreted. In the case where the number of features is less than the number of observations, we discuss a least-squares approximation of the kernel matrix that still allows the interpretation in terms of a linear combination. It is shown that these results hold for any loss function of a linear combination that is minimized over the coefficients and includes a ridge penalty on these coefficients, such as in kernel logistic regression and kernel Poisson regression. This work makes a contribution to interpretable artificial intelligence.
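The wide-matrix claim above can be checked numerically with a minimal sketch. Everything here is an assumption made for illustration and not the authors' construction: the data are synthetic, the kernel is a Gaussian (RBF) kernel with a hypothetical `gamma`, no intercept or centering is used, and the coefficients `b` are the minimum-norm solution obtained with a pseudoinverse rather than the special-metric ridge solution described in the paper. The point is only that when there are more features than observations and the feature matrix has full row rank, some linear combination of the original features reproduces the kernel ridge predictions exactly.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 20, 50                      # wide case: more features than observations
X = rng.standard_normal((n, p))    # synthetic feature matrix (assumed full row rank)
y = rng.standard_normal(n)         # synthetic response

def rbf_kernel(A, gamma):
    """Gaussian kernel matrix between the rows of A."""
    sq = np.sum(A**2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2 * A @ A.T
    return np.exp(-gamma * np.maximum(d2, 0.0))

lam = 1.0                                   # ridge penalty (illustrative value)
K = rbf_kernel(X, gamma=1.0 / p)            # n x n kernel matrix

# Dual kernel ridge solution in the observation space.
alpha = np.linalg.solve(K + lam * np.eye(n), y)
y_hat_kernel = K @ alpha

# Minimum-norm weights on the ORIGINAL features that give the same fit.
# Because X has full row rank, X @ pinv(X) is the identity.
b = np.linalg.pinv(X) @ y_hat_kernel
y_hat_linear = X @ b

max_abs_diff = float(np.max(np.abs(y_hat_kernel - y_hat_linear)))
print("largest difference between the two sets of predictions:", max_abs_diff)
```

The weights in `b` are one of infinitely many exact solutions in the wide case; the paper's contribution is a particular penalized choice, which this sketch does not reproduce.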
ML · Nov 17, 2021
Three approaches to supervised learning for compositional data with pairwise logratios
Germa Coenders, Michael Greenacre
The common approach to compositional data analysis is to transform the data by means of logratios. Logratios between pairs of compositional parts (pairwise logratios) are the easiest to interpret in many research problems. When the number of parts is large, some form of logratio selection is a must, for instance by means of an unsupervised learning method based on a stepwise selection of the pairwise logratios that explain the largest percentage of the logratio variance in the compositional dataset. In this article we present three alternative stepwise supervised learning methods to select the pairwise logratios that best explain a dependent variable in a generalized linear model, each geared for a specific problem. The first method features unrestricted search, where any pairwise logratio can be selected. This method has a complex interpretation if some pairs of parts in the logratios overlap, but it leads to the most accurate predictions. The second method restricts parts to occur only once, which makes the corresponding logratios intuitively interpretable. The third method uses additive logratios, so that $K-1$ selected logratios involve exactly $K$ parts. This method in fact searches for the subcomposition with the highest explanatory power. Once the subcomposition is identified, the researcher's favourite logratio representation may be used in subsequent analyses, not only pairwise logratios. Our methodology allows logratios or non-compositional covariates to be forced into the models based on theoretical knowledge, and various stopping criteria are available based on information measures or statistical significance with the Bonferroni correction. We present an illustration of the three approaches on a dataset from a study predicting Crohn's disease. The first method excels in terms of predictive power, and the other two in interpretability.
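The first of the three methods, unrestricted stepwise search over pairwise logratios, can be sketched as a greedy forward selection. This is an illustration under stated assumptions and not the authors' implementation: the function names (`pairwise_logratios`, `forward_select`) are hypothetical, the model is an ordinary least-squares regression (a Gaussian generalized linear model), the selection criterion is the residual sum of squares, and a fixed number of steps replaces the information-based or Bonferroni stopping rules mentioned in the abstract. The data are synthetic.

```python
import numpy as np
from itertools import combinations

def pairwise_logratios(comp):
    """Return all part pairs (i, j), i < j, and the matrix of log(x_i / x_j)."""
    logc = np.log(comp)
    pairs = list(combinations(range(comp.shape[1]), 2))
    Z = np.column_stack([logc[:, i] - logc[:, j] for i, j in pairs])
    return pairs, Z

def rss(Zsub, y):
    """Residual sum of squares of a least-squares fit with an intercept."""
    A = np.column_stack([np.ones(len(y)), Zsub])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    r = y - A @ coef
    return float(r @ r)

def forward_select(comp, y, n_steps):
    """Greedy unrestricted search: any pairwise logratio may enter at each step."""
    pairs, Z = pairwise_logratios(comp)
    chosen = []
    for _ in range(n_steps):
        candidates = [k for k in range(len(pairs)) if k not in chosen]
        best = min(candidates, key=lambda k: rss(Z[:, chosen + [k]], y))
        chosen.append(best)
    return [pairs[k] for k in chosen]

# Synthetic example: the response depends on the logratio of parts 0 and 3.
rng = np.random.default_rng(1)
comp = rng.dirichlet(np.ones(6), size=200)          # 200 compositions, 6 parts
y = 2.0 * np.log(comp[:, 0] / comp[:, 3]) + 0.1 * rng.standard_normal(200)

selected = forward_select(comp, y, n_steps=2)
print("selected pairs of parts:", selected)
```

The second method described above would additionally drop every candidate that shares a part with an already selected logratio, and the third would restrict candidates so that all selected logratios share a common denominator part.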