MLNov 3, 2022
Convex Clustering through MM: An Efficient Algorithm to Perform Hierarchical ClusteringDaniel J. W. Touw, Patrick J. F. Groenen, Yoshikazu Terada
Convex clustering is a modern method with both hierarchical and $k$-means clustering characteristics. Although convex clustering can capture complex clustering structures hidden in data, the existing convex clustering algorithms are not scalable to large data sets with sample sizes greater than several thousands. Moreover, it is known that convex clustering sometimes fails to produce a complete hierarchical clustering structure. This issue arises if clusters split up or the minimum number of possible clusters is larger than the desired number of clusters. In this paper, we propose convex clustering through majorization-minimization (CCMM) -- an iterative algorithm that uses cluster fusions and a highly efficient updating scheme derived using diagonal majorization. Additionally, we explore different strategies to ensure that the hierarchical clustering structure terminates in a single cluster. With a current desktop computer, CCMM efficiently solves convex clustering problems featuring over one million objects in seven-dimensional space, achieving a solution time of 51 seconds on average.
MLAug 21, 2025
Interpretable KernelsPatrick J. F. Groenen, Michael Greenacre
The use of kernels for nonlinear prediction is widespread in machine learning. They have been popularized in support vector machines and used in kernel ridge regression, amongst others. Kernel methods share three aspects. First, instead of the original matrix of predictor variables or features, each observation is mapped into an enlarged feature space. Second, a ridge penalty term is used to shrink the coefficients on the features in the enlarged feature space. Third, the solution is not obtained in this enlarged feature space, but through solving a dual problem in the observation space. A major drawback in the present use of kernels is that the interpretation in terms of the original features is lost. In this paper, we argue that in the case of a wide matrix of features, where there are more features than observations, the kernel solution can be re-expressed in terms of a linear combination of the original matrix of features and a ridge penalty that involves a special metric. Consequently, the exact same predicted values can be obtained as a weighted linear combination of the features in the usual manner and thus can be interpreted. In the case where the number of features is less than the number of observations, we discuss a least-squares approximation of the kernel matrix that still allows the interpretation in terms of a linear combination. It is shown that these results hold for any function of a linear combination that minimizes the coefficients and has a ridge penalty on these coefficients, such as in kernel logistic regression and kernel Poisson regression. This work makes a contribution to interpretable artificial intelligence.
MEFeb 16, 2021
The MELODIC family for simultaneous binary logistic regression in a reduced spaceMark de Rooij, Patrick J. F. Groenen
Logistic regression is a commonly used method for binary classification. Researchers often have more than a single binary response variable and simultaneous analysis is beneficial because it provides insight into the dependencies among response variables as well as between the predictor variables and the responses. Moreover, in such a simultaneous analysis the equations can lend each other strength, which might increase predictive accuracy. In this paper, we propose the MELODIC family for simultaneous binary logistic regression modeling. In this family, the regression models are defined in a Euclidean space of reduced dimension, based on a distance rule. The model may be interpreted in terms of logistic regression coefficients or in terms of a biplot. We discuss a fast iterative majorization (or MM) algorithm for parameter estimation. Two applications are shown in detail: one relating personality characteristics to drug consumption profiles and one relating personality characteristics to depressive and anxiety disorders. We present a thorough comparison of our MELODIC family with alternative approaches for multivariate binary data.