LG · ML · May 31, 2022

Improvements to Supervised EM Learning of Shared Kernel Models by Feature Space Partitioning

arXiv:2205.15304v1 · h-index: 13

Originality: Incremental advance

AI Analysis

This incremental improvement makes EM-based supervised learning more practical for higher-dimensional datasets in classification tasks.

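The paper addresses two shortcomings of EM training for shared kernel models. It gives a rigorous derivation of the algorithm, and it reduces computational complexity by partitioning the feature space into R subsets, which lowers the cost by a factor of R². On MNIST, the partitioned algorithm outperforms its non-partitioned counterpart at lower cost. Comparisons with standard classifiers are reported on other benchmarks.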
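One way to read the R² figure is sketched below. This is a sketch under stated assumptions, not the paper's own complexity analysis. Assume the D features are split into R disjoint blocks of D/R features each, and that the blocks are independent given the class. Assume also that each block gets its own shared kernel mixture with J kernels. If the cost is dominated by full-covariance quadratic forms, which cost O(d²) in d dimensions, then one partition costs R² less than the unpartitioned model. For example, MNIST has D = 784 features. With a hypothetical R = 4, the count drops from 784² = 614656 to 196² = 38416, a factor of 16. Run serially, the R partitions together cost D²/R. Run in parallel, each processor sees the R² reduction.

```latex
% Assumption: x in R^D split into R disjoint blocks, independent given class c
p(\mathbf{x} \mid c) = \prod_{r=1}^{R} p_r\big(\mathbf{x}^{(r)} \mid c\big),
\qquad
p_r\big(\mathbf{x}^{(r)} \mid c\big) = \sum_{j=1}^{J} \pi^{(r)}_{j \mid c}\,
  \mathcal{N}\big(\mathbf{x}^{(r)};\, \boldsymbol{\mu}^{(r)}_j,\, \boldsymbol{\Sigma}^{(r)}_j\big)

% Assumption: cost dominated by O(d^2) quadratic forms in d dimensions
\underbrace{O(D^2)}_{\text{unpartitioned}}
\;\longrightarrow\;
\underbrace{O\big((D/R)^2\big) = O\big(D^2/R^2\big)}_{\text{one partition}}
```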

Expectation maximisation (EM) is usually thought of as an unsupervised learning method for estimating the parameters of a mixture distribution. However, it can also be used for supervised learning when class labels are available. Accordingly, EM has been applied to train neural networks, including the probabilistic radial basis function (PRBF) network, or shared kernel (SK) model. This paper addresses two major shortcomings of previous work in this area: the lack of rigour in the derivation of the EM training algorithm, and the computational complexity of the technique, which has limited it to low-dimensional data sets. We first present a detailed derivation of EM for the Gaussian shared kernel model PRBF classifier. The derivation uses data association theory to obtain the complete data likelihood, Baum's auxiliary function (the E-step) and its subsequent maximisation (M-step). To reduce the complexity of the resulting SKEM algorithm, we partition the feature space into $R$ non-overlapping subsets of variables. The resulting product decomposition of the joint data likelihood is exact when the feature partitions are independent. It allows the SKEM to be implemented in parallel and at $R^2$ times lower complexity. The operation of the partitioned SKEM algorithm is demonstrated on the MNIST data set and compared with its non-partitioned counterpart. The results show that improved performance is achievable at reduced complexity. Comparisons with standard classification algorithms are provided on a number of other benchmark data sets.
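The supervised EM idea in the abstract can be sketched in NumPy. This is a minimal sketch under our own assumptions, not the paper's derived SKEM. It uses full-covariance Gaussian kernels shared by all classes and class-specific mixing weights. Responsibilities are normalised using each point's observed label. Kernels are initialised at random data points, and a small ridge `reg` is added to the covariances. The partitioned variant assumes one independent shared kernel model per feature block and sums the block log-likelihoods. This is the product decomposition that is exact when the blocks are independent. All function names (`log_gauss`, `fit_skem`, `class_log_lik`, `fit_partitioned`, `predict_partitioned`) are hypothetical.

```python
import numpy as np

def log_gauss(X, mu, cov):
    """Log density of N(mu, cov) evaluated at each row of X."""
    d = X.shape[1]
    diff = X - mu
    _, logdet = np.linalg.slogdet(cov)
    # Mahalanobis term (x - mu)^T cov^{-1} (x - mu) for every row
    maha = np.sum(diff * np.linalg.solve(cov, diff.T).T, axis=1)
    return -0.5 * (d * np.log(2 * np.pi) + logdet + maha)

def fit_skem(X, y, n_classes, n_kernels, n_iter=50, reg=1e-3, seed=0):
    """Hypothetical supervised EM for a Gaussian shared kernel model.

    Kernels (mu_j, cov_j) are shared by all classes; class c has its own
    mixing weights pi[c, j]. Labels are observed, so each point's
    responsibilities use only the weights of its own class.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    mu = X[rng.choice(n, n_kernels, replace=False)].copy()
    base = np.atleast_2d(np.cov(X, rowvar=False)) + reg * np.eye(d)
    cov = np.array([base] * n_kernels)
    pi = np.full((n_classes, n_kernels), 1.0 / n_kernels)
    for _ in range(n_iter):
        # E-step: w[i, j] proportional to pi[y_i, j] * N(x_i; mu_j, cov_j)
        logp = np.stack([log_gauss(X, mu[j], cov[j]) for j in range(n_kernels)], axis=1)
        logp += np.log(np.maximum(pi[y], 1e-300))
        logp -= logp.max(axis=1, keepdims=True)
        w = np.exp(logp)
        w /= w.sum(axis=1, keepdims=True)
        # M-step: class-specific mixing weights, then shared kernel parameters
        for c in range(n_classes):
            pi[c] = w[y == c].sum(axis=0) / max((y == c).sum(), 1)
        nk = w.sum(axis=0) + 1e-12
        mu = (w.T @ X) / nk[:, None]
        for j in range(n_kernels):
            diff = X - mu[j]
            cov[j] = (w[:, j, None] * diff).T @ diff / nk[j] + reg * np.eye(d)
    return mu, cov, pi

def class_log_lik(X, mu, cov, pi):
    """log p(x | c) = log sum_j pi[c, j] N(x; mu_j, cov_j); shape (n, n_classes)."""
    logk = np.stack([log_gauss(X, mu[j], cov[j]) for j in range(len(mu))], axis=1)
    a = logk[:, None, :] + np.log(np.maximum(pi, 1e-300))[None, :, :]
    m = a.max(axis=2, keepdims=True)
    return (m + np.log(np.exp(a - m).sum(axis=2, keepdims=True)))[..., 0]

def fit_partitioned(X, y, n_classes, n_kernels, blocks, **kw):
    # One independent shared kernel model per feature block (assumption);
    # the fits do not interact, so they could run in parallel.
    return [fit_skem(X[:, b], y, n_classes, n_kernels, **kw) for b in blocks]

def predict_partitioned(X, models, blocks, prior):
    # Product decomposition: log p(x | c) = sum_r log p_r(x^(r) | c)
    score = np.log(prior)[None, :]
    for b, (mu, cov, pi) in zip(blocks, models):
        score = score + class_log_lik(X[:, b], mu, cov, pi)
    return score.argmax(axis=1)

# Usage on synthetic data: two well-separated classes in D = 4 dimensions
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0.0, 1.0, (200, 4)), rng.normal(4.0, 1.0, (200, 4))])
y = np.repeat([0, 1], 200)
blocks = [[0, 1], [2, 3]]  # R = 2 partitions of the D = 4 features
models = fit_partitioned(X, y, n_classes=2, n_kernels=2, blocks=blocks)
pred = predict_partitioned(X, models, blocks, prior=np.array([0.5, 0.5]))
print("training accuracy:", (pred == y).mean())
```

Each block's fit touches only its own columns. The list comprehension in `fit_partitioned` could therefore be swapped for a process pool to get the parallelism the abstract mentions, although the paper's own parallel scheme may differ.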

