ML LG Jul 12, 2017

An Introduction to the Practical and Theoretical Aspects of Mixture-of-Experts Modeling

arXiv:1707.03538v14.17 citations

Originality Synthesis-oriented

AI Analysis

The paper provides a theoretical and practical framework for researchers and practitioners working with complex data, but its contribution is incremental because it builds on existing mixture-of-experts (MoE) concepts.

The paper addresses the problem of modeling complex data generating processes with mixture-of-experts (MoE) models. It proposes a maximum quasi-likelihood estimator, gives conditions under which that estimator is consistent and asymptotically normal, and demonstrates applications in classification, clustering, and regression through worked examples.

Mixture-of-experts (MoE) models are a powerful paradigm for modeling data arising from complex data generating processes (DGPs). In this article, we demonstrate how different MoE models can be constructed to approximate the underlying DGPs of arbitrary types of data. Due to the probabilistic nature of MoE models, we propose the maximum quasi-likelihood (MQL) estimator as a method for estimating MoE model parameters from data, and we provide conditions under which MQL estimators are consistent and asymptotically normal. The blockwise minorization-maximization (blockwise-MM) algorithm framework is proposed as an all-purpose method for constructing algorithms for obtaining MQL estimators. An example derivation of a blockwise-MM algorithm is provided. We then present a method for constructing information criteria for estimating the number of components in MoE models and provide justification for the classic Bayesian information criterion (BIC). We explain how MoE models can be used to conduct classification, clustering, and regression, and we illustrate these applications via a pair of worked examples.
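To make the objects named in the abstract concrete, the following is a minimal sketch, not the paper's own implementation. It assumes a scalar covariate, softmax gating with linear scores, and Gaussian experts with linear means; these modeling choices, the function names, and the toy parameter values are all illustrative assumptions. It evaluates the MoE conditional density, the log quasi-likelihood that an MQL estimator would maximize, and the BIC in its common "lower is better" form, which may differ in sign or scaling from the convention used in the paper. It does not implement the blockwise-MM algorithm.

```python
import math

def softmax_gates(x, gate_params):
    """Gating probabilities pi_z(x) from linear scores a_z + b_z * x (assumed form)."""
    scores = [a + b * x for a, b in gate_params]
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def normal_pdf(y, mean, sd):
    """Density of a univariate normal distribution at y."""
    z = (y - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))

def moe_density(y, x, gate_params, expert_params):
    """f(y | x) = sum_z pi_z(x) * N(y; alpha_z + beta_z * x, sd_z^2)."""
    gates = softmax_gates(x, gate_params)
    return sum(
        g * normal_pdf(y, alpha + beta * x, sd)
        for g, (alpha, beta, sd) in zip(gates, expert_params)
    )

def log_quasi_likelihood(data, gate_params, expert_params):
    """Sum of log f(y_i | x_i) over (x, y) pairs; the objective an MQL estimator maximizes."""
    return sum(
        math.log(moe_density(y, x, gate_params, expert_params)) for x, y in data
    )

def bic(log_lik, n_params, n_obs):
    """BIC as -2 log L + k log n; smaller values are preferred under this convention."""
    return -2.0 * log_lik + n_params * math.log(n_obs)

# Hypothetical two-component model. The first gate is fixed at (0, 0) for
# identifiability, so the free parameters are 2 (gating) + 6 (experts) = 8.
gate_params = [(0.0, 0.0), (1.0, -2.0)]
expert_params = [(0.0, 1.0, 1.0), (3.0, -1.0, 0.5)]  # (alpha, beta, sd) per expert
toy_data = [(-1.0, -0.8), (0.0, 0.3), (0.5, 2.4), (1.5, 1.4)]  # made-up (x, y) pairs

ll = log_quasi_likelihood(toy_data, gate_params, expert_params)
print("log quasi-likelihood:", ll)
print("BIC:", bic(ll, n_params=8, n_obs=len(toy_data)))
```

In practice the parameters would be estimated rather than fixed by hand, and the BIC would be compared across fitted models with different numbers of components to choose among them.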
