ST · AI · LG · ME · ML · Apr 18, 2021

Non-asymptotic model selection in block-diagonal mixture of polynomial experts models

arXiv:2104.08959v2 · 11 citations
Originality: Incremental advance
AI Analysis

This work addresses model selection for regression with structured interactions among predictors. It offers a criterion to infer the number of mixture components, the polynomial degree of the mean functions, and the covariance structure. It is incremental in that it builds on existing mixture of experts frameworks under specific structural assumptions.
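To make the parameter savings of a block-diagonal covariance concrete, here is a minimal Python sketch. As an illustration, and not as the paper's exact procedure, it assumes that candidate block structures come from thresholding the absolute correlations between variables and taking the connected components of the resulting graph. `detect_blocks`, `cov_param_count`, `example_corr`, and the threshold values are all hypothetical.

```python
from typing import List, Sequence


def detect_blocks(corr: Sequence[Sequence[float]], threshold: float) -> List[List[int]]:
    """Group variables into blocks: connected components of the graph that has
    an edge i--j whenever |corr[i][j]| > threshold (hypothetical helper)."""
    n = len(corr)
    seen = [False] * n
    blocks: List[List[int]] = []
    for start in range(n):
        if seen[start]:
            continue
        # Depth-first search from an unvisited variable collects one component.
        stack, component = [start], []
        seen[start] = True
        while stack:
            i = stack.pop()
            component.append(i)
            for j in range(n):
                if j != i and not seen[j] and abs(corr[i][j]) > threshold:
                    seen[j] = True
                    stack.append(j)
        blocks.append(sorted(component))
    return blocks


def cov_param_count(block_sizes: Sequence[int]) -> int:
    """Free entries of a block-diagonal covariance: one symmetric b x b block per group."""
    return sum(b * (b + 1) // 2 for b in block_sizes)


# Toy correlation matrix (made up): variables {0, 1} and {2, 3} interact.
example_corr = [
    [1.0, 0.8, 0.05, 0.0],
    [0.8, 1.0, 0.0, 0.05],
    [0.05, 0.0, 1.0, 0.6],
    [0.0, 0.05, 0.6, 1.0],
]
blocks = detect_blocks(example_corr, threshold=0.3)
print(blocks)                                    # [[0, 1], [2, 3]]
print(cov_param_count([len(b) for b in blocks]))  # 6, versus 10 for a full 4 x 4 covariance
```

Sweeping the threshold yields a data-driven family of candidate structures, from fully diagonal (high threshold) to a single full block (low threshold), which a selection criterion can then compare.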

The paper models non-linear relationships in regression data with hidden graph-structured interactions among high-dimensional predictors, using a block-diagonal mixture of polynomial experts model. It provides a non-asymptotic model selection criterion whose theoretical guarantee is a finite-sample oracle inequality.
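The selection step itself can be sketched as minimizing a penalized negative log-likelihood over fitted candidates. This is a simplified sketch under stated assumptions: the penalty is taken to be `kappa * dim(m)`, whereas the paper's penalty also accounts for the complexity of the model collection, and `kappa` would in practice need calibration (the slope heuristic is a common choice). `FittedModel`, `select_model`, and the candidate numbers are hypothetical toy values, not results from the paper.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class FittedModel:
    """Summary of one fitted candidate (hypothetical container, not from the paper)."""
    name: str          # e.g. number of components K and polynomial degree d
    neg_loglik: float  # -sum_i log s_hat_m(y_i | x_i) on the training sample
    dim: int           # number of free parameters of the candidate model


def penalized_criterion(model: FittedModel, kappa: float) -> float:
    """Negative log-likelihood plus a simplified penalty kappa * dim(m)."""
    return model.neg_loglik + kappa * model.dim


def select_model(models: Sequence[FittedModel], kappa: float) -> FittedModel:
    """Return the candidate with the smallest penalized criterion."""
    return min(models, key=lambda m: penalized_criterion(m, kappa))


# Toy candidates: richer models fit better (lower neg_loglik) but cost more parameters.
candidates = [
    FittedModel("K=2, d=1", neg_loglik=530.0, dim=40),
    FittedModel("K=3, d=1", neg_loglik=500.0, dim=60),
    FittedModel("K=3, d=2", neg_loglik=495.0, dim=90),
]
print(select_model(candidates, kappa=1.0).name)  # K=3, d=1
print(select_model(candidates, kappa=0.0).name)  # K=3, d=2 (no penalty: best fit wins)
```

Raising `kappa` pushes the choice toward sparser models, which is the complexity/sparsity trade-off the criterion is meant to control.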

Model selection via penalized likelihood type criteria is a standard task in many statistical inference and machine learning problems. Progress has produced criteria with asymptotic consistency results and, increasingly, non-asymptotic criteria. We focus on the problem of modeling non-linear relationships in regression data with potential hidden graph-structured interactions between the high-dimensional predictors, within the mixture of experts modeling framework. To deal with such a complex situation, we investigate a block-diagonal localized mixture of polynomial experts (BLoMPE) regression model, which is built on an inverse regression and block-diagonal structures of the Gaussian expert covariance matrices. We introduce a penalized maximum likelihood selection criterion to estimate the unknown conditional density of the regression model. This model selection criterion allows us to handle the challenging problem of inferring the number of mixture components, the degree of the polynomial mean functions, and the hidden block-diagonal structures of the covariance matrices, which reduces the number of parameters to be estimated and leads to a trade-off between complexity and sparsity in the model. In particular, we provide a strong theoretical guarantee for the introduced non-asymptotic model selection criterion: a finite-sample oracle inequality satisfied by the penalized maximum likelihood estimator with a Jensen-Kullback-Leibler type loss. The penalty shape of this criterion depends on the complexity of the considered random subcollection of BLoMPE models, including the relevant graph structures, the degree of the polynomial mean functions, and the number of mixture components.
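The Jensen-Kullback-Leibler loss can be illustrated on discrete distributions, assuming the standard definition KL_rho(s, t) = (1/rho) KL(s, (1 - rho) s + rho t) with rho in (0, 1). In the paper the loss compares conditional densities averaged over the covariates; this sketch covers a single discrete case only, and the value of `rho` is arbitrary. Because the second argument is mixed with `s`, the loss stays finite even when plain KL would be infinite, and it is bounded by log(1/(1 - rho)) / rho.

```python
import math
from typing import Sequence


def kl_divergence(p: Sequence[float], q: Sequence[float]) -> float:
    """KL(p || q) for discrete distributions; terms with p_i = 0 contribute 0."""
    total = 0.0
    for p_i, q_i in zip(p, q):
        if p_i > 0.0:
            if q_i <= 0.0:
                return math.inf  # p puts mass where q has none
            total += p_i * math.log(p_i / q_i)
    return total


def jkl_divergence(s: Sequence[float], t: Sequence[float], rho: float) -> float:
    """Jensen-Kullback-Leibler divergence (1/rho) * KL(s || (1 - rho) s + rho t)."""
    if not 0.0 < rho < 1.0:
        raise ValueError("rho must lie in (0, 1)")
    # Mixing t with s guarantees the second argument is positive wherever s is.
    mixture = [(1.0 - rho) * s_i + rho * t_i for s_i, t_i in zip(s, t)]
    return kl_divergence(s, mixture) / rho


s = [0.5, 0.5]
t = [1.0, 0.0]
print(kl_divergence(s, t))        # inf
print(jkl_divergence(s, t, 0.5))  # log(4/3), approximately 0.2877
```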
