LG ML · Dec 25, 2018

Dropout Regularization in Hierarchical Mixture of Experts

arXiv:1812.10158v14.720 citations · h-index: 28

Originality: Incremental advance

AI Analysis

This is an incremental improvement for researchers using hierarchical mixture of experts, addressing overfitting in deep tree structures.

The paper tackled overfitting in hierarchical mixture of experts by proposing a dropout variant faithful to the tree hierarchy, showing improved generalization and smoother fits on synthetic regression, MNIST, and CIFAR-10 datasets.

Dropout is a very effective method for preventing overfitting and has become the go-to regularizer for multi-layer neural networks in recent years. A hierarchical mixture of experts is a hierarchically gated model that defines a soft decision tree: leaves correspond to experts, and decision nodes correspond to gating models that softly choose between their children. As such, the model defines a soft hierarchical partitioning of the input space. In this work, we propose a variant of dropout for hierarchical mixture of experts that is faithful to the tree hierarchy defined by the model, as opposed to the flat, unit-wise independent application of dropout used with multi-layer perceptrons. We show that, on synthetic regression data and on the MNIST and CIFAR-10 datasets, our proposed dropout mechanism prevents overfitting on trees with many levels, improving generalization and providing smoother fits.
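To make the soft decision tree concrete, here is a minimal sketch of a forward pass with a tree-aware dropout. The abstract does not spell out the exact dropout rule, so the version below is an assumption for illustration only: at each gating node, with probability `drop_prob`, one randomly chosen child subtree is dropped as a whole and all weight is routed to its sibling. The class `SoftTree`, the linear sigmoid gates, and the constant leaf experts are likewise hypothetical choices, not the authors' implementation.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


class SoftTree:
    """Complete binary soft decision tree (hypothetical illustration).

    Internal nodes hold linear gates; leaves hold constant experts.
    Nodes are stored in heap order: children of i are 2i+1 and 2i+2.
    """

    def __init__(self, depth, dim, seed=0):
        rng = random.Random(seed)
        self.n_internal = 2 ** depth - 1
        # Each gate is a (weights, bias) pair with random initial values.
        self.gates = [
            ([rng.gauss(0, 1) for _ in range(dim)], rng.gauss(0, 1))
            for _ in range(self.n_internal)
        ]
        self.leaves = [rng.gauss(0, 1) for _ in range(2 ** depth)]

    def predict(self, x, drop_prob=0.0, rng=None):
        # With rng=None (or drop_prob=0) this is the plain soft-tree output.
        return self._eval(0, x, drop_prob, rng)

    def _eval(self, node, x, drop_prob, rng):
        if node >= self.n_internal:
            return self.leaves[node - self.n_internal]
        left, right = 2 * node + 1, 2 * node + 2
        if rng is not None and rng.random() < drop_prob:
            # Assumed dropout rule: drop one whole child subtree and
            # route everything to the sibling that is kept.
            kept = left if rng.random() < 0.5 else right
            return self._eval(kept, x, drop_prob, rng)
        w, b = self.gates[node]
        g = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
        # Soft choice: gate-weighted average of the two children.
        return (g * self._eval(left, x, drop_prob, rng)
                + (1.0 - g) * self._eval(right, x, drop_prob, rng))


tree = SoftTree(depth=3, dim=2, seed=0)
x = [0.5, -1.0]
print(tree.predict(x))                                       # no dropout
print(tree.predict(x, drop_prob=0.5, rng=random.Random(1)))  # with dropout
```

The contrast with flat dropout is that a single random decision here removes an entire subtree (all gates and experts beneath the dropped child), rather than masking units independently of the hierarchy.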

