Internal node bagging
Internal node bagging is a training method for neural networks that uses extra parameters to improve fitting ability during training while keeping the model small at inference, though it appears incremental over dropout.
The paper tackles the problem of dropout's implicit ensemble learning by proposing internal node bagging, which explicitly groups nodes to learn specific features during training and combines them into single nodes during inference. This approach achieved significantly better performance than dropout on small models across several benchmark datasets.
We introduce a novel view of dropout as an implicit ensemble learning method, one that does not specify how many nodes, or which ones, learn a given feature. We propose a new training method, internal node bagging, which explicitly forces a group of nodes to learn a given feature at training time and combines those nodes into a single node at inference time. This lets us use many more parameters to improve the model's fitting ability during training while keeping the model small at inference. We test our method on several benchmark datasets and find that it performs significantly better than dropout on small models.
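The train-time grouping and inference-time merging described above can be illustrated with a minimal sketch. The abstract does not say how the nodes in a group are sampled or combined, so everything here is an assumption made for illustration: each output feature owns `group_size` linear nodes, each node is kept independently with probability `keep_prob` during training, the surviving pre-activations are summed before a ReLU, and merging scales the summed weights by `keep_prob`. All function and parameter names are hypothetical.

```python
import random


def make_group_layer(n_in, n_out, group_size, seed=0):
    """Create weights [n_out][group_size][n_in] and biases [n_out][group_size]."""
    rng = random.Random(seed)
    weights = [[[rng.gauss(0.0, 0.1) for _ in range(n_in)]
                for _ in range(group_size)]
               for _ in range(n_out)]
    biases = [[rng.gauss(0.0, 0.1) for _ in range(group_size)]
              for _ in range(n_out)]
    return weights, biases


def relu(value):
    return max(0.0, value)


def forward_train(x, weights, biases, keep_prob, rng):
    """Training pass: sample a random subset of each group (assumed scheme)."""
    outputs = []
    for group_w, group_b in zip(weights, biases):
        total = 0.0
        for w, b in zip(group_w, group_b):
            if rng.random() < keep_prob:  # keep this node for this pass
                total += sum(wi * xi for wi, xi in zip(w, x)) + b
        outputs.append(relu(total))  # one activation per group
    return outputs


def merge_groups(weights, biases, keep_prob):
    """Collapse each group into one node using the expected pre-activation."""
    merged_w = [[keep_prob * sum(column) for column in zip(*group_w)]
                for group_w in weights]
    merged_b = [keep_prob * sum(group_b) for group_b in biases]
    return merged_w, merged_b


def forward_inference(x, merged_w, merged_b):
    """Inference pass: one ordinary node per output feature."""
    return [relu(sum(wi * xi for wi, xi in zip(w, x)) + b)
            for w, b in zip(merged_w, merged_b)]
```

Under these assumptions the merge is exact in expectation for the pre-activation, because the nodes in a group are summed before the nonlinearity; the training layer holds `group_size` times as many parameters as the merged layer used at inference.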