Hiroshi Inoue

LG
3papers
552citations
Novelty45%
AI Score25

3 Papers

NEMay 23, 2019
Multi-Sample Dropout for Accelerated Training and Better Generalization

Hiroshi Inoue

Dropout is a simple but efficient regularization technique for achieving better generalization of deep neural networks (DNNs); hence it is widely used in tasks based on DNNs. During training, dropout randomly discards a portion of the neurons to avoid overfitting. This paper presents an enhanced dropout technique, which we call multi-sample dropout, for both accelerating training and improving generalization over the original dropout. The original dropout creates a randomly selected subset (called a dropout sample) from the input in each training iteration while the multi-sample dropout creates multiple dropout samples. The loss is calculated for each sample, and then the sample losses are averaged to obtain the final loss. This technique can be easily implemented by duplicating a part of the network after the dropout layer while sharing the weights among the duplicated fully connected layers. Experimental results using image classification tasks including ImageNet, CIFAR-10, and CIFAR-100 showed that multi-sample dropout accelerates training. Moreover, the networks trained using multi-sample dropout achieved lower error rates compared to networks trained with the original dropout. The additional computation cost due to the duplicated operations is not significant for deep convolutional networks because most of the computation time is consumed in the convolution layers before the dropout layer, which are not duplicated.

LGJan 9, 2018
Data Augmentation by Pairing Samples for Images Classification

Hiroshi Inoue

Data augmentation is a widely used technique in many machine learning tasks, such as image classification, to virtually enlarge the training dataset size and avoid overfitting. Traditional data augmentation techniques for image classification tasks create new samples from the original training data by, for example, flipping, distorting, adding a small amount of noise to, or cropping a patch from an original image. In this paper, we introduce a simple but surprisingly effective data augmentation technique for image classification tasks. With our technique, named SamplePairing, we synthesize a new sample from one image by overlaying another image randomly chosen from the training data (i.e., taking an average of two images for each pixel). By using two images randomly selected from the training set, we can generate $N^2$ new samples from $N$ training samples. This simple data augmentation technique significantly improved classification accuracy for all the tested datasets; for example, the top-1 error rate was reduced from 33.5% to 29.0% for the ILSVRC 2012 dataset with GoogLeNet and from 8.22% to 6.93% in the CIFAR-10 dataset. We also show that our SamplePairing technique largely improved accuracy when the number of samples in the training set was very small. Therefore, our technique is more valuable for tasks with a limited amount of training data, such as medical imaging tasks.

LGFeb 27, 2017
Adaptive Ensemble Prediction for Deep Neural Networks based on Confidence Level

Hiroshi Inoue

Ensembling multiple predictions is a widely used technique for improving the accuracy of various machine learning tasks. One obvious drawback of ensembling is its higher execution cost during inference. In this paper, we first describe our insights on the relationship between the probability of prediction and the effect of ensembling with current deep neural networks; ensembling does not help mispredictions for inputs predicted with a high probability even when there is a non-negligible number of mispredicted inputs. This finding motivated us to develop a way to adaptively control the ensembling. If the prediction for an input reaches a high enough probability, i.e., the output from the softmax function, on the basis of the confidence level, we stop ensembling for this input to avoid wasting computation power. We evaluated the adaptive ensembling by using various datasets and showed that it reduces the computation cost significantly while achieving accuracy similar to that of static ensembling using a pre-defined number of local predictions. We also show that our statistically rigorous confidence-level-based early-exit condition reduces the burden of task-dependent threshold tuning better compared with naive early exit based on a pre-defined threshold in addition to yielding a better accuracy with the same cost.