Towards Non-I.I.D. Image Classification: A Dataset and Baselines
This paper addresses the understudied problem of non-I.I.D. data in image classification, which is common in practice and makes models unstable, by providing a dataset and baselines to support follow-up research.
The authors tackled the problem of non-I.I.D. image classification by constructing the NICO dataset to create controlled non-I.I.D. scenarios, and proposed a baseline ConvNet model with a batch balancing module that improved performance in these settings.
The I.I.D. hypothesis, that training and testing data are independent and identically distributed, is the basis of numerous image classification methods. In practice, where Non-IIDness is common, this property can hardly be guaranteed, which makes the performance of these models unstable. In the literature, however, the Non-I.I.D. image classification problem is largely understudied. A key reason is the lack of a well-designed dataset to support related research. In this paper, we construct and release a Non-I.I.D. image dataset called NICO, which deliberately uses contexts to create Non-IIDness. Compared with other datasets, extensive analyses show that NICO can support various Non-I.I.D. situations with sufficient flexibility. Meanwhile, we propose a baseline model with a ConvNet structure for general Non-I.I.D. image classification, where the distribution of the testing data is unknown but differs from that of the training data. The experimental results demonstrate that NICO supports training ConvNet models from scratch, and that a batch balancing module can help ConvNets perform better in Non-I.I.D. settings.
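To make the idea of using contexts to create Non-IIDness more concrete, the following is a minimal sketch, assuming each image is tagged with a class label and a context label (for example, a background or scene). For each class it holds out some contexts for testing, so every test image of that class comes from a context never seen for that class during training. The function `context_split`, its parameters, and the toy labels are hypothetical illustrations; this is not NICO's actual construction procedure and not the paper's baseline or batch balancing module.

```python
import random
from collections import defaultdict


def context_split(samples, test_contexts_per_class=1, seed=0):
    """Split (image_id, label, context) triples into train/test sets.

    Hypothetical illustration: for each class, a few contexts are held out
    for testing, so the test distribution differs from the training one.
    """
    rng = random.Random(seed)

    # Collect the set of contexts observed for each class.
    contexts_by_class = defaultdict(set)
    for _, label, context in samples:
        contexts_by_class[label].add(context)

    # Choose held-out contexts per class, always keeping at least one
    # context in training so every class remains learnable.
    held_out = {}
    for label, contexts in contexts_by_class.items():
        ordered = sorted(contexts)
        rng.shuffle(ordered)
        k = min(test_contexts_per_class, len(ordered) - 1)
        held_out[label] = set(ordered[:k])

    # Route each sample by whether its (class, context) pair is held out.
    train, test = [], []
    for sample in samples:
        _, label, context = sample
        (test if context in held_out[label] else train).append(sample)
    return train, test
```

For example, with toy samples such as `("img0", "dog", "grass")` and `("img4", "cat", "snow")`, calling `context_split(samples)` yields splits in which no (class, context) pair appears in both sets. Sampling contexts at random is only one design choice; a controlled benchmark could instead fix which contexts are held out, or vary their proportions, to produce different degrees of Non-IIDness.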