Multi Instance Learning For Unbalanced Data
This work, an incremental improvement, addresses data imbalance in Multi Instance Learning, with applications such as patch classification from weak labels.
The paper tackles data imbalance in Multi Instance Learning by analyzing the Single Instance learning objective. It shows that greater imbalance makes the algorithm more resilient to statistical dependencies among the instances in a bag, and that known issues of the method with linear classifiers are much less likely to occur with neural networks.
In the context of Multi Instance Learning, we analyze the Single Instance (SI) learning objective. We show that when the data is unbalanced and the family of classifiers is sufficiently rich, the SI method is a useful learning algorithm. In particular, we show that larger data imbalance, a quality that is typically perceived as negative, in fact implies better resilience of the algorithm to the statistical dependencies of the objects in bags. In addition, our results shed new light on some known issues with the SI method in the setting of linear classifiers, and we show that these issues are significantly less likely to occur in the setting of neural networks. We demonstrate our results on a synthetic dataset, and on the COCO dataset for the problem of patch classification with weak image-level labels derived from captions.
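The SI objective can be sketched as follows. The sketch assumes the common SI construction: every instance inherits its bag's label, and an ordinary classifier is trained on the resulting instance-level data. The logistic loss, the linear classifier, the gradient-descent settings and the 1-D generator `make_bags` are illustrative assumptions. They are not the paper's setup or its synthetic dataset, and all function names are hypothetical.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def make_bags(n_bags, bag_size, pos_rate, seed=0):
    """Hypothetical 1-D synthetic MIL data (not the paper's dataset).

    Each instance is positive with probability pos_rate; a bag is labeled
    positive iff it contains at least one positive instance. Returns the
    bags as (instances, bag_label) pairs plus the hidden instance labels,
    which an MIL learner never sees.
    """
    rng = random.Random(seed)
    bags, hidden = [], []
    for _ in range(n_bags):
        labels = [int(rng.random() < pos_rate) for _ in range(bag_size)]
        instances = [[rng.gauss(2.0 if y else -2.0, 1.0)] for y in labels]
        bags.append((instances, int(any(labels))))
        hidden.append(labels)
    return bags, hidden


def si_pairs(bags):
    # Single Instance view: every instance inherits its bag's label.
    return [(x, y) for instances, y in bags for x in instances]


def si_objective(w, b, bags):
    # Mean logistic loss over all instances, with bag labels as targets.
    pairs = si_pairs(bags)
    total = 0.0
    for x, y in pairs:
        p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
        p = min(max(p, 1e-12), 1.0 - 1e-12)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(pairs)


def train_si(bags, dim, lr=0.5, epochs=200):
    # Full-batch gradient descent on the SI objective for a linear classifier.
    w, b = [0.0] * dim, 0.0
    pairs = si_pairs(bags)
    n = len(pairs)
    for _ in range(epochs):
        gw, gb = [0.0] * dim, 0.0
        for x, y in pairs:
            err = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) - y
            gw = [g + err * xi for g, xi in zip(gw, x)]
            gb += err
        w = [wi - lr * g / n for wi, g in zip(w, gw)]
        b -= lr * gb / n
    return w, b


def si_label_noise(bags, hidden):
    # Fraction of instances whose inherited bag label differs from the hidden
    # label; only negative instances inside positive bags can be flipped.
    flips = sum(int(y != h) for (_, y), hs in zip(bags, hidden) for h in hs)
    return flips / sum(len(hs) for hs in hidden)


if __name__ == "__main__":
    bags, hidden = make_bags(n_bags=200, bag_size=5, pos_rate=0.05)
    w, b = train_si(bags, dim=1)
    print("SI loss at w=0:", si_objective([0.0], 0.0, bags))
    print("SI loss after training:", si_objective(w, b, bags))
    print("SI label noise:", si_label_noise(bags, hidden))
```

`si_label_noise` is only a convenient way to watch how the SI targets change as `pos_rate` varies. It is not a reproduction of the paper's analysis of imbalance and bag dependencies. Replacing the linear model in `train_si` with a neural network would match the richer classifier families the abstract refers to.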