On Classification with Bags, Groups and Sets
This work helps researchers and practitioners handle non-traditional data structures in machine learning. Its contribution is incremental, since it primarily organizes and synthesizes existing approaches.
The paper tackles the problem of classification when data is organized in sets, groups, or bags rather than as individual feature vectors. It provides an overview and a taxonomy that map out the relationships between existing learning scenarios in this area.
Many classification problems are difficult to formulate directly in terms of the traditional supervised setting, where both training and test samples are individual feature vectors. There are cases in which samples are better described by sets of feature vectors, in which labels are only available for sets rather than for individual samples, or in which individual labels, if available, are not independent. To better deal with such problems, several extensions of supervised learning have been proposed in which the training objects, the test objects, or both are sets of feature vectors. However, because these extensions were proposed largely independently of each other, their mutual similarities and differences have not yet been mapped out. In this work, we provide an overview of such learning scenarios, propose a taxonomy to illustrate the relationships between them, and discuss directions for further research in these areas.
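To make the setting concrete, the following is a minimal sketch of one such scenario, in which both training and test objects are sets (bags) of feature vectors and a label is attached to the whole bag. The `Bag` container, the helper names, and the toy data are hypothetical and chosen for illustration only. The mean-plus-nearest-neighbour rule is a simple stand-in, not a method proposed in the paper.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence

Vector = Sequence[float]


@dataclass
class Bag:
    """Hypothetical container: a set of feature vectors sharing one label."""
    instances: List[Vector]
    label: Optional[int] = None  # None for an unlabeled (test) bag


def mean_vector(bag: Bag) -> List[float]:
    """Summarize a bag by the per-dimension mean of its instances."""
    n = len(bag.instances)
    dim = len(bag.instances[0])
    return [sum(x[d] for x in bag.instances) / n for d in range(dim)]


def squared_distance(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two vectors of equal length."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def classify_bag(train: List[Bag], test: Bag) -> Optional[int]:
    """Assumed baseline: return the label of the training bag whose mean is closest."""
    query = mean_vector(test)
    nearest = min(train, key=lambda b: squared_distance(mean_vector(b), query))
    return nearest.label


# Toy data (made up): bags may contain different numbers of instances.
train_bags = [
    Bag([[0.0, 0.1], [0.2, 0.0], [0.1, 0.3]], label=0),
    Bag([[1.0, 0.9], [0.8, 1.1]], label=1),
]
test_bag = Bag([[0.9, 1.0], [1.1, 0.8], [0.7, 0.9]])
predicted = classify_bag(train_bags, test_bag)
```

Collapsing each bag to its mean reduces the problem to the traditional setting of one feature vector per object, at the cost of discarding the within-bag structure that the set-based scenarios are meant to exploit.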