Joining datasets via data augmentation in the label space for neural networks
This work addresses the challenge of integrating multiple datasets with non-overlapping or hierarchical labels. The contribution is incremental: it builds on existing dataset joining methods by shifting the focus to the label space.
The paper tackles the problem of joining datasets with different label structures for neural network training. It proposes a method that performs data augmentation in the label space, using an artificially created knowledge graph, recurrent neural networks, and policy gradient. Empirical results on image and text classification tasks support the validity of this approach.
Most, if not all, modern deep learning systems restrict themselves to a single dataset for neural network training and inference. In this article, we are interested in systematic ways to join datasets that serve similar purposes. Unlike previously published works, which ubiquitously conduct dataset joining in the uninterpretable latent vector space, the core of our method is an augmentation procedure in the label space. The primary challenge in using the label space for dataset joining is the discrepancy between labels: non-overlapping label annotation sets, different labeling granularity or hierarchy, etc. Notably, we propose a new technique leveraging an artificially created knowledge graph, recurrent neural networks, and policy gradient that successfully achieves dataset joining in the label space. Empirical results on both image and text classification justify the validity of our approach.
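To make the label discrepancy concrete, the following minimal sketch joins two toy datasets annotated at different granularities by expanding each label along a small hand-built hierarchy. It is an illustration under stated assumptions, not the proposed method: the hierarchy `LABEL_PARENT`, the helper names, and the example identifiers are all hypothetical, and the recurrent network and policy gradient components described above are not modeled here.

```python
# Hypothetical toy label hierarchy (child label -> parent label).
# This stands in for an artificially created knowledge graph; it is an
# assumption made for illustration only.
LABEL_PARENT = {
    "husky": "dog",
    "poodle": "dog",
    "tabby": "cat",
    "dog": "animal",
    "cat": "animal",
}


def ancestors(label, parent=LABEL_PARENT):
    """Return the label followed by all of its ancestors, nearest first."""
    chain = [label]
    while chain[-1] in parent:
        chain.append(parent[chain[-1]])
    return chain


def augment_labels(dataset, parent=LABEL_PARENT):
    """Replace each single label with the set of labels implied by the hierarchy."""
    return [(example, set(ancestors(label, parent))) for example, label in dataset]


def join_datasets(*datasets):
    """Concatenate datasets after augmenting their labels into a shared space."""
    joined = []
    for dataset in datasets:
        joined.extend(augment_labels(dataset))
    return joined


# One dataset uses fine-grained labels, the other coarse-grained ones.
fine = [("img_001", "husky"), ("img_002", "tabby")]
coarse = [("img_101", "dog"), ("img_102", "cat")]

joined = join_datasets(fine, coarse)
for example, labels in joined:
    print(example, sorted(labels))
```

After augmentation, the fine-grained example `img_001` and the coarse-grained example `img_101` share the labels `dog` and `animal`, so a single multi-label classifier could in principle be trained on both. The sketch assumes the hierarchy is known in advance and leaves open how missing fine-grained labels on coarse examples should be handled.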