Rel-MOSS: Towards Imbalanced Relational Deep Learning on Relational Databases
This work tackles the critical problem of class imbalance in relational deep learning, which can lead to unusable models for practitioners working with real-world relational databases.
This paper addresses the class imbalance problem in relational deep learning (RDL) for entity classification on relational databases (RDBs). The proposed Rel-MOSS method improves Balanced Accuracy by up to 2.46% and G-Mean by up to 4.00% compared to state-of-the-art RDL and classic imbalance handling methods.
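To make the two reported metrics concrete, the following is a minimal sketch that assumes their standard definitions: Balanced Accuracy as the arithmetic mean of per-class recall, and G-Mean as the geometric mean of per-class recall. The function names and the toy labels are illustrative and are not taken from the paper.

```python
import math


def per_class_recall(y_true, y_pred):
    """Recall of each class that appears in y_true."""
    recalls = {}
    for c in sorted(set(y_true)):
        total = sum(1 for t in y_true if t == c)
        hits = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        recalls[c] = hits / total
    return recalls


def balanced_accuracy(y_true, y_pred):
    """Arithmetic mean of per-class recall (assumed standard definition)."""
    r = per_class_recall(y_true, y_pred)
    return sum(r.values()) / len(r)


def g_mean(y_true, y_pred):
    """Geometric mean of per-class recall (assumed standard definition)."""
    r = per_class_recall(y_true, y_pred)
    return math.prod(r.values()) ** (1.0 / len(r))


# Toy imbalanced task: 8 majority entities (class 0), 2 minority entities (class 1).
y_true = [0] * 8 + [1] * 2
y_majority_only = [0] * 10          # a model that never predicts the minority class
y_half_minority = [0] * 8 + [1, 0]  # a model that recovers one of the two minority entities

plain_accuracy = sum(t == p for t, p in zip(y_true, y_majority_only)) / len(y_true)
```

On the toy labels, the majority-only predictor reaches a plain accuracy of 0.8 while its Balanced Accuracy is 0.5 and its G-Mean is 0, which is why these two metrics are preferred over plain accuracy when the minority class matters.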
Relational deep learning (RDL) was recently proposed to enable a fully data-driven learning paradigm on relational databases (RDBs): it structures the RDB as a heterogeneous entity graph and adopts a graph neural network (GNN) as the predictive model. However, existing RDL methods neglect the class imbalance of relational data in RDBs and risk under-representing minority entities, which can leave the resulting model unusable in practice. In this work, we investigate, for the first time, the class imbalance problem in RDB entity classification and design the relation-centric minority synthetic over-sampling GNN (Rel-MOSS) to fill this gap in the current literature. Specifically, to keep minority-related information from being submerged by its majority counterparts, we design a relation-wise gating controller that modulates the neighborhood messages from each individual relation type. Based on the relational-gated representations, we further propose a relation-guided minority synthesizer for over-sampling, which integrates entity relational signatures to maintain relational consistency. Extensive experiments on 12 entity classification datasets show that Rel-MOSS yields an average improvement of up to 2.46% in Balanced Accuracy and up to 4.00% in G-Mean over SOTA RDL methods and classic methods for handling class imbalance.
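The two components named in the abstract can be illustrated with a small sketch. It is not the authors' implementation: the abstract does not specify the gate parameterization or the synthesizer, so the code assumes (i) a scalar sigmoid gate per relation type computed from the entity state and that relation's aggregated message, and (ii) a SMOTE-style interpolation in which the partner of each minority entity is the minority entity with the most similar relational signature, here taken to be a vector of per-relation neighbor counts. All function names, relation names, weights, and values are hypothetical.

```python
import math
import random


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def gated_representation(h_self, messages, gate_weights):
    """Add each relation's message to h_self, scaled by a per-relation gate in (0, 1).

    Assumption: the gate is sigmoid(w_r . [h_self ; m_r]); the paper may differ.
    """
    out = list(h_self)
    gates = {}
    for rel, msg in messages.items():
        g = sigmoid(dot(gate_weights[rel], h_self + msg))  # list concatenation
        gates[rel] = g
        out = [o + g * m for o, m in zip(out, msg)]
    return out, gates


def cosine(a, b):
    na, nb = math.sqrt(dot(a, a)), math.sqrt(dot(b, b))
    return dot(a, b) / (na * nb) if na and nb else 0.0


def synthesize_minority(reps, signatures, n_new, rng):
    """SMOTE-style over-sampling over minority representations.

    Assumption: the interpolation partner is chosen by relational-signature
    similarity, so synthetic entities stay close to a consistent relational profile.
    """
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(reps))
        j = max((k for k in range(len(reps)) if k != i),
                key=lambda k: cosine(signatures[i], signatures[k]))
        lam = rng.random()  # interpolation coefficient in [0, 1)
        synthetic.append([a + lam * (b - a) for a, b in zip(reps[i], reps[j])])
    return synthetic


# Hypothetical entity with two relation types and 2-dimensional embeddings.
h = [0.5, -0.2]
messages = {"user_orders": [1.0, 0.0], "user_reviews": [0.0, 2.0]}
gate_weights = {"user_orders": [0.1, 0.1, 2.0, 0.0],
                "user_reviews": [0.0, 0.0, 0.0, -2.0]}
rep, gates = gated_representation(h, messages, gate_weights)

# Hypothetical minority representations and per-relation neighbor counts.
minority_reps = [[0.0, 0.0], [1.0, 1.0], [4.0, 0.0]]
signatures = [[3, 0], [2, 1], [0, 5]]
synthetic = synthesize_minority(minority_reps, signatures, n_new=5,
                                rng=random.Random(0))
```

With these toy weights the gate for `user_orders` is close to 1 and the gate for `user_reviews` is close to 0, so one relation dominates the entity representation. In a trained model the gate weights would be learned jointly with the GNN rather than fixed by hand.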