LG · May 22

RelPrism: A Multi-Faceted Pre-training Framework with Self-Generated Tasks for Relational Databases

Jinyu Yang, Cheng Yang, Junze Chen, Zedi Liu, Muhan Zhang, Hanyang Peng, Chuan Shi

arXiv:2605.23241 · 56.7 · Has Code

Predicted impact: top 41% in LG · last 90 days · Originality: Incremental advance

AI Analysis

This work addresses self-supervised pre-training for relational databases, which support diverse predictive tasks in data systems. It proposes a representation learning approach intended to adapt to a wider range of downstream tasks than single-facet pre-training.

RelPrism introduces a multi-faceted self-supervised learning framework for relational databases. It constructs intrinsic, relational, and hybrid attributes and applies multi-granularity clustering to each to generate diverse pre-training tasks. The paper reports a 4.15% ROC-AUC improvement for classification and a 10.75% MAE reduction for regression over state-of-the-art baselines on 14 tasks across 5 real-world datasets.

Relational databases (RDBs) remain the cornerstone of modern data systems and support diverse predictive tasks. Recent relational deep learning (RDL) methods enable end-to-end prediction by converting RDBs into graphs, where rows are represented as nodes and inter-table interactions are represented as edges, and then applying graph-based models for representation learning. Despite the strong capability of RDL, effective self-supervised pre-training for RDBs remains non-trivial. RDB tasks often require multi-faceted information across different perspectives and granularities. For example, user churn classification may rely more on interaction patterns, whereas consumption value prediction requires both user-item behaviors and intrinsic user attributes for fine-grained regression. Such heterogeneous needs challenge RDB representation learning, as pre-training objectives should cover comprehensive information for downstream adaptation. However, existing SSL methods typically derive supervision from a single facet, such as node-level intrinsic attributes or subgraph-level relational structures, providing limited adaptability. To this end, we propose RelPrism, a multi-faceted self-supervised learning framework for RDBs. RelPrism constructs intrinsic, relational, and hybrid attributes from distinct perspectives, and applies multi-granularity clustering to each perspective to form corresponding pseudo-task pools. Pre-training over these pools exposes representations to broader perspectives and granularity levels, yielding a stronger basis for downstream adaptation. Experiments on 14 tasks across 5 real-world datasets show that RelPrism improves ROC-AUC by 4.15% for classification and reduces MAE by 10.75% for regression over state-of-the-art baselines. Our code is available at https://anonymous.4open.science/r/RelPrism.
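The pipeline described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the two-table schema, the feature choices for each facet, the use of plain k-means, the granularity levels, and all function names (`build_graph`, `user_facets`, `pseudo_task_pools`) are assumptions made for illustration. It shows rows becoming nodes and foreign keys becoming edges, then one pseudo-label set per facet and per cluster count.

```python
import random
from collections import defaultdict

# Hypothetical toy database: a users table and an orders table.
users = {1: {"age": 23.0}, 2: {"age": 25.0}, 3: {"age": 61.0}, 4: {"age": 64.0}}
orders = [
    {"order_id": 10, "user_id": 1, "amount": 5.0},
    {"order_id": 11, "user_id": 1, "amount": 7.0},
    {"order_id": 12, "user_id": 2, "amount": 6.0},
    {"order_id": 13, "user_id": 3, "amount": 90.0},
    {"order_id": 14, "user_id": 4, "amount": 95.0},
]

def build_graph(users, orders):
    """Rows become nodes; each foreign key (order.user_id) becomes an edge."""
    nodes = [("user", uid) for uid in users]
    nodes += [("order", o["order_id"]) for o in orders]
    edges = [(("order", o["order_id"]), ("user", o["user_id"])) for o in orders]
    return nodes, edges

def user_facets(users, orders):
    """Build three illustrative attribute views per user (assumed features)."""
    amounts_by_user = defaultdict(list)
    for o in orders:
        amounts_by_user[o["user_id"]].append(o["amount"])
    facets = {"intrinsic": {}, "relational": {}, "hybrid": {}}
    for uid, row in users.items():
        amounts = amounts_by_user[uid]
        intrinsic = [row["age"]]                         # the row's own columns
        relational = [float(len(amounts)), sum(amounts)]  # aggregates over linked rows
        facets["intrinsic"][uid] = intrinsic
        facets["relational"][uid] = relational
        # Simple concatenation; a real pipeline would likely standardize first.
        facets["hybrid"][uid] = intrinsic + relational
    return facets

def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's k-means on lists of floats; returns one label per point."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [
            min(range(k), key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])))
            for p in points
        ]
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old center if a cluster empties out
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels

def pseudo_task_pools(facets, granularities=(2, 3)):
    """For each facet, cluster at each granularity; labels act as pseudo-targets."""
    pools = {}
    for name, vectors in facets.items():
        ids = sorted(vectors)
        points = [vectors[i] for i in ids]
        pools[name] = {k: dict(zip(ids, kmeans(points, k))) for k in granularities}
    return pools

nodes, edges = build_graph(users, orders)
pools = pseudo_task_pools(user_facets(users, orders))
```

Each entry `pools[facet][k]` maps a user id to a cluster index, which could serve as a classification target for a pre-training head over the graph encoder. Coarse and fine values of `k` give the same facet at different granularities.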
