Characterizing and Understanding HGNN Training on GPUs
This work addresses the time-consuming and costly training of Heterogeneous Graph Neural Networks (HGNNs), which are critical in domains such as recommendation systems and medical analysis, by offering insights for improving training efficiency. The contribution is incremental, as it builds on existing training methods.
The study characterizes and analyzes the performance bottlenecks of training Heterogeneous Graph Neural Networks (HGNNs) on GPUs, identifies inefficiencies in both single-GPU and multi-GPU scenarios, and provides optimization guidelines.
Owing to their remarkable representation capabilities for heterogeneous graph data, Heterogeneous Graph Neural Networks (HGNNs) have been widely adopted in many critical real-world domains such as recommendation systems and medical analysis. Before an HGNN can be deployed, its model parameters must be tuned to the specific task through extensive training, which is a time-consuming and costly process. To enhance the efficiency of HGNN training, it is essential to characterize and analyze the execution semantics and patterns within the training process to identify performance bottlenecks. In this study, we conduct an in-depth quantification and analysis of two mainstream HGNN training scenarios: single-GPU training and multi-GPU distributed training. Based on the characterization results, we identify the performance bottlenecks and their underlying causes in each training scenario and provide optimization guidelines from both software and hardware perspectives.