LGSep 2, 2022
Rethinking Efficiency and Redundancy in Training Large-scale GraphsXin Liu, Xunbin Xiong, Mingyu Yan et al.
Large-scale graphs are ubiquitous in real-world scenarios and can be trained by Graph Neural Networks (GNNs) to generate representation for downstream tasks. Given the abundant information and complex topology of a large-scale graph, we argue that redundancy exists in such graphs and will degrade the training efficiency. Unfortunately, the model scalability severely restricts the efficiency of training large-scale graphs via vanilla GNNs. Despite recent advances in sampling-based training methods, sampling-based GNNs generally overlook the redundancy issue. It still takes intolerable time to train these models on large-scale graphs. Thereby, we propose to drop redundancy and improve efficiency of training large-scale graphs with GNNs, by rethinking the inherent characteristics in a graph. In this paper, we pioneer to propose a once-for-all method, termed DropReef, to drop the redundancy in large-scale graphs. Specifically, we first conduct preliminary experiments to explore potential redundancy in large-scale graphs. Next, we present a metric to quantify the neighbor heterophily of all nodes in a graph. Based on both experimental and theoretical analysis, we reveal the redundancy in a large-scale graph, i.e., nodes with high neighbor heterophily and a great number of neighbors. Then, we propose DropReef to detect and drop the redundancy in large-scale graphs once and for all, helping reduce the training time while ensuring no sacrifice in the model accuracy. To demonstrate the effectiveness of DropReef, we apply it to recent state-of-the-art sampling-based GNNs for training large-scale graphs, owing to the high precision of such models. With DropReef leveraged, the training efficiency of models can be greatly promoted. DropReef is highly compatible and is offline performed, benefiting the state-of-the-art sampling-based GNNs in the present and future to a significant extent.
ARAug 27, 2024
SiHGNN: Leveraging Properties of Semantic Graphs for Efficient HGNN AccelerationRunzhen Xue, Mingyu Yan, Dengke Han et al.
Heterogeneous Graph Neural Networks (HGNNs) have expanded graph representation learning to heterogeneous graph fields. Recent studies have demonstrated their superior performance across various applications, including medical analysis and recommendation systems, often surpassing existing methods. However, GPUs often experience inefficiencies when executing HGNNs due to their unique and complex execution patterns. Compared to traditional Graph Neural Networks, these patterns further exacerbate irregularities in memory access. To tackle these challenges, recent studies have focused on developing domain-specific accelerators for HGNNs. Nonetheless, most of these efforts have concentrated on optimizing the datapath or scheduling data accesses, while largely overlooking the potential benefits that could be gained from leveraging the inherent properties of the semantic graph, such as its topology, layout, and generation. In this work, we focus on leveraging the properties of semantic graphs to enhance HGNN performance. First, we analyze the Semantic Graph Build (SGB) stage and identify significant opportunities for data reuse during semantic graph generation. Next, we uncover the phenomenon of buffer thrashing during the Graph Feature Processing (GFP) stage, revealing potential optimization opportunities in semantic graph layout. Furthermore, we propose a lightweight hardware accelerator frontend for HGNNs, called SiHGNN. This accelerator frontend incorporates a tree-based Semantic Graph Builder for efficient semantic graph generation and features a novel Graph Restructurer for optimizing semantic graph layouts. Experimental results show that SiHGNN enables the state-of-the-art HGNN accelerator to achieve an average performance improvement of 2.95$\times$.
ARApr 18, 2025Code
MetaDSE: A Few-shot Meta-learning Framework for Cross-workload CPU Design Space ExplorationRunzhen Xue, Hao Wu, Mingyu Yan et al.
Cross-workload design space exploration (DSE) is crucial in CPU architecture design. Existing DSE methods typically employ the transfer learning technique to leverage knowledge from source workloads, aiming to minimize the requirement of target workload simulation. However, these methods struggle with overfitting, data ambiguity, and workload dissimilarity. To address these challenges, we reframe the cross-workload CPU DSE task as a few-shot meta-learning problem and further introduce MetaDSE. By leveraging model agnostic meta-learning, MetaDSE swiftly adapts to new target workloads, greatly enhancing the efficiency of cross-workload CPU DSE. Additionally, MetaDSE introduces a novel knowledge transfer method called the workload-adaptive architectural mask algorithm, which uncovers the inherent properties of the architecture. Experiments on SPEC CPU 2017 demonstrate that MetaDSE significantly reduces prediction error by 44.3\% compared to the state-of-the-art. MetaDSE is open-sourced and available at this \href{https://anonymous.4open.science/r/Meta_DSE-02F8}{anonymous GitHub.}
LGOct 24, 2024
Multi-objective Optimization in CPU Design Space Exploration: Attention is All You NeedRunzhen Xue, Hao Wu, Mingyu Yan et al.
Design Space Exploration (DSE) is essential to modern CPU design, yet current frameworks struggle to scale and generalize in high-dimensional architectural spaces. As the dimensionality of design spaces continues to grow, existing DSE frameworks face three fundamental challenges: (1) reduced accuracy and poor scalability of surrogate models in large design spaces; (2) inefficient acquisition guided by hand-crafted heuristics or exhaustive search; (3) limited interpretability, making it hard to pinpoint architectural bottlenecks. In this work, we present \textbf{AttentionDSE}, the first end-to-end DSE framework that \emph{natively integrates} performance prediction and design guidance through an attention-based neural architecture. Unlike traditional DSE workflows that separate surrogate modeling from acquisition and rely heavily on hand-crafted heuristics, AttentionDSE establishes a unified, learning-driven optimization loop, in which attention weights serve a dual role: enabling accurate performance estimation and simultaneously exposing the performance bottleneck. This paradigm shift elevates attention from a passive representation mechanism to an active, interpretable driver of design decision-making. Key innovations include: (1) a \textbf{Perception-Driven Attention} mechanism that exploits architectural hierarchy and locality, scaling attention complexity from $\mathcal{O}(n^2)$ to $\mathcal{O}(n)$ via sliding windows; (2) an \textbf{Attention-aware Bottleneck Analysis} that automatically surfaces critical parameters for targeted optimization, eliminating the need for domain-specific heuristics. Evaluated on high-dimensional CPU design space using the SPEC CPU2017 benchmark suite, AttentionDSE achieves up to \textbf{3.9\% higher Pareto Hypervolume} and over \textbf{80\% reduction in exploration time} compared to state-of-the-art baselines.