Haiquan Qiu

LG
h-index6
7papers
25citations
Novelty63%
AI Score52

7 Papers

LGMar 22, 2023Code
Understanding Expressivity of GNN in Rule Learning

Haiquan Qiu, Yongqi Zhang, Yong Li et al.

Rule learning is critical to improving knowledge graph (KG) reasoning due to their ability to provide logical and interpretable explanations. Recently, Graph Neural Networks (GNNs) with tail entity scoring achieve the state-of-the-art performance on KG reasoning. However, the theoretical understandings for these GNNs are either lacking or focusing on single-relational graphs, leaving what the kind of rules these GNNs can learn an open problem. We propose to fill the above gap in this paper. Specifically, GNNs with tail entity scoring are unified into a common framework. Then, we analyze their expressivity by formally describing the rule structures they can learn and theoretically demonstrating their superiority. These results further inspire us to propose a novel labeling strategy to learn more rules in KG reasoning. Experimental results are consistent with our theoretical findings and verify the effectiveness of our proposed method. The code is publicly available at https://github.com/LARS-research/Rule-learning-expressivity.

LGFeb 15, 2025Code
Superpose Task-specific Features for Model Merging

Haiquan Qiu, You Wu, Dong Li et al.

Model merging enables powerful capabilities in neural networks without requiring additional training. In this paper, we introduce a novel perspective on model merging by leveraging the fundamental mechanisms of neural network representation. Our approach is motivated by the linear representation hypothesis, which states that neural networks encode information through linear combinations of feature vectors. We propose a method that superposes task-specific features from individual models into a merged model. Our approach specifically targets linear transformation matrices, which are crucial for feature activation and extraction in deep networks. By formulating the merging process as a linear system, we can preserve task-specific features from individual models and create merged models that effectively maintain multi-task capabilities compared to existing methods. Extensive experiments across diverse benchmarks and models demonstrate that our method outperforms existing techniques. Code is available at https://github.com/LARS-research/STF.

DBJan 30
Scaling GraphLLM with Bilevel-Optimized Sparse Querying

Yangzhe Peng, Haiquan Qiu, Quanming Yao et al.

LLMs have recently shown strong potential in enhancing node-level tasks on text-attributed graphs (TAGs) by providing explanation features. However, their practical use is severely limited by the high computational and monetary cost of repeated LLM queries. To illustrate, naively generating explanations for all nodes on a medium-sized benchmark like Photo (48k nodes) using a representative method (e.g., TAPE) would consume days of processing time. In this paper, we propose Bilevel-Optimized Sparse Querying (BOSQ), a general framework that selectively leverages LLM-derived explanation features to enhance performance on node-level tasks on TAGs. We design an adaptive sparse querying strategy that selectively decides when to invoke LLMs, avoiding redundant or low-gain queries and significantly reducing computation overhead. Extensive experiments on six real-world TAG datasets involving two types of node-level tasks demonstrate that BOSQ achieves orders of magnitude speedups over existing GraphLLM methods while consistently delivering on-par or superior performance.

LGOct 5, 2025Code
Why Low-Precision Transformer Training Fails: An Analysis on Flash Attention

Haiquan Qiu, Quanming Yao

The pursuit of computational efficiency has driven the adoption of low-precision formats for training transformer models. However, this progress is often hindered by notorious training instabilities. This paper provides the first mechanistic explanation for a long-standing and unresolved failure case where training with flash attention in low-precision settings leads to catastrophic loss explosion. Our in-depth analysis reveals that the failure is not a random artifact but caused by two intertwined phenomena: the emergence of similar low-rank representations within the attention mechanism and the compounding effect of biased rounding errors inherent in low-precision arithmetic. We demonstrate how these factors create a vicious cycle of error accumulation that corrupts weight updates, ultimately derailing the training dynamics. To validate our findings, we introduce a minimal modification to the flash attention that mitigates the bias in rounding errors. This simple change stabilizes the training process, confirming our analysis and offering a practical solution to this persistent problem. Code is available at https://github.com/ucker/why-low-precision-training-fails.

LGOct 15, 2024
Neural Symbolic Regression of Complex Network Dynamics

Haiquan Qiu, Shuzhi Liu, Quanming Yao

Complex networks describe important structures in nature and society, composed of nodes and the edges that connect them. The evolution of these networks is typically described by dynamics, which are labor-intensive and require expert knowledge to derive. However, because the complex network involves noisy observations from multiple trajectories of nodes, existing symbolic regression methods are either not applicable or ineffective on its dynamics. In this paper, we propose Physically Inspired Neural Dynamics Symbolic Regression (PI-NDSR), a method based on neural networks and genetic programming to automatically learn the symbolic expression of dynamics. Our method consists of two key components: a Physically Inspired Neural Dynamics (PIND) to augment and denoise trajectories through observed trajectory interpolation; and a coordinated genetic search algorithm to derive symbolic expressions. This algorithm leverages references of node dynamics and edge dynamics from neural dynamics to avoid overfitted expressions in symbolic space. We evaluate our method on synthetic datasets generated by various dynamics and real datasets on disease spreading. The results demonstrate that PI-NDSR outperforms the existing method in terms of both recovery probability and error.

LGMar 17, 2024
Graph Unitary Message Passing

Haiquan Qiu, Yatao Bian, Quanming Yao

Message passing mechanism contributes to the success of GNNs in various applications, but also brings the oversquashing problem. Recent works combat oversquashing by improving the graph spectrums with rewiring techniques, disrupting the structural bias in graphs, and having limited improvement on oversquashing in terms of oversquashing measure. Motivated by unitary RNN, we propose Graph Unitary Message Passing (GUMP) to alleviate oversquashing in GNNs by applying unitary adjacency matrix for message passing. To design GUMP, a transformation is first proposed to make general graphs have unitary adjacency matrix and keep its structural bias. Then, unitary adjacency matrix is obtained with a unitary projection algorithm, which is implemented by utilizing the intrinsic structure of unitary adjacency matrix and allows GUMP to be permutation-equivariant. Experimental results show the effectiveness of GUMP in improving the performance on various graph learning tasks.

LGOct 5, 2025
Spectral Alignment as Predictor of Loss Explosion in Neural Network Training

Haiquan Qiu, You Wu, Yingjie Tan et al.

Loss explosions in training deep neural networks can nullify multi-million dollar training runs. Conventional monitoring metrics like weight and gradient norms are often lagging and ambiguous predictors, as their values vary dramatically across different models and even between layers of the same model, making it difficult to establish a unified standard for detecting impending failure. We introduce Spectral Alignment (SA), a novel, theoretically-grounded metric that monitors the distributional alignment between layer inputs and the principal singular vectors of weight matrices. We show that a collapse in the sign diversity of this alignment is a powerful early predictor of representational collapse and training divergence. Empirical results on language models demonstrate that monitoring the SA distribution provides a significantly earlier and clearer warning of loss explosions than traditional scalar metrics. SA's low computational overhead makes it a practical tool for safeguarding model training.