Haoran Yang

CL
h-index19
33papers
2,249citations
Novelty51%
AI Score59

33 Papers

CLNov 28, 2022
On the Effectiveness of Parameter-Efficient Fine-Tuning

Zihao Fu, Haoran Yang, Anthony Man-Cho So et al.

Fine-tuning pre-trained models has been ubiquitously proven to be effective in a wide range of NLP tasks. However, fine-tuning the whole model is parameter inefficient as it always yields an entirely new model for each task. Currently, many research works propose to only fine-tune a small portion of the parameters while keeping most of the parameters shared across different tasks. These methods achieve surprisingly good performance and are shown to be more stable than their corresponding fully fine-tuned counterparts. However, such kind of methods is still not well understood. Some natural questions arise: How does the parameter sparsity lead to promising performance? Why is the model more stable than the fully fine-tuned models? How to choose the tunable parameters? In this paper, we first categorize the existing methods into random approaches, rule-based approaches, and projection-based approaches based on how they choose which parameters to tune. Then, we show that all of the methods are actually sparse fine-tuned models and conduct a novel theoretical analysis of them. We indicate that the sparsity is actually imposing a regularization on the original model by controlling the upper bound of the stability. Such stability leads to better generalization capability which has been empirically observed in a lot of recent research works. Despite the effectiveness of sparsity grounded by our theory, it still remains an open problem of how to choose the tunable parameters. To better choose the tunable parameters, we propose a novel Second-order Approximation Method (SAM) which approximates the original problem with an analytically solvable optimization function. The tunable parameters are determined by directly optimizing the approximation function. The experimental results show that our proposed SAM model outperforms many strong baseline models and it also verifies our theoretical analysis.

LGJul 1, 2022
Generating Counterfactual Hard Negative Samples for Graph Contrastive Learning

Haoran Yang, Hongxu Chen, Sixiao Zhang et al.

Graph contrastive learning has emerged as a powerful tool for unsupervised graph representation learning. The key to the success of graph contrastive learning is to acquire high-quality positive and negative samples as contrasting pairs for the purpose of learning underlying structural semantics of the input graph. Recent works usually sample negative samples from the same training batch with the positive samples, or from an external irrelevant graph. However, a significant limitation lies in such strategies, which is the unavoidable problem of sampling false negative samples. In this paper, we propose a novel method to utilize \textbf{C}ounterfactual mechanism to generate artificial hard negative samples for \textbf{G}raph \textbf{C}ontrastive learning, namely \textbf{CGC}, which has a different perspective compared to those sampling-based strategies. We utilize counterfactual mechanism to produce hard negative samples, which ensures that the generated samples are similar to, but have labels that different from the positive sample. The proposed method achieves satisfying results on several datasets compared to some traditional unsupervised graph learning methods and some SOTA graph contrastive learning methods. We also conduct some supplementary experiments to give an extensive illustration of the proposed method, including the performances of CGC with different hard negative samples and evaluations for hard negative samples generated with different similarity measurements.

CLMay 2, 2022
COSPLAY: Concept Set Guided Personalized Dialogue Generation Across Both Party Personas

Chen Xu, Piji Li, Wei Wang et al.

Maintaining a consistent persona is essential for building a human-like conversational model. However, the lack of attention to the partner makes the model more egocentric: they tend to show their persona by all means such as twisting the topic stiffly, pulling the conversation to their own interests regardless, and rambling their persona with little curiosity to the partner. In this work, we propose COSPLAY(COncept Set guided PersonaLized dialogue generation Across both partY personas) that considers both parties as a "team": expressing self-persona while keeping curiosity toward the partner, leading responses around mutual personas, and finding the common ground. Specifically, we first represent self-persona, partner persona and mutual dialogue all in the concept sets. Then, we propose the Concept Set framework with a suite of knowledge-enhanced operations to process them such as set algebras, set expansion, and set distance. Based on these operations as medium, we train the model by utilizing 1) concepts of both party personas, 2) concept relationship between them, and 3) their relationship to the future dialogue. Extensive experiments on a large public dataset, Persona-Chat, demonstrate that our model outperforms state-of-the-art baselines for generating less egocentric, more human-like, and higher quality responses in both automatic and human evaluations.

LGJul 24, 2022
Mitigating the Performance Sacrifice in DP-Satisfied Federated Settings through Graph Contrastive Learning

Haoran Yang, Xiangyu Zhao, Muyang Li et al.

Currently, graph learning models are indispensable tools to help researchers explore graph-structured data. In academia, using sufficient training data to optimize a graph model on a single device is a typical approach for training a capable graph learning model. Due to privacy concerns, however, it is infeasible to do so in real-world scenarios. Federated learning provides a practical means of addressing this limitation by introducing various privacy-preserving mechanisms, such as differential privacy (DP) on the graph edges. However, although DP in federated graph learning can ensure the security of sensitive information represented in graphs, it usually causes the performance of graph learning models to degrade. In this paper, we investigate how DP can be implemented on graph edges and observe a performance decrease in our experiments. In addition, we note that DP on graph edges introduces noise that perturbs graph proximity, which is one of the graph augmentations in graph contrastive learning. Inspired by this, we propose leveraging graph contrastive learning to alleviate the performance drop resulting from DP. Extensive experiments conducted with four representative graph models on five widely used benchmark datasets show that contrastive learning indeed alleviates the models' DP-induced performance drops.

CLApr 10, 2022
Parameter-Efficient Tuning by Manipulating Hidden States of Pretrained Language Models For Classification Tasks

Haoran Yang, Piji Li, Wai Lam

Parameter-efficient tuning aims to distill knowledge for downstream tasks by optimizing a few introduced parameters while freezing the pretrained language models (PLMs). Continuous prompt tuning which prepends a few trainable vectors to the embeddings of input is one of these methods and has drawn much attention due to its effectiveness and efficiency. This family of methods can be illustrated as exerting nonlinear transformations of hidden states inside PLMs. However, a natural question is ignored: can the hidden states be directly used for classification without changing them? In this paper, we aim to answer this question by proposing a simple tuning method which only introduces three trainable vectors. Firstly, we integrate all layers hidden states using the introduced vectors. And then, we input the integrated hidden state(s) to a task-specific linear classifier to predict categories. This scheme is similar to the way ELMo utilises hidden states except that they feed the hidden states to LSTM-based models. Although our proposed tuning scheme is simple, it achieves comparable performance with prompt tuning methods like P-tuning and P-tuning v2, verifying that original hidden states do contain useful information for classification tasks. Moreover, our method has an advantage over prompt tuning in terms of time and the number of parameters.

78.6ROApr 23
Learn Weightlessness: Imitate Non-Self-Stabilizing Motions on Humanoid Robot

Yucheng Xin, Jiacheng Bao, Haoran Yang et al.

The integration of imitation and reinforcement learning has enabled remarkable advances in humanoid whole-body control, facilitating diverse human-like behaviors. However, research on environment-dependent motions remains limited. Existing methods typically enforce rigid trajectory tracking while neglecting physical interactions with the environment. We observe that humans naturally exploit a "weightless" state during non-self-stabilizing (NSS) motions--selectively relaxing specific joints to allow passive body--environment contact, thereby stabilizing the body and completing the motion. Inspired by this biological mechanism, we design a weightlessness-state auto-labeling strategy for dataset annotation; and we propose the Weightlessness Mechanism (WM), a method that dynamically determines which joints to relax and to what level, together enabling effective environmental interaction while executing target motions. We evaluate our approach on 3 representative NSS tasks: sitting on chairs of varying heights, lying down on beds with different inclinations, and leaning against walls via shoulder or elbow. Extensive experiments in simulation and on the Unitree G1 robot demonstrate that our WM method, trained on single-action demonstrations without any task-specific tuning, achieves strong generalization across diverse environmental configurations while maintaining motion stability. Our work bridges the gap between precise trajectory tracking and adaptive environmental interaction, offering a biologically-inspired solution for contact-rich humanoid control.

99.4ROMar 10
ZeroWBC: Learning Natural Visuomotor Humanoid Control Directly from Human Egocentric Video

Haoran Yang, Jiacheng Bao, Yucheng Xin et al.

Achieving versatile and naturalistic whole-body control for humanoid robot scene-interaction remains a significant challenge. While some recent works have demonstrated autonomous humanoid interactive control, they are constrained to rigid locomotion patterns and expensive teleoperation data collection, lacking the versatility to execute more human-like natural behaviors such as sitting or kicking. Furthermore, acquiring the necessary real robot teleoperation data is prohibitively expensive and time-consuming. To address these limitations, we introduce ZeroWBC, a novel framework that learns a natural humanoid visuomotor control policy directly from human egocentric videos, eliminating the need for large-scale robot teleoperation data and enabling natural humanoid robot scene-interaction control. Specifically, our approach first fine-tunes a Vision-Language Model (VLM) to predict future whole-body human motions based on text instructions and egocentric visual context, then these generated motions are retargeted to real robot joints and executed via our robust general motion tracking policy for humanoid whole-body control. Extensive experiments on the Unitree G1 humanoid robot demonstrate that our method outperforms baseline approaches in motion naturalness and versatility, successfully establishing a pipeline that eliminates teleoperation data collection overhead for whole-body humanoid control, offering a scalable and efficient paradigm for general humanoid whole-body control.

79.6ROMar 13
PhyGile: Physics-Prefix Guided Motion Generation for Agile General Humanoid Motion Tracking

Jiacheng Bao, Haoran Yang, Yucheng Xin et al.

Humanoid robots are expected to execute agile and expressive whole-body motions in real-world settings. Existing text-to-motion generation models are predominantly trained on captured human motion datasets, whose priors assume human biomechanics, actuation, mass distribution, and contact strategies. When such motions are directly retargeted to humanoid robots, the resulting trajectories may satisfy geometric constraints (e.g., joint limits and pose continuity) and appear kinematically reasonable. However, they frequently violate the physical feasibility required for real-world execution. To address these issues, we present PhyGile, a unified framework that closes the loop between robot-native motion generation and General Motion Tracking (GMT). PhyGile performs physics-prefix-guided robot-native motion generation at inference time, directly generating robot-native motions in a 262-dimensional skeletal space with physics-guided prefixes, thereby eliminating inference-time retargeting artifacts and reducing generation-execution discrepancies. Before physics-prefix adaptation, we train the GMT controller with a curriculum-based mixture-of-experts scheme, followed by post-training on unlabeled motion data to improve robustness over large-scale robot motions. During physics-prefix adaptation, the GMT controller is further fine-tuned with generated objectives under physics-derived prefixes, enabling agile and stable execution of complex motions on real robots. Extensive offline and real-robot experiments demonstrate that PhyGile expands the frontier of text-driven humanoid control, enabling stable tracking of agile, highly difficult whole-body motions that go well beyond walking and low-dynamic motions typically achieved by prior methods.

74.4ROApr 30Code
Robot Learning from Human Videos: A Survey

Junyi Ma, Erhang Zhang, Haoran Yang et al.

A critical bottleneck hindering further advancement in embodied AI and robotics is the challenge of scaling robot data. To address this, the field of learning robot manipulation skills from human video data has attracted rapidly growing attention in recent years, driven by the abundance of human activity videos and advances in computer vision. This line of research promises to enable robots to acquire skills passively from the vast and readily available resource of human demonstrations, substantially favoring scalable learning for generalist robotic systems. Therefore, we present this survey to provide a comprehensive and up-to-date review of human-video-based learning techniques in robotics, focusing on both human-robot skill transfer and data foundations. We first review the policy learning foundations in robotics, and then describe the fundamental interfaces to incorporate human videos. Subsequently, we introduce a hierarchical taxonomy of transferring human videos to robot skills, covering task-, observation-, and action-oriented pathways, along with a cross-family analysis of their couplings with different data configurations and learning paradigms. In addition, we investigate the data foundations including widely-used human video datasets and video generation schemes, and provide large-scale statistical trends in dataset development and utilization. Ultimately, we emphasize the challenges and limitations intrinsic to this field, and delineate potential avenues for future research. The paper list of our survey is available at https://github.com/IRMVLab/awesome-robot-learning-from-human-videos.

CLJul 4, 2024
Stephanie: Step-by-Step Dialogues for Mimicking Human Interactions in Social Conversations

Hao Yang, Hongyuan Lu, Xinhua Zeng et al.

In the rapidly evolving field of natural language processing, dialogue systems primarily employ a single-step dialogue paradigm. Although this paradigm is efficient, it lacks the depth and fluidity of human interactions and does not appear natural. We introduce a novel \textbf{Step}-by-Step Dialogue Paradigm (Stephanie), designed to mimic the ongoing dynamic nature of human conversations. By employing a dual learning strategy and a further-split post-editing method, we generated and utilized a high-quality step-by-step dialogue dataset to fine-tune existing large language models, enabling them to perform step-by-step dialogues. We thoroughly present Stephanie. Tailored automatic and human evaluations are conducted to assess its effectiveness compared to the traditional single-step dialogue paradigm. We will release code, Stephanie datasets, and Stephanie LLMs to facilitate the future of chatbot eras.

SISep 2, 2024
LATEX-GCL: Large Language Models (LLMs)-Based Data Augmentation for Text-Attributed Graph Contrastive Learning

Haoran Yang, Xiangyu Zhao, Sirui Huang et al.

Graph Contrastive Learning (GCL) is a potent paradigm for self-supervised graph learning that has attracted attention across various application scenarios. However, GCL for learning on Text-Attributed Graphs (TAGs) has yet to be explored. Because conventional augmentation techniques like feature embedding masking cannot directly process textual attributes on TAGs. A naive strategy for applying GCL to TAGs is to encode the textual attributes into feature embeddings via a language model and then feed the embeddings into the following GCL module for processing. Such a strategy faces three key challenges: I) failure to avoid information loss, II) semantic loss during the text encoding phase, and III) implicit augmentation constraints that lead to uncontrollable and incomprehensible results. In this paper, we propose a novel GCL framework named LATEX-GCL to utilize Large Language Models (LLMs) to produce textual augmentations and LLMs' powerful natural language processing (NLP) abilities to address the three limitations aforementioned to pave the way for applying GCL to TAG tasks. Extensive experiments on four high-quality TAG datasets illustrate the superiority of the proposed LATEX-GCL method. The source codes and datasets are released to ease the reproducibility, which can be accessed via this link: https://anonymous.4open.science/r/LATEX-GCL-0712.

CVApr 14, 2025Code
Focus on Local: Finding Reliable Discriminative Regions for Visual Place Recognition

Changwei Wang, Shunpeng Chen, Yukun Song et al.

Visual Place Recognition (VPR) is aimed at predicting the location of a query image by referencing a database of geotagged images. For VPR task, often fewer discriminative local regions in an image produce important effects while mundane background regions do not contribute or even cause perceptual aliasing because of easy overlap. However, existing methods lack precisely modeling and full exploitation of these discriminative regions. In this paper, we propose the Focus on Local (FoL) approach to stimulate the performance of image retrieval and re-ranking in VPR simultaneously by mining and exploiting reliable discriminative local regions in images and introducing pseudo-correlation supervision. First, we design two losses, Extraction-Aggregation Spatial Alignment Loss (SAL) and Foreground-Background Contrast Enhancement Loss (CEL), to explicitly model reliable discriminative local regions and use them to guide the generation of global representations and efficient re-ranking. Second, we introduce a weakly-supervised local feature training strategy based on pseudo-correspondences obtained from aggregating global features to alleviate the lack of local correspondences ground truth for the VPR task. Third, we suggest an efficient re-ranking pipeline that is efficiently and precisely based on discriminative region guidance. Finally, experimental results show that our FoL achieves the state-of-the-art on multiple VPR benchmarks in both image retrieval and re-ranking stages and also significantly outperforms existing two-stage VPR methods in terms of computational efficiency. Code and models are available at https://github.com/chenshunpeng/FoL

88.1CVMay 12
Fast Image Super-Resolution via Consistency Rectified Flow

Jiaqi Xu, Wenbo Li, Haoze Sun et al.

Diffusion models (DMs) have demonstrated remarkable success in real-world image super-resolution (SR), yet their reliance on time-consuming multi-step sampling largely hinders their practical applications. While recent efforts have introduced few- or single-step solutions, existing methods either inefficiently model the process from noisy input or fail to fully exploit iterative generative priors, compromising the fidelity and quality of the reconstructed images. To address this issue, we propose FlowSR, a novel approach that reformulates the SR problem as a rectified flow from low-resolution (LR) to high-resolution (HR) images. Our method leverages an improved consistency learning strategy to enable high-quality SR in a single step. Specifically, we refine the original consistency distillation process by incorporating HR regularization, ensuring that the learned SR flow not only enforces self-consistency but also converges precisely to the ground-truth HR target. Furthermore, we introduce a fast-slow scheduling strategy, where adjacent timesteps for consistency learning are sampled from two distinct schedulers: a fast scheduler with fewer timesteps to improve efficiency, and a slow scheduler with more timesteps to capture fine-grained texture details. Extensive experiments demonstrate that FlowSR achieves outstanding performance in both efficiency and image quality.

CVJul 30, 2025Code
TR-PTS: Task-Relevant Parameter and Token Selection for Efficient Tuning

Siqi Luo, Haoran Yang, Yi Xin et al.

Large pre-trained models achieve remarkable performance in vision tasks but are impractical for fine-tuning due to high computational and storage costs. Parameter-Efficient Fine-Tuning (PEFT) methods mitigate this issue by updating only a subset of parameters; however, most existing approaches are task-agnostic, failing to fully exploit task-specific adaptations, which leads to suboptimal efficiency and performance. To address this limitation, we propose Task-Relevant Parameter and Token Selection (TR-PTS), a task-driven framework that enhances both computational efficiency and accuracy. Specifically, we introduce Task-Relevant Parameter Selection, which utilizes the Fisher Information Matrix (FIM) to identify and fine-tune only the most informative parameters in a layer-wise manner, while keeping the remaining parameters frozen. Simultaneously, Task-Relevant Token Selection dynamically preserves the most informative tokens and merges redundant ones, reducing computational overhead. By jointly optimizing parameters and tokens, TR-PTS enables the model to concentrate on task-discriminative information. We evaluate TR-PTS on benchmark, including FGVC and VTAB-1k, where it achieves state-of-the-art performance, surpassing full fine-tuning by 3.40% and 10.35%, respectively. The code are available at https://github.com/synbol/TR-PTS.

LGDec 22, 2025
RP-CATE: Recurrent Perceptron-based Channel Attention Transformer Encoder for Industrial Hybrid Modeling

Haoran Yang, Yinan Zhang, Wenjie Zhang et al.

Nowadays, industrial hybrid modeling which integrates both mechanistic modeling and machine learning-based modeling techniques has attracted increasing interest from scholars due to its high accuracy, low computational cost, and satisfactory interpretability. Nevertheless, the existing industrial hybrid modeling methods still face two main limitations. First, current research has mainly focused on applying a single machine learning method to one specific task, failing to develop a comprehensive machine learning architecture suitable for modeling tasks, which limits their ability to effectively represent complex industrial scenarios. Second, industrial datasets often contain underlying associations (e.g., monotonicity or periodicity) that are not adequately exploited by current research, which can degrade model's predictive performance. To address these limitations, this paper proposes the Recurrent Perceptron-based Channel Attention Transformer Encoder (RP-CATE), with three distinctive characteristics: 1: We developed a novel architecture by replacing the self-attention mechanism with channel attention and incorporating our proposed Recurrent Perceptron (RP) Module into Transformer, achieving enhanced effectiveness for industrial modeling tasks compared to the original Transformer. 2: We proposed a new data type called Pseudo-Image Data (PID) tailored for channel attention requirements and developed a cyclic sliding window method for generating PID. 3: We introduced the concept of Pseudo-Sequential Data (PSD) and a method for converting industrial datasets into PSD, which enables the RP Module to capture the underlying associations within industrial dataset more effectively. An experiment aimed at hybrid modeling in chemical engineering was conducted by using RP-CATE and the experimental results demonstrate that RP-CATE achieves the best performance compared to other baseline models.

CLFeb 10, 2024
A Thorough Examination of Decoding Methods in the Era of LLMs

Chufan Shi, Haoran Yang, Deng Cai et al.

Decoding methods play an indispensable role in converting language models from next-token predictors into practical task solvers. Prior research on decoding methods, primarily focusing on task-specific models, may not extend to the current era of general-purpose large language models (LLMs). Moreover, the recent influx of decoding strategies has further complicated this landscape. This paper provides a comprehensive and multifaceted analysis of various decoding methods within the context of LLMs, evaluating their performance, robustness to hyperparameter changes, and decoding speeds across a wide range of tasks, models, and deployment environments. Our findings reveal that decoding method performance is notably task-dependent and influenced by factors such as alignment, model size, and quantization. Intriguingly, sensitivity analysis exposes that certain methods achieve superior performance at the cost of extensive hyperparameter tuning, highlighting the trade-off between attaining optimal results and the practicality of implementation in varying contexts.

17.7MAApr 8
Generating Local Shields for Decentralised Partially Observable Markov Decision Processes

Haoran Yang, Nobuko Yoshida

Multi-agent systems under partial observation often struggle to maintain safety because each agent's locally chosen action does not, in general, determine the resulting joint action. Shielding addresses this by filtering actions based on the current state, but most existing techniques either assume access to a shared centralised global state or employ memoryless local filters that cannot consider interaction history. We introduce a shield process algebra with guarded choice and recursion for specifying safe global behaviour in communication-free Dec-POMDP settings. From a shield process, we compile a process automaton, then a global Mealy machine as a safe joint-action filter, and finally project it to local Mealy machines whose states are belief-style subsets of the global Mealy machine states consistent with each agent's observations, and which output per-agent safe action sets. We implement the pipeline in Rust and integrate PRISM, the Probabilistic Symbolic Model Checker, to compute best- and worst-case safety probabilities independently of the agents' policies. A multi-agent path-finding case study demonstrates how different shield processes substantially reduce collisions compared to the unshielded baseline while exhibiting varying levels of expressiveness and conservatism.

ROAug 28, 2025
EO-1: Interleaved Vision-Text-Action Pretraining for General Robot Control

Delin Qu, Haoming Song, Qizhi Chen et al.

The human ability to seamlessly perform multimodal reasoning and physical interaction in the open world is a core goal for general-purpose embodied intelligent systems. Recent vision-language-action (VLA) models, which are co-trained on large-scale robot and visual-text data, have demonstrated notable progress in general robot control. However, they still fail to achieve human-level flexibility in interleaved reasoning and interaction. In this work, introduce EO-Robotics, consists of EO-1 model and EO-Data1.5M dataset. EO-1 is a unified embodied foundation model that achieves superior performance in multimodal embodied reasoning and robot control through interleaved vision-text-action pre-training. The development of EO-1 is based on two key pillars: (i) a unified architecture that processes multimodal inputs indiscriminately (image, text, video, and action), and (ii) a massive, high-quality multimodal embodied reasoning dataset, EO-Data1.5M, which contains over 1.5 million samples with emphasis on interleaved vision-text-action comprehension. EO-1 is trained through synergies between auto-regressive decoding and flow matching denoising on EO-Data1.5M, enabling seamless robot action generation and multimodal embodied reasoning. Extensive experiments demonstrate the effectiveness of interleaved vision-text-action learning for open-world understanding and generalization, validated through a variety of long-horizon, dexterous manipulation tasks across multiple embodiments. This paper details the architecture of EO-1, the data construction strategy of EO-Data1.5M, and the training methodology, offering valuable insights for developing advanced embodied foundation models.

MLOct 22, 2024
Error estimates between SGD with momentum and underdamped Langevin diffusion

Arnaud Guillin, Yu Wang, Lihu Xu et al.

Stochastic gradient descent with momentum is a popular variant of stochastic gradient descent, which has recently been reported to have a close relationship with the underdamped Langevin diffusion. In this paper, we establish a quantitative error estimate between them in the 1-Wasserstein and total variation distances.

LGDec 10, 2024
Fast Track to Winning Tickets: Repowering One-Shot Pruning for Graph Neural Networks

Yanwei Yue, Guibin Zhang, Haoran Yang et al.

Graph Neural Networks (GNNs) demonstrate superior performance in various graph learning tasks, yet their wider real-world application is hindered by the computational overhead when applied to large-scale graphs. To address the issue, the Graph Lottery Hypothesis (GLT) has been proposed, advocating the identification of subgraphs and subnetworks, \textit{i.e.}, winning tickets, without compromising performance. The effectiveness of current GLT methods largely stems from the use of iterative magnitude pruning (IMP), which offers higher stability and better performance than one-shot pruning. However, identifying GLTs is highly computationally expensive, due to the iterative pruning and retraining required by IMP. In this paper, we reevaluate the correlation between one-shot pruning and IMP: while one-shot tickets are suboptimal compared to IMP, they offer a \textit{fast track} to tickets with a stronger performance. We introduce a one-shot pruning and denoising framework to validate the efficacy of the \textit{fast track}. Compared to current IMP-based GLT methods, our framework achieves a double-win situation of graph lottery tickets with \textbf{higher sparsity} and \textbf{faster speeds}. Through extensive experiments across 4 backbones and 6 datasets, our method demonstrates $1.32\% - 45.62\%$ improvement in weight sparsity and a $7.49\% - 22.71\%$ increase in graph sparsity, along with a $1.7-44 \times$ speedup over IMP-based methods and $95.3\%-98.6\%$ MAC savings.

LGJun 2, 2024
IENE: Identifying and Extrapolating the Node Environment for Out-of-Distribution Generalization on Graphs

Haoran Yang, Xiaobing Pei, Kai Yuan

Due to the performance degradation of graph neural networks (GNNs) under distribution shifts, the work on out-of-distribution (OOD) generalization on graphs has received widespread attention. A novel perspective involves distinguishing potential confounding biases from different environments through environmental identification, enabling the model to escape environmentally-sensitive correlations and maintain stable performance under distribution shifts. However, in graph data, confounding factors not only affect the generation process of node features but also influence the complex interaction between nodes. We observe that neglecting either aspect of them will lead to a decrease in performance. In this paper, we propose IENE, an OOD generalization method on graphs based on node-level environmental identification and extrapolation techniques. It strengthens the model's ability to extract invariance from two granularities simultaneously, leading to improved generalization. Specifically, to identify invariance in features, we utilize the disentangled information bottleneck framework to achieve mutual promotion between node-level environmental estimation and invariant feature learning. Furthermore, we extrapolate topological environments through graph augmentation techniques to identify structural invariance. We implement the conceptual method with specific algorithms and provide theoretical analysis and proofs for our approach. Extensive experimental evaluations on two synthetic and four real-world OOD datasets validate the superiority of IENE, which outperforms existing techniques and provides a flexible framework for enhancing the generalization of GNNs.

CLMar 14, 2024
Unveiling the Generalization Power of Fine-Tuned Large Language Models

Haoran Yang, Yumeng Zhang, Jiaqi Xu et al.

While Large Language Models (LLMs) have demonstrated exceptional multitasking abilities, fine-tuning these models on downstream, domain-specific datasets is often necessary to yield superior performance on test sets compared to their counterparts without fine-tuning. However, the comprehensive effects of fine-tuning on the LLMs' generalization ability are not fully understood. This paper delves into the differences between original, unmodified LLMs and their fine-tuned variants. Our primary investigation centers on whether fine-tuning affects the generalization ability intrinsic to LLMs. To elaborate on this, we conduct extensive experiments across five distinct language tasks on various datasets. Our main findings reveal that models fine-tuned on generation and classification tasks exhibit dissimilar behaviors in generalizing to different domains and tasks. Intriguingly, we observe that integrating the in-context learning strategy during fine-tuning on generation tasks can enhance the model's generalization ability. Through this systematic investigation, we aim to contribute valuable insights into the evolving landscape of fine-tuning practices for LLMs.

LGJan 28, 2024
Hyperedge Interaction-aware Hypergraph Neural Network

Rongping Ye, Xiaobing Pei, Haoran Yang et al.

Hypergraphs provide an effective modeling approach for modeling high-order relationships in many real-world datasets. To capture such complex relationships, several hypergraph neural networks have been proposed for learning hypergraph structure, which propagate information from nodes to hyperedges and then from hyperedges back to nodes. However, most existing methods focus on information propagation between hyperedges and nodes, neglecting the interactions among hyperedges themselves. In this paper, we propose HeIHNN, a hyperedge interaction-aware hypergraph neural network, which captures the interactions among hyperedges during the convolution process and introduce a novel mechanism to enhance information flow between hyperedges and nodes. Specifically, HeIHNN integrates the interactions between hyperedges into the hypergraph convolution by constructing a three-stage information propagation process. After propagating information from nodes to hyperedges, we introduce a hyperedge-level convolution to update the hyperedge embeddings. Finally, the embeddings that capture rich information from the interaction among hyperedges will be utilized to update the node embeddings. Additionally, we introduce a hyperedge outlier removal mechanism in the information propagation stages between nodes and hyperedges, which dynamically adjusts the hypergraph structure using the learned embeddings, effectively removing outliers. Extensive experiments conducted on real-world datasets show the competitive performance of HeIHNN compared with state-of-the-art methods.

LGDec 8, 2023
HC-Ref: Hierarchical Constrained Refinement for Robust Adversarial Training of GNNs

Xiaobing Pei, Haoran Yang, Gang Shen

Recent studies have shown that attackers can catastrophically reduce the performance of GNNs by maliciously modifying the graph structure or node features on the graph. Adversarial training, which has been shown to be one of the most effective defense mechanisms against adversarial attacks in computer vision, holds great promise for enhancing the robustness of GNNs. There is limited research on defending against attacks by performing adversarial training on graphs, and it is crucial to delve deeper into this approach to optimize its effectiveness. Therefore, based on robust adversarial training on graphs, we propose a hierarchical constraint refinement framework (HC-Ref) that enhances the anti-perturbation capabilities of GNNs and downstream classifiers separately, ultimately leading to improved robustness. We propose corresponding adversarial regularization terms that are conducive to adaptively narrowing the domain gap between the normal part and the perturbation part according to the characteristics of different layers, promoting the smoothness of the predicted distribution of both parts. Moreover, existing research on graph robust adversarial training primarily concentrates on training from the standpoint of node feature perturbations and seldom takes into account alterations in the graph structure. This limitation makes it challenging to prevent attacks based on topological changes in the graph. This paper generates adversarial examples by utilizing graph structure perturbations, offering an effective approach to defend against attack methods that are based on topological changes. Extensive experiments on two real-world graph benchmarks show that HC-Ref successfully resists various attacks and has better node classification performance compared to several baseline methods.

CLMay 22, 2023
A Frustratingly Simple Decoding Method for Neural Text Generation

Haoran Yang, Deng Cai, Huayang Li et al.

We introduce a frustratingly simple, super efficient and surprisingly effective decoding method, which we call Frustratingly Simple Decoding (FSD), for neural text generation. The idea behind FSD is straightforward: we build an anti-LM based on previously generated text and use this anti-LM to penalize future generation of what has been generated. The anti-LM can be implemented as simple as an n-gram language model or a vectorized variant. In this way, FSD introduces no extra model parameters and negligible computational overhead (FSD can be as fast as greedy search). Despite the simplicity, FSD is surprisingly effective; Experiments show that FSD can outperform the canonical methods to date (i.e., nucleus sampling) as well as several strong baselines that were proposed recently.

CLMay 11, 2023
Chain-of-Dictionary Prompting Elicits Translation in Large Language Models

Hongyuan Lu, Haoran Yang, Haoyang Huang et al.

Large language models (LLMs) have shown surprisingly good performance in multilingual neural machine translation (MNMT) even when trained without parallel data. Yet, despite the fact that the amount of training data is gigantic, they still struggle with translating rare words, particularly for low-resource languages. Even worse, it is usually unrealistic to retrieve relevant demonstrations for in-context learning with low-resource languages on LLMs, which restricts the practical use of LLMs for translation -- how should we mitigate this problem? To this end, we present a novel method, CoD, which augments LLMs with prior knowledge with the chains of multilingual dictionaries for a subset of input words to elicit translation abilities for LLMs. Extensive experiments indicate that augmenting ChatGPT with CoD elicits large gains by up to 13x chrF++ points for MNMT (3.08 to 42.63 for English to Serbian written in Cyrillic script) on FLORES-200 full devtest set. We further demonstrate the importance of chaining the multilingual dictionaries, as well as the superiority of CoD to few-shot demonstration for low-resource languages.

LGFeb 17, 2022
Graph Masked Autoencoders with Transformers

Sixiao Zhang, Hongxu Chen, Haoran Yang et al.

Recently, transformers have shown promising performance in learning graph representations. However, there are still some challenges when applying transformers to real-world scenarios due to the fact that deep transformers are hard to train from scratch and the quadratic memory consumption w.r.t. the number of nodes. In this paper, we propose Graph Masked Autoencoders (GMAEs), a self-supervised transformer-based model for learning graph representations. To address the above two challenges, we adopt the masking mechanism and the asymmetric encoder-decoder design. Specifically, GMAE takes partially masked graphs as input, and reconstructs the features of the masked nodes. The encoder and decoder are asymmetric, where the encoder is a deep transformer and the decoder is a shallow transformer. The masking mechanism and the asymmetric design make GMAE a memory-efficient model compared with conventional transformers. We show that, when serving as a conventional self-supervised graph representation model, GMAE achieves state-of-the-art performance on both the graph classification task and the node classification task under common downstream evaluation protocols. We also show that, compared with training in an end-to-end manner from scratch, we can achieve comparable performance after pre-training and fine-tuning using GMAE while simplifying the training process.

LGJan 19, 2022
Dual Space Graph Contrastive Learning

Haoran Yang, Hongxu Chen, Shirui Pan et al.

Unsupervised graph representation learning has emerged as a powerful tool to address real-world problems and achieves huge success in the graph learning domain. Graph contrastive learning is one of the unsupervised graph representation learning methods, which recently attracts attention from researchers and has achieved state-of-the-art performances on various tasks. The key to the success of graph contrastive learning is to construct proper contrasting pairs to acquire the underlying structural semantics of the graph. However, this key part is not fully explored currently, most of the ways generating contrasting pairs focus on augmenting or perturbating graph structures to obtain different views of the input graph. But such strategies could degrade the performances via adding noise into the graph, which may narrow down the field of the applications of graph contrastive learning. In this paper, we propose a novel graph contrastive learning method, namely \textbf{D}ual \textbf{S}pace \textbf{G}raph \textbf{C}ontrastive (DSGC) Learning, to conduct graph contrastive learning among views generated in different spaces including the hyperbolic space and the Euclidean space. Since both spaces have their own advantages to represent graph data in the embedding spaces, we hope to utilize graph contrastive learning to bridge the spaces and leverage advantages from both sides. The comparison experiment results show that DSGC achieves competitive or better performances among all the datasets. In addition, we conduct extensive experiments to analyze the impact of different graph encoders on DSGC, giving insights about how to better leverage the advantages of contrastive learning between different spaces.

IRSep 7, 2021
Hyper Meta-Path Contrastive Learning for Multi-Behavior Recommendation

Haoran Yang, Hongxu Chen, Lin Li et al.

User purchasing prediction with multi-behavior information remains a challenging problem for current recommendation systems. Various methods have been proposed to address it via leveraging the advantages of graph neural networks (GNNs) or multi-task learning. However, most existing works do not take the complex dependencies among different behaviors of users into consideration. They utilize simple and fixed schemes, like neighborhood information aggregation or mathematical calculation of vectors, to fuse the embeddings of different user behaviors to obtain a unified embedding to represent a user's behavioral patterns which will be used in downstream recommendation tasks. To tackle the challenge, in this paper, we first propose the concept of hyper meta-path to construct hyper meta-paths or hyper meta-graphs to explicitly illustrate the dependencies among different behaviors of a user. How to obtain a unified embedding for a user from hyper meta-paths and avoid the previously mentioned limitations simultaneously is critical. Thanks to the recent success of graph contrastive learning, we leverage it to learn embeddings of user behavior patterns adaptively instead of assigning a fixed scheme to understand the dependencies among different behaviors. A new graph contrastive learning based framework is proposed by coupling with hyper meta-paths, namely HMG-CR, which consistently and significantly outperforms all baselines in extensive comparison experiments.

CLSep 3, 2021
Contrastive Representation Learning for Exemplar-Guided Paraphrase Generation

Haoran Yang, Wai Lam, Piji Li

Exemplar-Guided Paraphrase Generation (EGPG) aims to generate a target sentence which conforms to the style of the given exemplar while encapsulating the content information of the source sentence. In this paper, we propose a new method with the goal of learning a better representation of the style andthe content. This method is mainly motivated by the recent success of contrastive learning which has demonstrated its power in unsupervised feature extraction tasks. The idea is to design two contrastive losses with respect to the content and the style by considering two problem characteristics during training. One characteristic is that the target sentence shares the same content with the source sentence, and the second characteristic is that the target sentence shares the same style with the exemplar. These two contrastive losses are incorporated into the general encoder-decoder paradigm. Experiments on two datasets, namely QQP-Pos and ParaNMT, demonstrate the effectiveness of our proposed constrastive losses.

CLAug 29, 2021
Sentence Structure and Word Relationship Modeling for Emphasis Selection

Haoran Yang, Wai Lam

Emphasis Selection is a newly proposed task which focuses on choosing words for emphasis in short sentences. Traditional methods only consider the sequence information of a sentence while ignoring the rich sentence structure and word relationship information. In this paper, we propose a new framework that considers sentence structure via a sentence structure graph and word relationship via a word similarity graph. The sentence structure graph is derived from the parse tree of a sentence. The word similarity graph allows nodes to share information with their neighbors since we argue that in emphasis selection, similar words are more likely to be emphasized together. Graph neural networks are employed to learn the representation of each node of these two graphs. Experimental results demonstrate that our framework can achieve superior performance.

CVApr 20, 2021
Semantic Segmentation by Improved Generative Adversarial Networks

ZengShun Zhaoa, Yulong Wang, Ke Liu et al.

While most existing segmentation methods usually combined the powerful feature extraction capabilities of CNNs with Conditional Random Fields (CRFs) post-processing, the result always limited by the fault of CRFs . Due to the notoriously slow calculation speeds and poor efficiency of CRFs, in recent years, CRFs post-processing has been gradually eliminated. In this paper, an improved Generative Adversarial Networks (GANs) for image semantic segmentation task (semantic segmentation by GANs, Seg-GAN) is proposed to facilitate further segmentation research. In addition, we introduce Convolutional CRFs (ConvCRFs) as an effective improvement solution for the image semantic segmentation task. Towards the goal of differentiating the segmentation results from the ground truth distribution and improving the details of the output images, the proposed discriminator network is specially designed in a full convolutional manner combined with cascaded ConvCRFs. Besides, the adversarial loss aggressively encourages the output image to be close to the distribution of the ground truth. Our method not only learns an end-to-end mapping from input image to corresponding output image, but also learns a loss function to train this mapping. The experiments show that our method achieves better performance than state-of-the-art methods.

CLFeb 6, 2020
A Neural Topical Expansion Framework for Unstructured Persona-oriented Dialogue Generation

Minghong Xu, Piji Li, Haoran Yang et al.

Unstructured Persona-oriented Dialogue Systems (UPDS) has been demonstrated effective in generating persona consistent responses by utilizing predefined natural language user persona descriptions (e.g., "I am a vegan"). However, the predefined user persona descriptions are usually short and limited to only a few descriptive words, which makes it hard to correlate them with the dialogues. As a result, existing methods either fail to use the persona description or use them improperly when generating persona consistent responses. To address this, we propose a neural topical expansion framework, namely Persona Exploration and Exploitation (PEE), which is able to extend the predefined user persona description with semantically correlated content before utilizing them to generate dialogue responses. PEE consists of two main modules: persona exploration and persona exploitation. The former learns to extend the predefined user persona description by mining and correlating with existing dialogue corpus using a variational auto-encoder (VAE) based topic model. The latter learns to generate persona consistent responses by utilizing the predefined and extended user persona description. In order to make persona exploitation learn to utilize user persona description more properly, we also introduce two persona-oriented loss functions: Persona-oriented Matching (P-Match) loss and Persona-oriented Bag-of-Words (P-BoWs) loss which respectively supervise persona selection in encoder and decoder. Experimental results show that our approach outperforms state-of-the-art baselines, in terms of both automatic and human evaluations.