HGAdapter: Hypergraph-based Adapters in Language Models for Code Summarization and Clone Detection
This work targets two code-related tasks, code summarization and code clone detection, and offers an incremental improvement by augmenting pre-trained language models (PLMs) with high-order data correlations among code tokens.
The authors address the failure of pre-trained language models (PLMs) to capture high-order data correlations in code. They propose HGAdapter, a hypergraph-based adapter that encodes three types of high-order correlations (abstract syntax tree family, lexical, and line) and is used to fine-tune PLMs for code summarization and clone detection. The reported result is improved performance across six languages and multiple datasets.
Pre-trained language models (PLMs) are increasingly being applied to code-related tasks. Although PLMs have achieved good results, they do not take into account potential high-order data correlations within the code. We propose three types of high-order correlations among code tokens: abstract syntax tree family correlation, lexical correlation, and line correlation. We design a tokens and hyperedges generator to capture these high-order data correlations. We improve the architecture of hypergraph neural networks and combine it with adapter tuning to propose a novel hypergraph-based adapter (HGAdapter) for fine-tuning PLMs. HGAdapter encodes high-order data correlations and can be inserted into various PLMs to enhance their performance. Experiments were conducted on several public datasets, covering code summarization in six languages and code clone detection. Our method improved the performance of PLMs on these datasets to varying degrees. The experimental results confirm that introducing high-order data correlations contributes to improved effectiveness.
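To make the idea of high-order correlations concrete, the following is a minimal sketch of how hyperedges over code tokens might be built. It is an illustration under stated assumptions, not the generator described in this work. It assumes that a line correlation groups the tokens on the same source line and that a lexical correlation groups tokens with identical text. The abstract syntax tree family correlation is omitted because it needs a parser. The names build_hyperedges and incidence_matrix and the toy token list are hypothetical.

```python
from collections import defaultdict


def build_hyperedges(tokens):
    """Group token indices into hyperedges (hypothetical helper).

    tokens: list of (text, line_number) pairs in source order.
    Assumption: line correlation = same line number,
                lexical correlation = identical token text.
    """
    by_line = defaultdict(list)
    by_text = defaultdict(list)
    for idx, (text, line) in enumerate(tokens):
        by_line[line].append(idx)
        by_text[text].append(idx)

    hyperedges = []
    for group in list(by_line.values()) + list(by_text.values()):
        # Keep only groups that actually connect two or more tokens.
        if len(group) > 1:
            hyperedges.append(group)
    return hyperedges


def incidence_matrix(num_tokens, hyperedges):
    """Return H where H[v][e] = 1 if token v belongs to hyperedge e."""
    H = [[0] * len(hyperedges) for _ in range(num_tokens)]
    for e, members in enumerate(hyperedges):
        for v in members:
            H[v][e] = 1
    return H


# Toy input standing in for the two lines "x = a" and "y = x".
tokens = [("x", 1), ("=", 1), ("a", 1), ("y", 2), ("=", 2), ("x", 2)]
edges = build_hyperedges(tokens)
H = incidence_matrix(len(tokens), edges)
```

Unlike an ordinary graph edge, each hyperedge here can connect any number of tokens at once, which is what lets a single edge represent a whole line or every occurrence of an identifier. An incidence matrix of this kind is the usual input to a hypergraph neural network layer.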