SECL Nov 13, 2023

AdaCCD: Adaptive Semantic Contrasts Discovery Based Cross Lingual Adaptation for Code Clone Detection

arXiv:2311.07277v2, 9 citations, h-index: 11
Originality: Highly original
AI Analysis

AdaCCD addresses a limitation of current code clone detection methods, which support only a few popular languages because annotated data is scarce. It enables clone detection across the diverse programming languages used in software development.

The paper tackles the problem of code clone detection across multiple programming languages by proposing AdaCCD, a cross-lingual adaptation method that transfers knowledge from resource-rich to resource-poor languages without annotations, achieving comparable performance to supervised fine-tuning on a benchmark of 5 languages.

Code Clone Detection, which aims to retrieve functionally similar programs from large code bases, has been attracting increasing attention. Modern software often involves a diverse range of programming languages. However, current code clone detection methods are generally limited to only a few popular programming languages due to insufficient annotated data as well as their own model design constraints. To address these issues, we present AdaCCD, a novel cross-lingual adaptation method that can detect cloned code in a new language without annotations in that language. AdaCCD leverages language-agnostic code representations from pre-trained programming language models and proposes an Adaptively Refined Contrastive Learning framework to transfer knowledge from resource-rich languages to resource-poor languages. We evaluate the cross-lingual adaptation results of AdaCCD by constructing a multilingual code clone detection benchmark consisting of 5 programming languages. AdaCCD achieves significant improvements over other baselines and achieves performance comparable to supervised fine-tuning.
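To make the contrastive-learning idea in the abstract concrete, here is a minimal sketch of a generic InfoNCE-style contrastive loss over code embeddings. It is not the paper's Adaptively Refined Contrastive Learning framework, whose details the abstract does not give. The function names (`cosine`, `info_nce`), the temperature value, and the toy 3-dimensional embeddings are all assumptions made for illustration; in practice the embeddings would come from a pre-trained programming language model.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def info_nce(anchor, positive, negatives, temperature=0.1):
    """Generic InfoNCE loss (hypothetical helper, not the paper's exact loss).

    The loss is low when the anchor is closer to the positive (a clone)
    than to every negative (a non-clone).
    """
    logits = [cosine(anchor, positive) / temperature]
    logits += [cosine(anchor, n) / temperature for n in negatives]
    # Numerically stable log-sum-exp over all candidates.
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(x - m) for x in logits))
    # Negative log-probability of picking the positive among all candidates.
    return log_sum_exp - logits[0]

# Toy embeddings (assumed values, for illustration only).
anchor = [1.0, 0.0, 0.2]
clone = [0.9, 0.1, 0.3]
non_clones = [[0.0, 1.0, 0.0], [-1.0, 0.2, 0.1]]

# Correct pairing: the clone is treated as the positive.
loss_good = info_nce(anchor, clone, non_clones)
# Wrong pairing: a non-clone is treated as the positive.
loss_bad = info_nce(anchor, non_clones[0], [clone, non_clones[1]])
```

In this toy setup the correct pairing yields a much smaller loss than the wrong one, which is the signal a contrastive objective uses to pull clones together and push non-clones apart in embedding space.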
