Zero-shot Evaluation of Deep Learning for Java Code Clone Detection
For researchers and practitioners in code clone detection, this work highlights the limited generalizability of DL-based detectors and the surprising effectiveness of a conventional tool in zero-shot settings.
This paper evaluates five state-of-the-art deep learning-based Java code clone detectors in a zero-shot scenario, finding that they generalize poorly to unseen code and that the conventional tool NiCad outperforms them.
Deep Learning (DL) is increasingly widespread in clone detection, motivated by reports of near-perfect performance on this task. In particular, for semantic code clones, which share only limited syntax but implement the same or similar functionality, DL appears to outperform conventional tools. In this paper, we investigate the generalizability of DL-based clone detectors for Java. To this end, we replicate and evaluate five state-of-the-art DL-based clone detectors, including Transformers like CodeBERT and single-task models like FA-AST+GMN, in a zero-shot evaluation scenario, in which we train or fine-tune on one set of datasets and functionalities and evaluate on different ones. Our experiments demonstrate that the models' generalizability to unseen code is limited. Further analysis reveals that the conventional clone detector NiCad even outperforms the DL-based clone detectors in this zero-shot evaluation scenario.
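To make the zero-shot setting concrete, the following is a minimal sketch of a functionality-disjoint split together with an F1 computation. It is not the protocol used in the paper: the `Pair` record, its field names, the `zero_shot_split` and `f1_score` helpers, and the choice of F1 as the metric are all assumptions made for illustration. It presumes that every code snippet carries an integer functionality ID, so that no functionality seen in training reappears at evaluation time.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Pair:
    """A labelled pair of code snippets (hypothetical record layout)."""
    left_func: int    # functionality ID of the first snippet (assumed field)
    right_func: int   # functionality ID of the second snippet (assumed field)
    is_clone: bool    # ground-truth label


def zero_shot_split(pairs, test_fraction=0.5, seed=0):
    """Split pairs so that training and test functionalities are disjoint.

    Pairs that mix a training functionality with a test functionality are
    dropped, since they would leak test code into training.
    """
    funcs = sorted({f for p in pairs for f in (p.left_func, p.right_func)})
    rng = random.Random(seed)
    rng.shuffle(funcs)
    n_test = max(1, int(len(funcs) * test_fraction))
    test_funcs = set(funcs[:n_test])

    train = [p for p in pairs
             if p.left_func not in test_funcs and p.right_func not in test_funcs]
    test = [p for p in pairs
            if p.left_func in test_funcs and p.right_func in test_funcs]
    return train, test, test_funcs


def f1_score(y_true, y_pred):
    """F1 for the positive (clone) class; returns 0.0 without true positives."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t and p)
    fp = sum(1 for t, p in zip(y_true, y_pred) if not t and p)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t and not p)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)
```

In this sketch a detector would be trained or fine-tuned on `train` only and scored on `test` only. A stricter variant in the spirit of the abstract would additionally draw `train` and `test` from different datasets.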