SE · LG · Jul 10, 2021

Is a Single Model Enough? MuCoS: A Multi-Model Ensemble Learning for Semantic Code Search

arXiv:2107.04773v2 · 2 citations
AI Analysis

This work aims to improve semantic code search for developers with a more accurate and comprehensive model. It appears incremental, since it builds on existing ensemble learning and data augmentation techniques.

The paper argues that a single model struggles to capture both the diverse perspectives of code snippets and the varied intents behind user queries. It proposes MuCoS, a multi-model ensemble learning architecture that combines individual learners trained on different datasets, so that the ensemble captures comprehensive features of code.

Recently, deep learning methods have become mainstream in code search since they do better at capturing semantic correlations between code snippets and search queries and show promising performance. However, code snippets carry diverse information from different dimensions, such as business logic, specific algorithms, and hardware communication, so it is hard for a single code representation module to cover all the perspectives. On the other hand, as a specific query may focus on one or several perspectives, it is difficult for a single query representation module to represent different user intents. In this paper, we propose MuCoS, a multi-model ensemble learning architecture for semantic code search. It combines several individual learners, each of which emphasizes a specific perspective of code snippets. We train the individual learners on different datasets which contain different perspectives of code information, and we use a data augmentation strategy to get these different datasets. Then we ensemble the learners to capture comprehensive features of code snippets.
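To make the ensemble idea concrete, here is a minimal Python sketch of several perspective-specific learners whose scores are combined to rank code snippets. It is not the MuCoS implementation. The abstract does not say how the learners are built or combined, so everything here is an assumption made for illustration: the token-overlap scoring, the two "views" (`signature_view`, `body_view`), the weighted average in `ensemble_score`, and all function names. In the paper the learners are trained neural models and the perspectives come from augmented datasets.

```python
import re
from typing import Callable, List, Sequence

# A learner scores how well a code snippet matches a query (higher is better).
Learner = Callable[[str, str], float]


def tokens(text: str) -> set:
    """Lowercased alphanumeric tokens; snake_case names split at '_'."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a: set, b: set) -> float:
    """Jaccard similarity of two token sets; 0.0 if either is empty."""
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def make_learner(view: Callable[[str], str]) -> Learner:
    """Toy stand-in for a trained model that sees one perspective of the code."""
    def score(query: str, code: str) -> float:
        return jaccard(tokens(query), tokens(view(code)))
    return score


# Hypothetical perspectives (assumptions, not taken from the paper).
def signature_view(code: str) -> str:
    """First line only, e.g. the function signature."""
    lines = code.splitlines()
    return lines[0] if lines else ""


def body_view(code: str) -> str:
    """Everything after the first line."""
    return "\n".join(code.splitlines()[1:])


LEARNERS: List[Learner] = [make_learner(signature_view), make_learner(body_view)]


def ensemble_score(learners: Sequence[Learner], weights: Sequence[float],
                   query: str, code: str) -> float:
    """Weighted average of learner scores (assumed combination rule)."""
    total = sum(weights)
    return sum(w * l(query, code) for l, w in zip(learners, weights)) / total


def search(query: str, corpus: Sequence[str], learners: Sequence[Learner],
           weights: Sequence[float], top_k: int = 1) -> List[str]:
    """Return the top_k snippets ranked by ensemble score, best first."""
    ranked = sorted(corpus,
                    key=lambda c: ensemble_score(learners, weights, query, c),
                    reverse=True)
    return ranked[:top_k]
```

Calling `search("read file", corpus, LEARNERS, [1.0, 1.0])` ranks each snippet by the averaged score of both learners. A query that matches only one perspective still contributes through that learner's score, which is the intuition behind combining learners rather than relying on a single representation.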
