SE · IR · PL · Jan 10, 2022

Better Modeling the Programming World with Code Concept Graphs-augmented Multi-modal Learning

arXiv:2201.03346v2 · 8 citations
AI Analysis

This work addresses the need for better software engineering tools by integrating high-level domain concepts, though it appears incremental as it builds on existing models with a simple joint-learning approach.

The authors tackled the problem of improving code modeling by augmenting a pretrained language model with concept graphs using multi-modal learning, resulting in preliminary gains in code search effectiveness.

Code modeling has progressed tremendously in recent years thanks to learning approaches from natural language processing built on state-of-the-art model architectures. Nevertheless, we believe that the current state of the art does not focus enough on the full potential that data may bring to a learning process in software engineering. Our vision centers on leveraging multi-modal learning approaches to model the programming world. In this paper, we investigate one of the underlying ideas of our vision: using concept graphs of identifiers to leverage high-level relationships between domain concepts manipulated through particular language constructs. In particular, we propose to enhance an existing pretrained language model of code by jointly learning it with a graph neural network based on our concept graphs. We conducted a preliminary evaluation that shows a gain in effectiveness of the models for code search using a simple joint-learning method, which prompts us to further investigate our research vision.
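
To make the idea of combining a code language model with a concept graph more concrete, here is a minimal sketch in plain Python. It is not the authors' method: the abstract gives no architectural details, so everything here is an assumption for illustration. The text vectors stand in for embeddings from a pretrained model, `mean_aggregate` is an unlearned stand-in for one graph neural network layer, and the weighted sum in `fuse` (with the hypothetical parameter `alpha`) stands in for whatever joint-learning scheme the paper uses. All function names and the toy data are hypothetical.

```python
import math

def mean_aggregate(features, edges):
    """One message-passing round: each node averages itself with its neighbours."""
    neighbours = {node: [node] for node in features}
    for a, b in edges:  # edges are treated as undirected
        neighbours[a].append(b)
        neighbours[b].append(a)
    dim = len(next(iter(features.values())))
    return {
        node: [sum(features[n][i] for n in ns) / len(ns) for i in range(dim)]
        for node, ns in neighbours.items()
    }

def mean_pool(features):
    """Collapse node vectors into a single graph-level vector."""
    dim = len(next(iter(features.values())))
    return [sum(v[i] for v in features.values()) / len(features) for i in range(dim)]

def fuse(text_vec, graph_vec, alpha=0.5):
    """Assumed fusion: weighted sum of the text and graph modalities."""
    return [(1 - alpha) * t + alpha * g for t, g in zip(text_vec, graph_vec)]

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def rank(query_vec, snippets, alpha=0.5):
    """Order snippet names by similarity between the query and the fused vector."""
    scored = []
    for name, (text_vec, node_features, edges) in snippets.items():
        graph_vec = mean_pool(mean_aggregate(node_features, edges))
        scored.append((cosine(query_vec, fuse(text_vec, graph_vec, alpha)), name))
    return [name for _, name in sorted(scored, reverse=True)]

# Toy data: (text embedding, concept-node features, concept edges) per snippet.
snippets = {
    "parse_invoice": ([1.0, 0.0],
                      {"invoice": [1.0, 0.0], "total": [1.0, 0.0]},
                      [("invoice", "total")]),
    "render_button": ([0.0, 1.0],
                      {"button": [0.0, 1.0], "color": [0.0, 1.0]},
                      [("button", "color")]),
}
ranking = rank([0.9, 0.1], snippets)
```

In this toy setup the query vector lies close to the first snippet's text and concept vectors, so `ranking` places `parse_invoice` ahead of `render_button`. In a trained system both the graph layer and the fusion would be learned jointly with the language model rather than fixed as they are here.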
