CL · Jun 8, 2021

Reading StackOverflow Encourages Cheating: Adding Question Text Improves Extractive Code Generation

arXiv:2106.04447v1 · 715 citations · Has Code
AI Analysis

This work improves the accuracy of code generation for developers by supplying additional context. The contribution is incremental, since it builds on an existing dataset (CoNaLa) and an existing model (BART).

The paper tackles the problem of generating code from programming questions by using both the question title and the body text from StackOverflow. It achieves a BLEU score of 35.32, a 71.96% relative improvement over the prior state of the art.
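Both percentages in the summary and abstract are relative improvements; an absolute gain of 71.96 BLEU points is impossible when the final score is 35.32. The short sketch below, using hypothetical helper names, checks the 2.8% figure against the two stated BLEU scores and inverts the 71.96% figure to recover the prior state-of-the-art score it implies. The implied value is derived arithmetic from the stated numbers, not a figure reported on this page.

```python
def relative_improvement(new: float, old: float) -> float:
    """Percentage by which `new` exceeds `old`, measured relative to `old`."""
    return (new - old) / old * 100.0


def implied_baseline(new: float, rel_pct: float) -> float:
    """Invert relative_improvement: the `old` score that `new` beats by `rel_pct` percent."""
    return new / (1.0 + rel_pct / 100.0)


# BLEU scores stated in the abstract.
bart_intent_body = 34.35        # BART baseline using intent + question body
bart_intent_body_mined = 35.32  # after adding the mined CoNaLa data

# Gain from adding mined data: about 2.8% relative.
print(round(relative_improvement(bart_intent_body_mined, bart_intent_body), 2))  # 2.82

# Prior state-of-the-art BLEU implied by "beats ... by 71.96%".
print(round(implied_baseline(bart_intent_body_mined, 71.96), 2))  # 20.54
```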

Answering a programming question using only its title is difficult because salient contextual information is omitted. Based on this observation, we present a corpus of over 40,000 StackOverflow question texts to be used in conjunction with their corresponding intents from the CoNaLa dataset (Yin et al., 2018). Using both the intent and the question body as input, we establish a baseline BLEU score of 34.35 with BART for this new task. Combining the mined CoNaLa data with the labeled data yields a further 2.8% improvement, reaching a BLEU score of 35.32. We evaluate prior state-of-the-art CoNaLa models with this additional data and find that our proposed method of using the body and mined data beats the BLEU score of the prior state of the art by 71.96%. Finally, we perform ablations to demonstrate that BART is an unsupervised multimodal learner and examine its extractive behavior. The code and data can be found at https://github.com/gabeorlanski/stackoverflow-encourages-cheating.
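The abstract describes feeding the model both the CoNaLa intent and the StackOverflow question body, and training on labeled plus mined pairs. The sketch below shows one plausible way to assemble such (source, target) pairs. The separator string, the word-level truncation limit, and the names `Example`, `build_source`, and `to_pairs` are hypothetical choices for illustration; the paper's actual preprocessing lives in the linked repository and may differ.

```python
from dataclasses import dataclass
from typing import List, Tuple

SEP = " </s> "  # hypothetical separator placed between intent and question body


@dataclass
class Example:
    intent: str   # natural-language intent from CoNaLa
    body: str     # StackOverflow question body text (may be empty)
    snippet: str  # target code snippet


def build_source(ex: Example, max_body_words: int = 200) -> str:
    """Concatenate the intent with a word-truncated question body (hypothetical format)."""
    body = " ".join(ex.body.split()[:max_body_words])
    intent = ex.intent.strip()
    return intent + SEP + body if body else intent


def to_pairs(labeled: List[Example], mined: List[Example]) -> List[Tuple[str, str]]:
    """Merge labeled and mined examples into (source, target) training pairs."""
    return [(build_source(ex), ex.snippet) for ex in labeled + mined]


if __name__ == "__main__":
    labeled = [Example("sort a dict by value",
                       "I have d = {'a': 2, 'b': 1} and want it ordered by value.",
                       "sorted(d.items(), key=lambda kv: kv[1])")]
    mined = [Example("reverse a list", "How do I reverse lst?", "lst[::-1]")]
    for source, target in to_pairs(labeled, mined):
        print(source, "->", target)
```

Truncating the body keeps the input within a sequence-to-sequence model's length budget while still placing the short intent first, where it cannot be cut off.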

Code implementations: 1 repo