LG, SE
Jun 22, 2021

On Adversarial Robustness of Synthetic Code Generation

arXiv:2106.11629v1
6 citations
Originality: Incremental advance
AI Analysis

This work addresses adversarial vulnerabilities in code generation systems, an incremental advance of interest to developers and researchers in automated programming.

The paper studies adversarial robustness in synthetic code generation for domain-specific languages. It shows that Transformer-based models outperform existing AlgoLisp baselines yet still perform poorly under adversarial settings, and it proposes dataset augmentation techniques to reduce dataset bias, with experiments reported in support of their efficacy.

Automatic code synthesis from natural language descriptions is a challenging task. Recent years have seen massive progress in code generation systems for domain-specific languages (DSLs) that employ sequence-to-sequence deep learning techniques. In this paper, we experiment specifically with AlgoLisp DSL-based generative models and demonstrate the existence of significant dataset bias through different classes of adversarial examples. We also experiment with two variants of Transformer-based models that outperform all existing AlgoLisp DSL-based code generation baselines. Like the current state-of-the-art systems, our proposed models also perform poorly under adversarial settings. Therefore, we propose several dataset augmentation techniques to reduce bias and demonstrate their efficacy through robust experimentation.

