SE · AI · Nov 10, 2021

Data-Driven AI Model Signal-Awareness Enhancement and Introspection

Sahil Suneja, Yufan Zhuang, Yunhui Zheng, Jim Laredo, Alessandro Morari

arXiv:2111.05827v2 · 6.41 citations

Originality: Incremental advance

AI Analysis

This work addresses reliability concerns about AI-for-code models in software engineering applications and offers a way to enhance their signal awareness, though it appears incremental because it builds on existing probing and debugging techniques.

The paper tackles the lack of signal awareness in AI models for source code understanding. It achieves up to a 4.8x improvement in model signal awareness by combining code complexity with curriculum learning and by using Delta Debugging to generate simplified programs for training-data augmentation.
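The first technique orders training data from simple to complex code. A minimal sketch, assuming a crude keyword-count proxy for cyclomatic complexity (the hypothetical `approx_complexity`) and cumulative, evenly sized stages (the hypothetical `curriculum_stages`); the paper's actual complexity metrics and stage schedule may differ.

```python
import re
from typing import List, Tuple

# Hypothetical proxy for cyclomatic complexity of a C-like snippet:
# 1 + number of decision points (branch keywords and short-circuit operators).
_DECISION = re.compile(r"\b(if|for|while|case)\b|&&|\|\|")


def approx_complexity(code: str) -> int:
    """Approximate cyclomatic complexity by counting decision points."""
    return 1 + len(_DECISION.findall(code))


def curriculum_stages(
    samples: List[Tuple[str, int]], n_stages: int
) -> List[List[Tuple[str, int]]]:
    """Sort (code, label) samples by complexity and return cumulative
    training pools: stage k holds the simplest k/n_stages of the data."""
    ordered = sorted(samples, key=lambda s: approx_complexity(s[0]))
    stages = []
    for k in range(1, n_stages + 1):
        cutoff = round(len(ordered) * k / n_stages)
        stages.append(ordered[:cutoff])
    return stages


SAMPLES = [
    ("int f(){return 0;}", 0),
    ("if (a && b) x();", 1),
    ("for(;;){ if(p) q(); }", 1),
]
STAGES = curriculum_stages(SAMPLES, 3)
```

In a training loop, each stage's pool would be used for one or more epochs before moving to the next, so the model sees low-complexity programs first. Cumulative pools are one common curriculum design choice; replacing earlier data instead of accumulating it is another.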

AI modeling for source code understanding tasks has been making significant progress and is being adopted in production development pipelines. However, reliability concerns are being raised, especially about whether the models are actually learning task-related aspects of source code. While recent model-probing approaches have observed a lack of signal awareness in many AI-for-code models, i.e., models not capturing task-relevant signals, they do not offer solutions to rectify this problem. In this paper, we explore data-driven approaches to enhance models' signal awareness: 1) we combine the SE concept of code complexity with the AI technique of curriculum learning; 2) we incorporate SE assistance into AI models by customizing Delta Debugging to generate simplified, signal-preserving programs and adding them to the training dataset. With our techniques, we achieve up to a 4.8x improvement in model signal awareness. Using the notion of code complexity, we further present a novel approach for introspecting model learning from the perspective of the dataset.
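The second technique relies on Delta Debugging, whose classic `ddmin` algorithm repeatedly removes chunks of an input while a test still passes, ending with a 1-minimal result. A minimal line-level sketch, assuming a hypothetical `keeps_signal` oracle that stands in for whatever check confirms a simplified program still carries the task-relevant signal; the paper's customized reduction granularity and oracle are not described here, and a practical oracle would likely also require the reduced program to remain valid code.

```python
from typing import Callable, List, TypeVar

T = TypeVar("T")


def _split(items: List[T], n: int) -> List[List[T]]:
    """Split items into n contiguous, non-empty chunks of near-equal size."""
    k, m = divmod(len(items), n)
    return [items[i * k + min(i, m):(i + 1) * k + min(i + 1, m)] for i in range(n)]


def ddmin(items: List[T], test: Callable[[List[T]], bool]) -> List[T]:
    """Return a 1-minimal subsequence of `items` on which `test` still holds."""
    if not test(items):
        raise ValueError("the unreduced input must satisfy the test")
    n = 2
    while len(items) >= 2:
        chunks = _split(items, n)
        reduced = False
        # 1) Keep a single chunk if it alone preserves the property.
        for chunk in chunks:
            if test(chunk):
                items, n, reduced = chunk, 2, True
                break
        # 2) Otherwise try dropping one chunk (i.e., testing its complement).
        if not reduced:
            for i in range(len(chunks)):
                complement = [x for j, c in enumerate(chunks) if j != i for x in c]
                if test(complement):
                    n = min(max(n - 1, 2), len(complement))
                    items, reduced = complement, True
                    break
        # 3) Otherwise refine granularity; stop once chunks are single items.
        if not reduced:
            if n >= len(items):
                break
            n = min(2 * n, len(items))
    return items


# Hypothetical example: the "signal" is a fixed-size buffer plus an unchecked copy.
PROGRAM = [
    "void f(char *input) {",
    "  int unused = 0;",
    "  char buf[10];",
    '  log("start");',
    "  strcpy(buf, input);",
    "}",
]


def keeps_signal(lines: List[str]) -> bool:
    """Hypothetical oracle: both signal-bearing statements must survive."""
    return any("char buf[10]" in l for l in lines) and any("strcpy(buf" in l for l in lines)


SIMPLIFIED = ddmin(PROGRAM, keeps_signal)
```

Because chunks are contiguous and complements preserve order, the reduced program keeps the original statement order; each such reduced variant, paired with the original label, could then be appended to the training set as augmentation.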

