Study of Distractors in Neural Models of Code
This work addresses the need for enhanced transparency in neural models of code, providing a complementary view to feature relevance, though it appears incremental as it applies a reduction-based technique to a specific domain.
The paper tackles the problem of identifying distractor features that reduce neural model confidence in code-related predictions, finding that token removal can significantly impact model confidence and token categories play a vital role.
Finding important features that contribute to the predictions of neural models is an active area of research in explainable AI. Neural models are opaque, and identifying such features leads to a better understanding of their predictions. In this work, we present the inverse perspective of distractor features: features that cast doubt on a prediction by lowering the model's confidence in it. Understanding distractors provides a complementary view of the features' relevance to the predictions of neural models. In this paper, we apply a reduction-based technique to find distractors and report preliminary results on their impacts and types. Our experiments across various tasks, models, and datasets of code reveal that removing tokens can have a significant impact on a model's confidence in its prediction, and that the categories of the removed tokens also play a vital role in that confidence. Our study aims to enhance the transparency of models by emphasizing the tokens that significantly influence their confidence.
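The idea of finding distractors by reduction can be illustrated with a minimal sketch. The abstract does not specify the reduction algorithm, so the code below assumes a simple greedy, one-token-at-a-time strategy: a token is treated as a distractor if removing it keeps the predicted label unchanged while raising the model's confidence. The names `find_distractors` and `toy_model`, and the `NOISE` token set, are hypothetical and stand in for a real neural model of code.

```python
from typing import Callable, List, Tuple

# A model maps a token sequence to (predicted label, confidence in [0, 1]).
Model = Callable[[List[str]], Tuple[str, float]]


def find_distractors(tokens: List[str], model: Model) -> Tuple[List[str], List[str], float]:
    """Greedily remove tokens whose removal keeps the label and raises confidence.

    Assumption: this greedy loop is an illustrative stand-in, not the
    reduction technique used in the paper.
    """
    label, conf = model(tokens)
    current = list(tokens)
    distractors: List[str] = []
    improved = True
    while improved:
        improved = False
        for i in range(len(current)):
            candidate = current[:i] + current[i + 1:]
            new_label, new_conf = model(candidate)
            # Accept the removal only if the prediction is preserved
            # and the model becomes strictly more confident.
            if new_label == label and new_conf > conf:
                distractors.append(current[i])
                current, conf = candidate, new_conf
                improved = True
                break  # rescan the reduced input from the start
    return distractors, current, conf


# Hypothetical toy model: each "noise" token lowers confidence by 0.2.
NOISE = {"tmp", "debug"}


def toy_model(tokens: List[str]) -> Tuple[str, float]:
    label = "sort" if "sort" in tokens else "unknown"
    penalty = 0.2 * sum(1 for t in tokens if t in NOISE)
    return label, max(0.0, 0.9 - penalty)


if __name__ == "__main__":
    program = ["def", "sort", "(", "xs", ")", "tmp", "debug"]
    found, reduced, confidence = find_distractors(program, toy_model)
    print("distractors:", found)
    print("reduced input:", reduced)
    print("confidence:", confidence)
```

With a real model, `toy_model` would be replaced by a call that tokenizes the program, runs inference, and returns the top label with its probability. Grouping the returned distractors by token category (for example identifiers, keywords, or punctuation) would then give the per-category view the abstract mentions.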