PL LG Oct 7, 2022

Novice Type Error Diagnosis with Natural Language Models

arXiv:2210.03682v1
4 citations
h-index: 22
Originality: Incremental advance
AI Analysis

This work addresses the challenge of type error diagnosis for novice programmers, representing an incremental improvement over existing data-driven methods.

The paper tackles the problem of diagnosing type errors in novice programmers' code by using natural language models for error localization. Its model localizes type errors correctly 62% of the time, outperforming Nate, the previous state-of-the-art data-driven model, by 11%.

Strong static type systems help programmers eliminate many errors without much burden of supplying type annotations. However, this flexibility makes it highly non-trivial to diagnose ill-typed programs, especially for novice programmers. Compared to classic constraint-solving and optimization-based approaches, the data-driven approach has shown great promise in identifying the root causes of type errors with higher accuracy. Instead of relying on hand-engineered features, this work explores natural language models for type error localization, which can be trained end to end without requiring any hand-engineered features. We demonstrate that, for novice type error diagnosis, the language model-based approach significantly outperforms the previous state-of-the-art data-driven approach. Specifically, our model predicts type errors correctly 62% of the time, outperforming Nate, the state-of-the-art data-driven model, by 11% under a more rigorous accuracy metric. Furthermore, we apply structural probes to explain the performance difference between different language models.
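
The abstract reports localization accuracy but does not define the metric on this page. As a rough illustration only, the sketch below shows one common way such a number could be computed: a model assigns a score to each candidate program location, and a prediction counts as correct if a true error location appears among the top k. The function names, the location labels (`e1`, `e2`, ...), and the toy scores are all hypothetical and are not taken from the paper.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical representation: each example pairs a map from candidate
# location id to model score with the set of locations labeled as the
# true source of the type error.
Example = Tuple[Dict[str, float], Set[str]]


def top_k_hit(scores: Dict[str, float], blamed: Set[str], k: int = 1) -> bool:
    """Return True if any of the k highest-scoring locations is a true error location."""
    ranked = sorted(scores, key=scores.get, reverse=True)
    return any(loc in blamed for loc in ranked[:k])


def localization_accuracy(examples: List[Example], k: int = 1) -> float:
    """Fraction of examples whose top-k predictions contain a true error location."""
    if not examples:
        raise ValueError("need at least one example")
    hits = sum(top_k_hit(scores, blamed, k) for scores, blamed in examples)
    return hits / len(examples)


# Toy data (made up for illustration): two ill-typed programs.
examples: List[Example] = [
    ({"e1": 0.9, "e2": 0.1}, {"e1"}),             # top-1 prediction is correct
    ({"e1": 0.2, "e2": 0.7, "e3": 0.1}, {"e3"}),  # true location is ranked last
]

print(localization_accuracy(examples, k=1))
print(localization_accuracy(examples, k=3))
```

A stricter metric would use a smaller k or require an exact match with the labeled location; which variant the paper means by "more rigorous" is not stated here.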
