Novice Type Error Diagnosis with Natural Language Models
This work addresses the challenge of type error diagnosis for novice programmers, representing an incremental improvement over existing data-driven methods.
The paper tackles the problem of diagnosing type errors in novice programmers' code by using natural language models for localization, achieving 62% accuracy, which is 11% higher than the previous state-of-the-art data-driven model.
Strong static type systems help programmers eliminate many errors without imposing much of a burden of supplying type annotations. However, this flexibility makes it highly non-trivial to diagnose ill-typed programs, especially for novice programmers. Compared to classic constraint-solving and optimization-based approaches, the data-driven approach has shown great promise in identifying the root causes of type errors with higher accuracy. Instead of relying on hand-engineered features, this work explores natural language models for type error localization, which can be trained end to end without any feature engineering. We demonstrate that, for novice type error diagnosis, the language model-based approach significantly outperforms the previous state-of-the-art data-driven approach. Specifically, our model predicts type errors correctly 62% of the time, outperforming Nate, the state-of-the-art data-driven model, by 11% under a more rigorous accuracy metric. Furthermore, we apply structural probes to explain the performance differences between language models.
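To make the accuracy figure concrete, the following is a minimal sketch of how a top-1 localization accuracy could be computed, assuming each ill-typed program has a single ground-truth blame location and the model emits a single predicted location. The function name `localization_accuracy`, the use of line numbers as locations, and the sample data are illustrative assumptions; the more rigorous metric used in this work is not spelled out here and may differ.

```python
from typing import Sequence


def localization_accuracy(predicted: Sequence[int], actual: Sequence[int]) -> float:
    """Return the fraction of programs whose predicted blame location is correct."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    if not predicted:
        raise ValueError("need at least one program to score")
    # A prediction counts as a hit only when it matches the true location exactly.
    hits = sum(1 for p, a in zip(predicted, actual) if p == a)
    return hits / len(predicted)


# Hypothetical data: the line blamed by a model for each ill-typed program,
# and the line the student actually changed to fix it (assumed ground truth).
predicted_lines = [3, 7, 2, 5, 9]
fixed_lines = [3, 7, 4, 5, 1]

accuracy = localization_accuracy(predicted_lines, fixed_lines)
print(f"top-1 localization accuracy: {accuracy:.0%}")
```

Exact matching on a single location is the strictest simple choice; a looser variant would accept a hit anywhere in the top few predictions, which makes reported numbers harder to compare across systems.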