SE · LG · PL · May 25, 2023

Type Prediction With Program Decomposition and Fill-in-the-Type Training

arXiv:2305.17145v1 · 7 citations · Has Code
Originality: Incremental advance
AI Analysis

This work targets a task that programmers find tedious: introducing and maintaining type annotations. The advance is incremental, since it builds on existing LLM methods.

The authors tackle automated type prediction for TypeScript and Python with OpenTau, a search-based approach built on large language models. On their new TypeScript dataset, 47.4% of files type check (a 14.5 percentage-point absolute improvement), with an overall rate of 3.3 type errors per file.
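To make the two headline numbers concrete, the sketch below shows one plausible way to compute "percent of files that type check" and "type errors per file" from per-file error counts. The `FileResult` record, the `summarize` helper, and the sample data are all hypothetical and are not taken from the OpenTau code base; the figures it produces are illustrative, not the paper's results.

```python
from dataclasses import dataclass


@dataclass
class FileResult:
    """Type-checker outcome for one file (hypothetical record layout)."""
    name: str
    type_errors: int  # number of errors the type checker reported


def summarize(results: list[FileResult]) -> tuple[float, float]:
    """Return (percent of files with zero type errors, mean errors per file)."""
    if not results:
        raise ValueError("need at least one file")
    clean = sum(1 for r in results if r.type_errors == 0)
    pct_type_check = 100.0 * clean / len(results)
    errors_per_file = sum(r.type_errors for r in results) / len(results)
    return pct_type_check, errors_per_file


# Made-up data for illustration only; not the paper's results.
results = [
    FileResult("a.ts", 0),
    FileResult("b.ts", 0),
    FileResult("c.ts", 5),
    FileResult("d.ts", 3),
]
pct, per_file = summarize(results)
```

An "absolute improvement" is the difference between two such percentages in percentage points, so a 14.5-point gain that ends at 47.4% would imply a baseline near 32.9%.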

TypeScript and Python are two programming languages that support optional type annotations, which are useful but tedious to introduce and maintain. This has motivated automated type prediction: given an untyped program, produce a well-typed output program. Large language models (LLMs) are promising for type prediction, but there are challenges: fill-in-the-middle performs poorly, programs may not fit into the context window, generated types may not type check, and it is difficult to measure how well-typed the output program is. We address these challenges by building OpenTau, a search-based approach for type prediction that leverages large language models. We propose a new metric for type prediction quality, give a tree-based program decomposition that searches a space of generated types, and present fill-in-the-type fine-tuning for LLMs. We evaluate our work with a new dataset for TypeScript type prediction, and show that 47.4% of files type check (14.5% absolute improvement) with an overall rate of 3.3 type errors per file. All code, data, and models are available at: https://github.com/GammaTauAI/opentau.
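The idea of searching a space of generated types over a tree-shaped decomposition can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the OpenTau implementation: the `Node` structure, the `search_types` function, the children-first traversal order, and the `propose` and `count_errors` callbacks (stand-ins for an LLM and a type checker) are all hypothetical, and the candidate types and error counts are made up.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class Node:
    """One code block in the decomposition tree (hypothetical structure)."""
    name: str
    children: list["Node"] = field(default_factory=list)


def search_types(
    node: Node,
    propose: Callable[[Node, dict[str, str]], list[str]],
    count_errors: Callable[[Node, str, dict[str, str]], int],
    chosen: Optional[dict[str, str]] = None,
) -> dict[str, str]:
    """Pick, for every node, the candidate annotation with the fewest errors.

    Children are visited first so their chosen types are available as
    context when the parent is annotated (an assumed traversal order).
    """
    if chosen is None:
        chosen = {}
    for child in node.children:
        search_types(child, propose, count_errors, chosen)
    candidates = propose(node, chosen)
    if candidates:
        chosen[node.name] = min(
            candidates, key=lambda c: count_errors(node, c, chosen)
        )
    return chosen


# Stand-in for model output: candidate annotations per block (made up).
CANDIDATES = {
    "helper": ["(x: string) => number", "(x: number) => number"],
    "main": ["() => void", "() => string"],
}
# Stand-in for type-checker output: error count per candidate (made up).
ERRORS = {
    ("helper", "(x: string) => number"): 2,
    ("helper", "(x: number) => number"): 0,
    ("main", "() => void"): 1,
    ("main", "() => string"): 0,
}


def propose(node: Node, chosen: dict[str, str]) -> list[str]:
    return CANDIDATES.get(node.name, [])


def count_errors(node: Node, candidate: str, chosen: dict[str, str]) -> int:
    return ERRORS[(node.name, candidate)]


tree = Node("main", children=[Node("helper")])
result = search_types(tree, propose, count_errors)
```

Splitting the program into blocks keeps each model query small enough for a limited context window, and scoring candidates with a checker addresses the concern that generated types may not type check.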

Code Implementations: 1 repo