Predicting Variable Types in Dynamically Typed Programming Languages
This work addresses type inference in dynamically typed languages, a known bottleneck for both programmers reading code and virtual machines optimizing it, and proposes a novel method for it.
The paper tackles the problem of predicting variable and function return types in dynamically typed programming languages to improve code understanding and optimization. Its best model achieves 44.33% accuracy across 21 classes and a top-3 accuracy of 71.5% on a Python dataset.
Dynamically typed programming languages are popular because they increase programmer productivity. However, the absence of types in the source code makes programs written in these languages difficult to understand, and the virtual machines that execute these programs cannot produce optimized code. To overcome this challenge, we develop a technique to predict the types of all identifiers, including variables and function return types. We propose the first implementation of $2^{nd}$ order Inside Outside Recursive Neural Networks with two variants, (i) Child-Sum Tree-LSTMs and (ii) N-ary RNNs, that can handle a large number of tree branches. We predict the types of all identifiers given the Abstract Syntax Tree by performing just two passes over the tree, bottom-up and top-down, keeping both a content and a context representation for every node of the tree. This allows these representations to interact by combining different paths from the parent, siblings and children, which is crucial for predicting types. Our best model achieves 44.33\% accuracy across 21 classes and a top-3 accuracy of 71.5\% on a Python data set that we gathered from popular Python benchmarks.
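To make the two-pass idea concrete, the following is a minimal sketch in Python. It is not the model described above: it assumes that plain bags of node labels stand in for the learned content and context vectors, and that a simple sum stands in for the Child-Sum Tree-LSTM or N-ary RNN composition. The function names (`inside_pass`, `outside_pass`, `content_and_context`) are hypothetical and chosen only for illustration. The sketch shows just the data flow: a bottom-up pass gives every AST node a content summary of its own subtree, and a top-down pass gives every node a context summary built from its parent and siblings.

```python
import ast
from collections import Counter


def node_label(node):
    """Use the AST class name (e.g. 'Name', 'BinOp') as a toy node feature."""
    return type(node).__name__


def inside_pass(node, content):
    """Bottom-up pass: content = own label + sum of the children's content."""
    total = Counter({node_label(node): 1})
    for child in ast.iter_child_nodes(node):
        total += inside_pass(child, content)
    content[node] = total
    return total


def outside_pass(node, content, context, parent_context=None):
    """Top-down pass: context = parent's context + parent's label + siblings' content."""
    context[node] = Counter() if parent_context is None else parent_context
    children = list(ast.iter_child_nodes(node))
    for child in children:
        child_context = context[node] + Counter({node_label(node): 1})
        for sibling in children:
            if sibling is not child:
                child_context = child_context + content[sibling]
        outside_pass(child, content, context, child_context)


def content_and_context(source):
    """Run both passes over the AST of `source` (toy stand-in, no learning)."""
    tree = ast.parse(source)
    content, context = {}, {}
    inside_pass(tree, content)
    outside_pass(tree, content, context)
    return tree, content, context


if __name__ == "__main__":
    tree, content, context = content_and_context("x = 1\ny = x + 2\n")
    for node in ast.walk(tree):
        # Show the context available for each identifier that is assigned to.
        if isinstance(node, ast.Name) and isinstance(node.ctx, ast.Store):
            print(node.id, dict(context[node]))
```

In this toy version the context of an identifier is simply everything in the tree outside its own subtree, so the content and the context of any node always add up to the content of the root. A learned model would replace the label counts with vectors and the sums with trained composition functions, and would feed both representations of an identifier into a classifier over the type classes.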