Fine-grained Pseudo-code Generation Method via Code Feature Extraction and Transformer
This work addresses the time-consuming task of writing pseudo-code for program comprehension, but the contribution is incremental: it builds on existing sequence-to-sequence and code semantic learning methods.
The paper tackles the problem of generating pseudo-code from source code to aid novice developers. It proposes DeepPseudo, a method that combines code feature extraction with a Transformer, and shows that it is competitive with state-of-the-art baselines on the Django and SPoC corpora.
Pseudo-code written in natural language helps novice developers comprehend programs. However, writing such pseudo-code is time-consuming and laborious. Motivated by research advances in sequence-to-sequence learning and code semantic learning, we propose DeepPseudo, a novel deep pseudo-code generation method based on code feature extraction and the Transformer. In particular, DeepPseudo uses a Transformer encoder to encode the source code and then uses a code feature extractor to learn local features. Finally, a pseudo-code generator performs decoding to generate the corresponding pseudo-code. We choose two corpora (i.e., Django and SPoC) from real-world large-scale projects as our empirical subjects. We first compare DeepPseudo with seven state-of-the-art baselines from the pseudo-code generation and neural machine translation domains in terms of four performance measures. The results show that DeepPseudo is competitive. Moreover, we also analyze the rationality of the component settings in DeepPseudo.
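To make the middle stage of the pipeline more concrete, the following is a minimal sketch of what extracting local features from encoder outputs could look like. It is not DeepPseudo's implementation: the abstract does not say how the code feature extractor is built, so the sliding-window (1-D convolution-style) operation, the function name `local_features`, and the fixed kernel are all assumptions made for illustration. The per-token vectors stand in for Transformer encoder outputs.

```python
from typing import List

Vector = List[float]


def local_features(encoded: List[Vector], kernel: List[float]) -> List[Vector]:
    """Slide a 1-D kernel along the token axis (hypothetical extractor).

    `encoded` holds one vector per source-code token, standing in for the
    Transformer encoder output. Positions outside the sequence are treated
    as zeros, so the output has the same length as the input.
    """
    if len(kernel) % 2 == 0:
        raise ValueError("kernel length must be odd")
    half = len(kernel) // 2
    dim = len(encoded[0]) if encoded else 0
    out: List[Vector] = []
    for t in range(len(encoded)):
        acc = [0.0] * dim
        for k, weight in enumerate(kernel):
            src = t + k - half  # neighbouring token position
            if 0 <= src < len(encoded):
                for d in range(dim):
                    acc[d] += weight * encoded[src][d]
        out.append(acc)
    return out


# Toy stand-in for encoder outputs: three tokens, two dimensions each.
encoded = [[1.0, 0.0], [0.0, 2.0], [3.0, 3.0]]
# An assumed kernel that sums each token with its immediate neighbours.
features = local_features(encoded, [1.0, 1.0, 1.0])
print(features)  # [[1.0, 2.0], [4.0, 5.0], [3.0, 5.0]]
```

In a trained model the kernel weights would be learned rather than fixed, and the resulting features would be passed on to the pseudo-code generator for decoding. The sketch only shows how each output position mixes information from adjacent tokens.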