SE PL
Dec 4, 2020

Quality Estimation & Interpretability for Code Translation

Mayank Agarwal, Kartik Talamadupula, Stephanie Houde, Fernando Martinez, Michael Muller, John Richards, Steven Ross, Justin D. Weisz

arXiv:2012.07581v2
8.99 citations

Originality: Incremental advance

AI Analysis

This work matters to developers and researchers who use code translation tools because it offers a way to assess the reliability of automatically translated code. The contribution is an incremental improvement over existing approaches.

This paper addresses the lack of quality estimation and interpretability in neural machine translation (NMT) for source code. The authors correlate model confidences from the TransCoder model with lint errors in translated code, providing a method to estimate translation quality.

Recently, the automated translation of source code from one programming language to another by using automatic approaches inspired by Neural Machine Translation (NMT) methods for natural languages has come under study. However, such approaches suffer from the same problem as previous NMT approaches on natural languages, viz. the lack of an ability to estimate and evaluate the quality of the translations; and consequently ascribe some measure of interpretability to the model's choices. In this paper, we attempt to estimate the quality of source code translations built on top of the TransCoder model. We consider the code translation task as an analog of machine translation (MT) for natural languages, with some added caveats. We present our main motivation from a user study built around code translation; and present a technique that correlates the confidences generated by that model to lint errors in the translated code. We conclude with some observations on these correlations, and some ideas for future work.
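The abstract does not spell out how confidences are related to lint errors, so the following is only a minimal sketch of one way such a correlation could be computed. It assumes per-token probabilities are available for each line of translated code and that a linter has flagged each line as erroneous or not. The aggregation by minimum, the use of Pearson correlation, the function names, and all data values are illustrative assumptions, not the authors' method or TransCoder output.

```python
from math import sqrt


def line_confidence(token_probs_by_line):
    """Collapse per-token probabilities into one score per line.

    Taking the minimum is an assumption made for this sketch: a single
    low-confidence token marks the whole line as uncertain.
    """
    return [min(probs) for probs in token_probs_by_line]


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(var_x * var_y)


# Hypothetical data: token probabilities for four translated lines, and
# whether a linter reported an error on each line (1 = error, 0 = clean).
token_probs_by_line = [
    [0.99, 0.98, 0.97],
    [0.95, 0.41, 0.90],
    [0.99, 0.99],
    [0.60, 0.35, 0.80],
]
lint_error_on_line = [0, 1, 0, 1]

scores = line_confidence(token_probs_by_line)
r = pearson(scores, lint_error_on_line)
print(f"line confidences: {scores}")
print(f"correlation with lint errors: {r:.3f}")
```

With a binary error flag, Pearson correlation reduces to the point-biserial correlation. A strongly negative value would indicate that low-confidence lines tend to be the ones the linter flags.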
