Automatic Graphics Program Generation using Attention-Based Hierarchical Decoder
This work addresses the challenge of generating structured code from GUI images for developers and designers, offering an incremental improvement over existing encoder-decoder frameworks.
The paper tackles the problem of automatically generating graphics program code from GUI screenshots by proposing an attention-based hierarchical decoder that improves code generation accuracy and consistency with GUI layout. Experimental results show the method outperforms current state-of-the-art methods on both public and custom datasets.
Recent progress on deep learning has made it possible to automatically transform the screenshot of Graphic User Interface (GUI) into code by using the encoder-decoder framework. While the commonly adopted image encoder (e.g., CNN network), might be capable of extracting image features to the desired level, interpreting these abstract image features into hundreds of tokens of code puts a particular challenge on the decoding power of the RNN-based code generator. Considering the code used for describing GUI is usually hierarchically structured, we propose a new attention-based hierarchical code generation model, which can describe GUI images in a finer level of details, while also being able to generate hierarchically structured code in consistency with the hierarchical layout of the graphic elements in the GUI. Our model follows the encoder-decoder framework, all the components of which can be trained jointly in an end-to-end manner. The experimental results show that our method outperforms other current state-of-the-art methods on both a publicly available GUI-code dataset as well as a dataset established by our own.