SE | AI | Mar 18, 2022

M2TS: Multi-Scale Multi-Modal Approach Based on Transformer for Source Code Summarization

arXiv:2203.09707v2 | 50 citations | h-index: 8 | Has Code
AI Analysis

This work addresses a challenging issue in software engineering by improving code summarization for developers, though it appears incremental as it builds on existing multi-modal methods.

The paper tackles the problem of source code summarization by proposing M2TS, a multi-scale multi-modal Transformer-based approach that extracts structural features from Abstract Syntax Trees (ASTs) and combines them with token features, achieving state-of-the-art performance on Java and Python datasets.

Source code summarization aims to generate natural language descriptions of code snippets. Many existing studies learn the syntactic and semantic knowledge of code snippets from their token sequences and Abstract Syntax Trees (ASTs). They use the learned code representations as input to code summarization models, which then generate summaries describing the source code. Traditional models either traverse ASTs as sequences or split ASTs into paths as input. However, the former loses the structural properties of ASTs, and the latter destroys their overall structure. Therefore, comprehensively capturing the structural features of ASTs when learning code representations for source code summarization remains a challenging problem. In this paper, we propose M2TS, a Multi-scale Multi-modal approach based on Transformer for source code Summarization. M2TS uses a multi-scale AST feature extraction method, which can extract the structures of ASTs more completely and accurately at multiple local and global levels. To complement the semantic information missing from ASTs, we also obtain code token features and combine them with the extracted AST features using a cross-modality fusion method that not only fuses the syntactic and contextual semantic information of source code, but also highlights the key features of each modality. We conduct experiments on two Java datasets and one Python dataset, and the experimental results demonstrate that M2TS outperforms current state-of-the-art methods. We release our code at https://github.com/TranSMS/M2TS.
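
To make the abstract's point about AST structure concrete, the following is a minimal sketch using only Python's standard `ast` module. It is not the M2TS implementation. The helper names `ast_graph` and `k_hop_neighbors` are hypothetical, and treating a "scale" as a k-hop neighborhood in the tree is an assumption made purely for illustration. The sketch shows that a flattened traversal keeps node types but not parent-child links, while the edge list lets each node see local (small k) or more global (large k) context.

```python
import ast


def ast_graph(source):
    """Parse Python source and return (labels, edges).

    labels: node type names in pre-order (the "AST as a sequence" view).
    edges:  (parent_index, child_index) pairs (the structural view).
    """
    labels, edges = [], []

    def visit(node, parent):
        idx = len(labels)
        labels.append(type(node).__name__)
        if parent is not None:
            edges.append((parent, idx))
        for child in ast.iter_child_nodes(node):
            visit(child, idx)

    visit(ast.parse(source), None)
    return labels, edges


def k_hop_neighbors(n_nodes, edges, k):
    """For each node, return the set of nodes within k undirected hops.

    Assumption for illustration only: one "scale" = one value of k.
    """
    adj = [set() for _ in range(n_nodes)]
    for parent, child in edges:
        adj[parent].add(child)
        adj[child].add(parent)
    reach = [{i} for i in range(n_nodes)]
    for _ in range(k):
        reach = [r | {m for n in r for m in adj[n]} for r in reach]
    return reach


SNIPPET = "def add(a, b):\n    return a + b\n"
labels, edges = ast_graph(SNIPPET)

# The flat sequence alone does not record which node is the parent of which.
print("pre-order node types:", labels)

# Larger k widens each node's view from local to more global structure.
for k in (1, 2, 3):
    reach = k_hop_neighbors(len(labels), edges, k)
    print(f"k={k}: root sees {len(reach[0])} of {len(labels)} nodes")
```

In a learned model, the per-scale neighborhoods would typically drive some form of feature aggregation rather than plain set membership; the sets here only illustrate what information each scale exposes.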

Code Implementations: 1 repo
