SE · AI · Aug 19, 2022

Topical: Learning Repository Embeddings from Source Code using Attention

arXiv:2208.09495v4 · 2 citations · h-index: 14 · Has Code
Originality: Incremental advance
AI Analysis

This work addresses repository-level representation for developers and researchers with a scalable, efficient method. The advance is incremental: it builds on existing embedding techniques and contributes a new attention-based aggregation step.

The paper tackles the problem of generating repository-level embeddings from source code. It introduces Topical, a deep neural network that uses an attention mechanism to combine source code, dependency graphs, and textual data, and it outperforms multiple baselines on tasks such as repository auto-tagging.

This paper presents Topical, a novel deep neural network for repository-level embeddings. Topical's attention mechanism outperforms existing methods, which rely on natural-language documentation or naive aggregation techniques. The mechanism generates repository-level representations from source code, full dependency graphs, and script-level textual data. Trained on publicly accessible GitHub repositories, Topical surpasses multiple baselines in tasks such as repository auto-tagging, highlighting the attention mechanism's efficacy over traditional aggregation methods. Topical also demonstrates scalability and efficiency, making it a valuable contribution to repository-level representation computation. The accompanying tools, code, and training dataset are provided for further research at: https://github.com/jpmorganchase/topical.
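
The contrast between naive aggregation and attention-based aggregation can be illustrated with a toy example. The sketch below is not Topical's architecture: it assumes each script has already been encoded as a small vector, and it uses a hypothetical fixed `query` vector where a trained model would learn its attention parameters. The function names and the data are illustrative only.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def mean_pool(script_embeddings):
    """Naive aggregation: average each dimension across all scripts."""
    n = len(script_embeddings)
    dim = len(script_embeddings[0])
    return [sum(vec[d] for vec in script_embeddings) / n for d in range(dim)]


def attention_pool(script_embeddings, query):
    """Attention aggregation: weight each script by softmax(query . vec / sqrt(dim))."""
    dim = len(query)
    scores = [
        sum(q * x for q, x in zip(query, vec)) / math.sqrt(dim)
        for vec in script_embeddings
    ]
    weights = softmax(scores)
    pooled = [
        sum(w * vec[d] for w, vec in zip(weights, script_embeddings))
        for d in range(dim)
    ]
    return pooled, weights


# Toy script-level embeddings (assumed, not taken from the paper).
script_embeddings = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1]]
# Hypothetical query vector; a trained model would learn this.
query = [2.0, 0.0]

repo_mean = mean_pool(script_embeddings)
repo_attn, attn_weights = attention_pool(script_embeddings, query)
print("mean pooling:     ", repo_mean)
print("attention pooling:", repo_attn)
print("attention weights:", attn_weights)
```

Mean pooling treats every script equally, while attention pooling shifts the repository vector toward the scripts that score highest against the query. In this toy setup the first and third scripts receive most of the weight.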

Code Implementations: 1 repo
