LG · CR · PL · Aug 7, 2023

Exploiting Code Symmetries for Learning Program Semantics

UW
arXiv:2308.03312v9 · 17 citations · h-index: 47
Originality: Highly original
AI Analysis

This work aims to improve program analysis for developers and researchers by introducing a method that exploits code symmetries. The analysis rates it as a significant advance rather than an incremental improvement.

The paper tackles the challenge of teaching code semantics to Large Language Models for program analysis by incorporating code symmetries into the model architecture, resulting in SymC, which outperforms state-of-the-art code models on five program analysis tasks without pre-training.

This paper tackles the challenge of teaching code semantics to Large Language Models (LLMs) for program analysis by incorporating code symmetries into the model architecture. We introduce a group-theoretic framework that defines code symmetries as semantics-preserving transformations, where forming a code symmetry group enables precise and efficient reasoning about code semantics. Our solution, SymC, develops a novel variant of self-attention that is provably equivariant to code symmetries from the permutation group defined over the program dependence graph. SymC obtains superior performance on five program analysis tasks, outperforming state-of-the-art code models without any pre-training. Our results suggest that code LLMs that encode the structural prior of code via the code symmetry group generalize better and faster.
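To make the equivariance claim concrete, here is a minimal sketch, not SymC's actual architecture. It assumes a single attention head whose scores receive an additive bias from the adjacency matrix of a toy dependence graph. The graph, the weights, and every function name are hypothetical and chosen only for illustration. The property shown is that, for a permutation that maps the graph onto itself (an automorphism), permuting the input statements permutes the output in the same way.

```python
import math
import random


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def softmax(row):
    """Numerically stable softmax over one row of scores."""
    peak = max(row)
    exps = [math.exp(v - peak) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def graph_attention(x, adj, wq, wk, wv):
    """Single-head self-attention with an additive adjacency bias (an assumption)."""
    q, k, v = matmul(x, wq), matmul(x, wk), matmul(x, wv)
    scale = math.sqrt(len(wq[0]))
    n = len(x)
    scores = [[sum(a * b for a, b in zip(q[i], k[j])) / scale + adj[i][j]
               for j in range(n)] for i in range(n)]
    return matmul([softmax(r) for r in scores], v)


def permute_rows(x, perm):
    """Reorder nodes: row i of the result is row perm[i] of x."""
    return [x[p] for p in perm]


def permute_graph(adj, perm):
    """Relabel graph nodes with the same permutation."""
    return [[adj[pi][pj] for pj in perm] for pi in perm]


def close(a, b, tol=1e-9):
    """Element-wise comparison of two matrices up to a tolerance."""
    return all(abs(u - v) <= tol for ra, rb in zip(a, b) for u, v in zip(ra, rb))


rng = random.Random(0)


def rand_matrix(rows, cols):
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]


# Toy dependence graph: statements 1 and 2 both depend on 0; 3 depends on both.
ADJ = [[0, 1, 1, 0],
       [0, 0, 0, 1],
       [0, 0, 0, 1],
       [0, 0, 0, 0]]
X = rand_matrix(4, 3)  # one feature vector per statement
WQ, WK, WV = rand_matrix(3, 3), rand_matrix(3, 3), rand_matrix(3, 3)

SWAP = [0, 2, 1, 3]  # swaps the two independent statements 1 and 2

is_automorphism = permute_graph(ADJ, SWAP) == ADJ
is_equivariant = close(
    graph_attention(permute_rows(X, SWAP), ADJ, WQ, WK, WV),
    permute_rows(graph_attention(X, ADJ, WQ, WK, WV), SWAP),
)
print(is_automorphism, is_equivariant)
```

Swapping statements 1 and 2 leaves the toy graph unchanged, so the layer's output rows are swapped in exactly the same way. For a permutation that is not an automorphism, the same identity holds only if the graph is relabeled along with the input.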

