SE · Apr 15

A Universal Textual Merge Strategy Based on Tokens for Version Control Systems

arXiv:2604.13813 · 11.4 · h-index: 4
AI Analysis

For developers using version control systems, Summer reduces spurious merge conflicts without requiring language-specific parsers, offering a universal solution for heterogeneous artifacts.

Summer, a token-based merge algorithm, reproduces 36% of developer merges verbatim, the highest verbatim accuracy among the six tools compared on a large benchmark of real-world merge scenarios; it ranks second in semantic accuracy.
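One plausible reading of "verbatim accuracy" is the fraction of merge scenarios in which a tool's output is character-for-character identical to the merge the developer actually committed. The sketch below encodes that reading under stated assumptions: the `Scenario` type and `verbatim_accuracy` function are hypothetical, a reported conflict or tool failure counts as a miss, and no whitespace normalization is applied. ConflictBench's actual evaluation protocol may differ.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Scenario:
    # Hypothetical record for one merge scenario.
    tool_output: Optional[str]  # None if the tool reported a conflict or failed
    developer_merge: str        # the resolution the developer committed


def verbatim_accuracy(scenarios: list[Scenario]) -> float:
    """Fraction of scenarios whose tool output exactly equals the developer merge.

    Assumption: exact string equality, with conflicts counted as misses.
    """
    if not scenarios:
        return 0.0
    hits = sum(
        1
        for s in scenarios
        if s.tool_output is not None and s.tool_output == s.developer_merge
    )
    return hits / len(scenarios)
```

Semantic accuracy, on which Summer ranked second, would replace the exact string comparison with some notion of equivalence, which this sketch does not attempt.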

Merging is a core operation in version control systems such as Git, but traditional line-based algorithms often yield spurious conflicts, particularly in the presence of refactorings or parallel edits. While syntax- and semantics-aware merging approaches can reduce conflicts, they introduce drawbacks such as loss of formatting, dependence on language-specific parsers, and limited flexibility across heterogeneous artifacts. To address this gap, we present Summer, a novel textual token-based merge algorithm that is independent of document formats. Our approach divides text into tokens, formulates the token-level changes in one branch as string-rewriting rules and move rules, and applies these rules to the text of the other branch to construct a merge. Although the approach is independent of programming languages, our move rules model the extraction and inlining of functions. We evaluated Summer on ConflictBench, a large benchmark of real-world merge scenarios, comparing it with five pioneering merge tools across Java and non-Java files. Experimental results show that Summer achieved the highest accuracy (36%) in reproducing merges verbatim identical to the developers' merges, and ranked second in semantic accuracy.
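The abstract outlines the core idea (tokenize, turn one branch's token edits into rewrite rules, apply those rules to the other branch) without showing it. Below is a minimal Python sketch of that idea under our own assumptions; it is not Summer's implementation. The tokenizer regex, the fixed context width `k`, the use of `difflib.SequenceMatcher` as the token differ, and the helper names (`tokenize`, `derive_rules`, `apply_rules`, `merge`) are all hypothetical, and Summer's move rules for extracted or inlined functions are not modeled.

```python
import difflib
import re
from dataclasses import dataclass

# Hypothetical lossless tokenizer: word runs, whitespace runs, single symbols.
# Every character falls in exactly one class, so "".join(tokens) == text,
# which keeps the original formatting intact.
TOKEN_RE = re.compile(r"\w+|\s+|[^\w\s]")


def tokenize(text: str) -> list[str]:
    return TOKEN_RE.findall(text)


@dataclass(frozen=True)
class RewriteRule:
    before: tuple[str, ...]  # base tokens just before the edit (anchor)
    old: tuple[str, ...]     # base tokens the edit removes (may be empty)
    after: tuple[str, ...]   # base tokens just after the edit (anchor)
    new: tuple[str, ...]     # tokens the edit inserts in place of `old`


def derive_rules(base: list[str], branch: list[str], k: int = 2) -> list[RewriteRule]:
    """Turn each token-level change from base to branch into a context-anchored rule."""
    rules = []
    sm = difflib.SequenceMatcher(a=base, b=branch, autojunk=False)
    for tag, i1, i2, j1, j2 in sm.get_opcodes():
        if tag == "equal":
            continue
        rules.append(RewriteRule(
            before=tuple(base[max(0, i1 - k):i1]),
            old=tuple(base[i1:i2]),
            after=tuple(base[i2:i2 + k]),
            new=tuple(branch[j1:j2]),
        ))
    return rules


def find_unique(tokens: list[str], pattern: list[str]):
    """Index of the only occurrence of pattern in tokens, or None if absent or ambiguous."""
    n = len(pattern)
    hits = [i for i in range(len(tokens) - n + 1) if tokens[i:i + n] == pattern]
    return hits[0] if len(hits) == 1 else None


def apply_rules(rules: list[RewriteRule], tokens: list[str]):
    """Apply rules to the other branch; rules that cannot be placed become conflicts."""
    tokens = list(tokens)
    conflicts = []
    for rule in rules:
        pos = find_unique(tokens, list(rule.before + rule.old + rule.after))
        if pos is None:
            # Both branches may already have made the identical edit.
            if find_unique(tokens, list(rule.before + rule.new + rule.after)) is not None:
                continue
            conflicts.append(rule)
            continue
        start = pos + len(rule.before)
        tokens[start:start + len(rule.old)] = list(rule.new)
    return tokens, conflicts


def merge(base: str, left: str, right: str, k: int = 2):
    """Rewrite `right` with the changes `left` made to `base`."""
    rules = derive_rules(tokenize(base), tokenize(left), k)
    merged, conflicts = apply_rules(rules, tokenize(right))
    return "".join(merged), conflicts


if __name__ == "__main__":
    # Both branches edit the same line, which a line-based merge reports as a conflict.
    text, conflicts = merge("x = f(a, b)\n", "x = g(a, b)\n", "x = f(a, c)\n")
    print(repr(text), len(conflicts))  # 'x = g(a, c)\n' 0
```

The key design choice in this sketch is the context anchor: a rule fires only where its surrounding base tokens occur exactly once in the other branch, so an edit that cannot be located unambiguously is reported as a conflict rather than guessed.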
