SE · LG · Aug 26, 2025

Stack Trace-Based Crash Deduplication with Transformer Adaptation

arXiv:2508.19449v1 · h-index: 3
Originality: Incremental advance
AI Analysis

This work reduces the manual crash triage effort developers face by providing an effective crash deduplication solution. Its contribution is incremental, since it adapts existing NLP techniques to a specific domain.

The paper tackles the problem of duplicate crash reports overwhelming issue-tracking systems by proposing dedupT, a transformer-based approach that models stack traces holistically. On Mean Reciprocal Rank (MRR), dedupT often improves by over 15% compared to the best deep learning baseline and by up to 9% over traditional methods.
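Mean Reciprocal Rank, the headline metric above, rewards a ranker for placing the first correct duplicate group near the top of its candidate list. A minimal Python sketch of the metric follows. The group ids are hypothetical, and scoring a query with no correct candidate as 0 is an assumption, since the summary does not say how the paper treats such queries.

```python
from typing import List, Sequence, Set


def reciprocal_rank(ranking: Sequence[str], relevant: Set[str]) -> float:
    # 1 / position of the first relevant candidate (1-based); 0.0 if none appears.
    for position, candidate in enumerate(ranking, start=1):
        if candidate in relevant:
            return 1.0 / position
    return 0.0


def mean_reciprocal_rank(rankings: List[Sequence[str]],
                         relevant_sets: List[Set[str]]) -> float:
    # Average reciprocal rank over all queries (one query per incoming crash report).
    if not rankings:
        raise ValueError("need at least one query")
    total = sum(reciprocal_rank(r, rel) for r, rel in zip(rankings, relevant_sets))
    return total / len(rankings)


if __name__ == "__main__":
    # Hypothetical ranked candidate groups for three incoming reports.
    rankings = [["g1", "g2", "g3"], ["g2", "g3", "g1"], ["g4", "g5"]]
    relevant = [{"g1"}, {"g3"}, {"g6"}]
    print(mean_reciprocal_rank(rankings, relevant))  # (1 + 1/2 + 0) / 3 = 0.5
```

Because only the first correct hit counts, moving a true duplicate from rank 2 to rank 1 doubles that query's contribution, which is why MRR is sensitive to exactly the top-of-list quality that matters for triage.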

Automated crash reporting systems generate large volumes of duplicate reports, overwhelming issue-tracking systems and increasing developer workload. Traditional stack trace-based deduplication methods, relying on string similarity, rule-based heuristics, or deep learning (DL) models, often fail to capture the contextual and structural relationships within stack traces. We propose dedupT, a transformer-based approach that models stack traces holistically rather than as isolated frames. dedupT first adapts a pretrained language model (PLM) to stack traces, then uses its embeddings to train a fully-connected network (FCN) to rank duplicate crashes effectively. Extensive experiments on real-world datasets show that dedupT outperforms existing DL and traditional methods (e.g., sequence alignment and information retrieval techniques) in both duplicate ranking and unique crash detection, significantly reducing manual triage effort. On four public datasets, dedupT improves Mean Reciprocal Rank (MRR) often by over 15% compared to the best DL baseline and up to 9% over traditional methods while achieving higher Receiver Operating Characteristic Area Under the Curve (ROC-AUC) in detecting unique crash reports. Our work advances the integration of modern natural language processing (NLP) techniques into software engineering, providing an effective solution for stack trace-based crash deduplication.
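The two-stage design described in the abstract (PLM embeddings of stack traces, then an FCN that scores a new report against existing crash groups) can be sketched roughly as below. This is not dedupT's implementation: the toy 2-D vectors stand in for PLM embeddings, the pair encoding and network shape are assumptions, and `hand_set_params` is a hypothetical substitute for trained weights. Flagging a report as unique when even its best match scores low is likewise an assumed rule, used here only to show how ROC-AUC would be computed for unique-crash detection.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = List[float]
# (w1, b1, w2, b2) for a one-hidden-layer network; this layout is illustrative.
Params = Tuple[List[Vector], Vector, Vector, float]


def _sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def pair_features(query: Vector, candidate: Vector) -> Vector:
    # Assumed pair encoding: element-wise |q - c| followed by q * c.
    diff = [abs(q - c) for q, c in zip(query, candidate)]
    prod = [q * c for q, c in zip(query, candidate)]
    return diff + prod


def fcn_score(features: Vector, params: Params) -> float:
    # One ReLU hidden layer, then a sigmoid output read as P(duplicate).
    w1, b1, w2, b2 = params
    hidden = [max(0.0, sum(w * x for w, x in zip(row, features)) + b)
              for row, b in zip(w1, b1)]
    logit = sum(w * h for w, h in zip(w2, hidden)) + b2
    return _sigmoid(logit)


def hand_set_params(dim: int) -> Params:
    # Hypothetical stand-in for trained weights: the single hidden unit sums
    # |q - c| (an L1 distance), so closer embeddings get higher scores.
    w1 = [[1.0] * dim + [0.0] * dim]
    return w1, [0.0], [-4.0], 2.0


def rank_candidates(query: Vector, candidates: Dict[str, Vector],
                    params: Params) -> List[Tuple[str, float]]:
    # Score every existing crash group against the query, best first.
    scored = [(cid, fcn_score(pair_features(query, emb), params))
              for cid, emb in candidates.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)


def uniqueness_score(query: Vector, candidates: Dict[str, Vector],
                     params: Params) -> float:
    # Assumed rule: a report looks unique when even its best match scores low.
    ranking = rank_candidates(query, candidates, params)
    return 1.0 if not ranking else 1.0 - ranking[0][1]


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    # Pairwise (Mann-Whitney) form: chance that a unique report (label 1)
    # scores above a duplicate one (label 0); ties count as half.
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Toy 2-D vectors standing in for PLM stack-trace embeddings.
    groups = {"bucket_A": [0.9, 0.1], "bucket_B": [0.1, 0.9], "bucket_C": [0.5, 0.5]}
    params = hand_set_params(2)
    print([cid for cid, _ in rank_candidates([0.8, 0.2], groups, params)])
    # prints ['bucket_A', 'bucket_C', 'bucket_B']
    queries = [[0.85, 0.15], [5.0, 5.0]]  # first resembles bucket_A; second matches nothing
    labels = [0, 1]                       # 1 = unique report
    scores = [uniqueness_score(q, groups, params) for q in queries]
    print(roc_auc(labels, scores))  # 1.0 on this toy data
```

Because the hand-set hidden unit only measures L1 distance, this toy network ranks exactly like nearest-neighbour search. A trained FCN is what would let the scorer learn which embedding differences actually signal a duplicate.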
