LG | Jul 26, 2025

Cluster Purge Loss: Structuring Transformer Embeddings for Equivalent Mutants Detection

Adelaide Danilov, Aria Nourbakhsh, Christoph Schommer

arXiv:2507.20078v1 | h-index: 5

Originality: Incremental advance

AI Analysis

This work addresses the detection of equivalent mutants in code, a problem that matters for software testing and quality assurance. It is a domain-specific, incremental improvement.

The paper tackles equivalent code mutant detection by introducing Cluster Purge Loss, a framework that integrates cross-entropy with deep metric learning to structure transformer embeddings, and reports state-of-the-art performance in this domain.

Recent pre-trained transformer models achieve superior performance in various code processing objectives. However, although effective at optimizing decision boundaries, common approaches for fine-tuning them for downstream classification tasks (distance-based methods or training an additional classification head) often fail to thoroughly structure the embedding space to reflect nuanced intra-class semantic relationships. Equivalent code mutant detection is one of these tasks, where the quality of the embedding space is crucial to the performance of the models. We introduce a novel framework that integrates cross-entropy loss with a deep metric learning objective, termed Cluster Purge Loss. This objective, unlike conventional approaches, concentrates on adjusting fine-grained differences within each class, encouraging the separation of instances based on semantic equivalence to the class center using dynamically adjusted borders. Employing UniXCoder as the base model, our approach demonstrates state-of-the-art performance in the domain of equivalent mutant detection and produces a more interpretable embedding space.
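
The general idea of pairing cross-entropy with an intra-class metric term can be illustrated with a small sketch. This is not the paper's Cluster Purge Loss, whose exact formulation is not given in the abstract. The hinge penalty, the choice of the mean distance to the class centroid as the "border", the `near_center` flags, and the `weight` parameter are all assumptions made for illustration only.

```python
import math


def cross_entropy(logits, label):
    """Standard softmax cross-entropy for one example (log-sum-exp form)."""
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum - logits[label]


def centroid(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def distance(a, b):
    """Euclidean distance between two vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def intra_class_penalty(embeddings, near_center):
    """Hypothetical intra-class term for the members of ONE class.

    Assumption: the border is the mean distance to the class centroid,
    recomputed from the current batch. Instances flagged in `near_center`
    are penalized for lying outside the border, the rest for lying inside.
    """
    center = centroid(embeddings)
    dists = [distance(e, center) for e in embeddings]
    border = sum(dists) / len(dists)
    total = 0.0
    for d, near in zip(dists, near_center):
        total += max(0.0, d - border) if near else max(0.0, border - d)
    return total / len(dists)


def combined_loss(logits_batch, labels, embeddings, near_center, weight=0.5):
    """Mean cross-entropy plus the weighted intra-class term, averaged over classes."""
    ce = sum(cross_entropy(l, y) for l, y in zip(logits_batch, labels)) / len(labels)
    classes = sorted(set(labels))
    metric = 0.0
    for c in classes:
        idx = [i for i, y in enumerate(labels) if y == c]
        metric += intra_class_penalty(
            [embeddings[i] for i in idx], [near_center[i] for i in idx]
        )
    return ce + weight * metric / len(classes)
```

In a real fine-tuning setup the embeddings and logits would come from the transformer encoder and its classification head, and the loss would be written in a framework with automatic differentiation. The pure-Python version here only shows how the two terms could be combined.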
