CL | Nov 26, 2024

Isotropy Matters: Soft-ZCA Whitening of Embeddings for Semantic Code Search

arXiv:2411.17538v2 | 1 citation | h-index: 29 | ESANN 2025 proceedings
Originality: Incremental advance
AI Analysis

This paper addresses performance issues in semantic code search for developers, but it is an incremental advance because it builds on existing whitening methods.

The study addresses low isotropy in embedding spaces, which impairs semantic code search, by proposing a modified ZCA whitening technique (Soft-ZCA). The technique improved the performance of pre-trained code language models and can complement contrastive fine-tuning.

Low isotropy in an embedding space impairs performance on tasks involving semantic inference. Our study investigates the impact of isotropy on semantic code search performance and explores post-processing techniques to mitigate this issue. We analyze various code language models, examine isotropy in their embedding spaces, and assess its influence on search effectiveness. We propose a modified ZCA whitening technique to control isotropy levels in embeddings. Our results demonstrate that Soft-ZCA whitening improves the performance of pre-trained code language models and can complement contrastive fine-tuning.
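To make the idea of a whitening step with a controllable isotropy level concrete, here is a minimal NumPy sketch. It is not taken from the paper's repository. It assumes that the "soft" modification is a regularizer `eps` added to the covariance eigenvalues before the inverse square root, so that `eps` near zero gives standard ZCA whitening and larger values whiten less aggressively. The function name `soft_zca_whiten` and the parameter `eps` are hypothetical names chosen for illustration.

```python
import numpy as np


def soft_zca_whiten(X, eps=1e-2):
    """Whiten the rows of X (n_samples x dim) with a damped ZCA transform.

    Assumption (not confirmed by the abstract): the "soft" variant adds
    eps to the covariance eigenvalues, W = U (L + eps*I)^(-1/2) U^T.
    Returns the whitened embeddings, the mean, and the transform W.
    """
    mu = X.mean(axis=0, keepdims=True)
    Xc = X - mu                                # center the embeddings
    cov = Xc.T @ Xc / X.shape[0]               # (dim x dim) covariance
    eigvals, eigvecs = np.linalg.eigh(cov)     # symmetric eigendecomposition
    eigvals = np.clip(eigvals, 0.0, None)      # guard against tiny negatives
    scale = 1.0 / np.sqrt(eigvals + eps)       # damped inverse square root
    W = eigvecs @ np.diag(scale) @ eigvecs.T   # rotate, rescale, rotate back
    return Xc @ W, mu, W


if __name__ == "__main__":
    # Synthetic, strongly anisotropic "embeddings" for illustration only.
    rng = np.random.default_rng(0)
    stds = np.array([10.0, 5.0, 2.0, 1.0, 1.0, 0.5, 0.2, 0.1])
    X = rng.normal(size=(500, 8)) * stds

    for eps in (1e-8, 1e-1, 1.0):
        Z, _, _ = soft_zca_whiten(X, eps=eps)
        variances = np.linalg.eigvalsh(Z.T @ Z / Z.shape[0])
        print(f"eps={eps:g}  min var={variances.min():.3f}  max var={variances.max():.3f}")
```

Under this assumption, each direction's variance after the transform is `lambda / (lambda + eps)`, so a small `eps` pushes all variances toward 1 (a fully isotropic covariance) while a large `eps` leaves low-variance directions less amplified. How the transform is fitted and applied to queries versus code snippets is left open here, since the abstract does not specify it.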

Code Implementations: 1 repo
