AIFeb 11, 2025

URECA: The Chain of Two Minimum Set Cover Problems exists behind Adaptation to Shifts in Semantic Code Search

arXiv:2502.07494v1h-index: 5
Originality Incremental advance
AI Analysis

This work addresses adaptation challenges in semantic code search, offering a novel method that improves performance for few-shot adaptation to diverse shifts, though it appears incremental as it builds on existing clustering and adaptation frameworks.

The paper tackles the problem of model adaptation to distribution shifts in semantic code search by identifying a limitation in the minimum entropy formulation, which leads to shifted initialization cascade, and introduces URECA, a new clustering algorithm that achieves state-of-the-art performance in CoSQA under query shift scenarios.

Adaptation is to make model learn the patterns shifted from the training distribution. In general, this adaptation is formulated as the minimum entropy problem. However, the minimum entropy problem has inherent limitation -- shifted initialization cascade phenomenon. We extend the relationship between the minimum entropy problem and the minimum set cover problem via Lebesgue integral. This extension reveals that internal mechanism of the minimum entropy problem ignores the relationship between disentangled representations, which leads to shifted initialization cascade. From the analysis, we introduce a new clustering algorithm, Union-find based Recursive Clustering Algorithm~(URECA). URECA is an efficient clustering algorithm for the leverage of the relationships between disentangled representations. The update rule of URECA depends on Thresholdly-Updatable Stationary Assumption to dynamics as a released version of Stationary Assumption. This assumption helps URECA to transport disentangled representations with no errors based on the relationships between disentangled representations. URECA also utilize simulation trick to efficiently cluster disentangled representations. The wide range of evaluations show that URECA achieves consistent performance gains for the few-shot adaptation to diverse types of shifts along with advancement to State-of-The-Art performance in CoSQA in the scenario of query shift.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes