SE · AI · IR · May 19, 2025

CPRet: A Dataset, Benchmark, and Model for Retrieval in Competitive Programming

Han Deng, Yuan Meng, Shixiang Tang, Wanli Ouyang, Xinzhu Ma

arXiv:2505.12925v2 · 8.0 · 3 citations · h-index: 7 · Has Code

Originality: Incremental advance

AI Analysis

This work gives competitive programming organizers and AI researchers tools to detect similar problems. The contribution is incremental, since it builds on existing retrieval methods.

The paper tackles duplicate or highly similar problems in competitive programming benchmarks, which undermine both contest fairness and the validity of model evaluation. It introduces CPRet, a dataset and benchmark for similar question retrieval, and develops specialized retrievers that achieve strong results. Its analysis shows that high-similarity problems inflate model pass rates by up to 15%.
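The pass-rate inflation claim can be made concrete with a small sketch. It assumes a hypothetical per-problem record holding the problem's highest similarity to any earlier problem and whether the model solved it, plus an illustrative similarity threshold of 0.8; neither the record layout nor the threshold comes from the paper. The sketch reports the pass-rate gap between the high-similarity subset and the rest in percentage points, using toy data.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ProblemResult:
    # Hypothetical record (not from the paper): one benchmark problem, its
    # highest similarity to any earlier problem, and whether it was solved.
    problem_id: str
    max_similarity: float
    solved: bool


def pass_rate(results: List[ProblemResult]) -> float:
    """Fraction of problems solved; 0.0 for an empty list."""
    if not results:
        return 0.0
    return sum(r.solved for r in results) / len(results)


def similarity_gap(results: List[ProblemResult], threshold: float = 0.8) -> float:
    """Pass rate on high-similarity problems minus pass rate on the rest,
    in percentage points. The 0.8 threshold is an illustrative assumption."""
    high = [r for r in results if r.max_similarity >= threshold]
    low = [r for r in results if r.max_similarity < threshold]
    return 100.0 * (pass_rate(high) - pass_rate(low))


# Toy data, not results from the paper.
results = [
    ProblemResult("A", 0.90, True),
    ProblemResult("B", 0.85, True),
    ProblemResult("C", 0.30, True),
    ProblemResult("D", 0.20, False),
]
print(similarity_gap(results))  # 50.0
```

A positive gap means the model does better on problems that closely resemble earlier ones, which is the kind of inflation a similarity-aware benchmark would want to separate out.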

Competitive programming benchmarks are widely used in scenarios such as programming contests and large language model assessments. However, the growing presence of duplicate or highly similar problems raises concerns not only about competition fairness, but also about the validity of competitive programming as a benchmark for model evaluation. In this paper, we propose a new problem, similar question retrieval, to tackle this issue. Due to the lack of both data and models, solving this problem is challenging. To this end, we introduce CPRet, a retrieval-oriented benchmark suite for competitive programming covering four retrieval tasks: two code-centric tasks (i.e., Text-to-Code, Code-to-Code) and two newly proposed problem-centric tasks (i.e., Problem-to-Duplicate, Simplified-to-Full). The suite is built from a combination of automatically crawled problem-solution data and manually curated annotations. Our contribution includes both high-quality training data and temporally separated test sets for reliable evaluation. We further develop two task-specialized retrievers based on this dataset: CPRetriever-Code, trained with a novel Group-InfoNCE loss for problem-code alignment, and CPRetriever-Prob, fine-tuned for identifying problem-level similarity. Both models achieve strong results and are open-sourced for local use. Finally, we analyze LiveCodeBench and find that high-similarity problems inflate model pass rates and reduce differentiation, underscoring the need for similarity-aware evaluation in future benchmarks.

GitHub: https://github.com/coldchair/CPRet
Online Demo: https://www.cpret.online/
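As a rough illustration of the retrieval setting, the sketch below ranks candidates by cosine similarity between embedding vectors and computes the standard in-batch InfoNCE loss, a common objective for training dense retrievers. It is not the paper's Group-InfoNCE loss, whose exact formulation is not given here. The function names and the temperature of 0.05 are illustrative assumptions, and the embeddings are assumed to come from some encoder.

```python
import math
from typing import List

Vector = List[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def rank_candidates(query: Vector, corpus: List[Vector]) -> List[int]:
    """Return corpus indices sorted by descending cosine similarity to query,
    e.g. to surface likely duplicates of a new problem (hypothetical helper)."""
    scores = [cosine(query, doc) for doc in corpus]
    return sorted(range(len(corpus)), key=lambda i: scores[i], reverse=True)


def info_nce(queries: List[Vector], positives: List[Vector],
             temperature: float = 0.05) -> float:
    """Standard in-batch InfoNCE: positives[i] is the match for queries[i],
    and every other positives[j] in the batch acts as a negative."""
    total = 0.0
    for i, q in enumerate(queries):
        logits = [cosine(q, p) / temperature for p in positives]
        m = max(logits)  # subtract the max for a numerically stable log-sum-exp
        log_denom = m + math.log(sum(math.exp(x - m) for x in logits))
        total += log_denom - logits[i]  # -log softmax probability of the positive
    return total / len(queries)
```

For example, `rank_candidates([1.0, 0.0], [[0.0, 1.0], [1.0, 0.1], [-1.0, 0.0]])` returns `[1, 0, 2]`. Training with `info_nce` pulls each query toward its paired positive and away from the other items in the batch; a grouped variant would need to handle several correct items per query, which this sketch does not attempt.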

