SEApr 4, 2024Code
CodeEditorBench: Evaluating Code Editing Capability of Large Language ModelsJiawei Guo, Ziming Li, Xueling Liu et al.
Large Language Models (LLMs) for code are rapidly evolving, with code editing emerging as a critical capability. We introduce CodeEditorBench, an evaluation framework designed to rigorously assess the performance of LLMs in code editing tasks, including debugging, translating, polishing, and requirement switching. Unlike existing benchmarks focusing solely on code generation, CodeEditorBench emphasizes real-world scenarios and practical aspects of software development. We curate diverse coding challenges and scenarios from five sources, covering various programming languages, complexity levels, and editing tasks. Evaluation of 19 LLMs reveals that closed-source models (particularly Gemini-Ultra and GPT-4), outperform open-source models in CodeEditorBench, highlighting differences in model performance based on problem types and prompt sensitivities. CodeEditorBench aims to catalyze advancements in LLMs by providing a robust platform for assessing code editing capabilities. We will release all prompts and datasets to enable the community to expand the dataset and benchmark emerging LLMs. By introducing CodeEditorBench, we contribute to the advancement of LLMs in code editing and provide a valuable resource for researchers and practitioners.
CVJul 2, 2025Code
Active Control Points-based 6DoF Pose Tracking for Industrial Metal ObjectsChentao Shen, Ding Pan, Mingyu Mei et al.
Visual pose tracking is playing an increasingly vital role in industrial contexts in recent years. However, the pose tracking for industrial metal objects remains a challenging task especially in the real world-environments, due to the reflection characteristic of metal objects. To address this issue, we propose a novel 6DoF pose tracking method based on active control points. The method uses image control points to generate edge feature for optimization actively instead of 6DoF pose-based rendering, and serve them as optimization variables. We also introduce an optimal control point regression method to improve robustness. The proposed tracking method performs effectively in both dataset evaluation and real world tasks, providing a viable solution for real-time tracking of industrial metal objects. Our source code is made publicly available at: https://github.com/tomatoma00/ACPTracking.
CLApr 7, 2025Code
COIG-P: A High-Quality and Large-Scale Chinese Preference Dataset for Alignment with Human ValuesM-A-P Team, Siwei Wu, Jincheng Ren et al.
Aligning large language models (LLMs) with human preferences has achieved remarkable success. However, existing Chinese preference datasets are limited by small scale, narrow domain coverage, and lack of rigorous data validation. Additionally, the reliance on human annotators for instruction and response labeling significantly constrains the scalability of human preference datasets. To address these challenges, we design an LLM-based Chinese preference dataset annotation pipeline with no human intervention. Specifically, we crawled and carefully filtered 92k high-quality Chinese queries and employed 15 mainstream LLMs to generate and score chosen-rejected response pairs. Based on it, we introduce COIG-P (Chinese Open Instruction Generalist - Preference), a high-quality, large-scale Chinese preference dataset, comprises 1,009k Chinese preference pairs spanning 6 diverse domains: Chat, Code, Math, Logic, Novel, and Role. Building upon COIG-P, to reduce the overhead of using LLMs for scoring, we trained a 8B-sized Chinese Reward Model (CRM) and meticulously constructed a Chinese Reward Benchmark (CRBench). Evaluation results based on AlignBench \citep{liu2024alignbenchbenchmarkingchinesealignment} show that that COIG-P significantly outperforms other Chinese preference datasets, and it brings significant performance improvements ranging from 2% to 12% for the Qwen2/2.5 and Infinity-Instruct-3M-0625 model series, respectively. The results on CRBench demonstrate that our CRM has a strong and robust scoring ability. We apply it to filter chosen-rejected response pairs in a test split of COIG-P, and our experiments show that it is comparable to GPT-4o in identifying low-quality samples while maintaining efficiency and cost-effectiveness. Our codes and data are released in https://github.com/multimodal-art-projection/COIG-P.
IVDec 10, 2021Code
Surrogate-based cross-correlation for particle image velocimetryYong Lee, Fuqiang Gu, Zeyu Gong et al.
This paper presents a novel surrogate-based cross-correlation (SBCC) framework to improve the correlation performance for practical particle image velocimetry~(PIV). The basic idea is that an optimized surrogate filter/image, replacing one raw image, will produce a more accurate and robust correlation signal. Specifically, the surrogate image is encouraged to generate perfect Gaussian-shaped correlation map to tracking particles (PIV image pair) while producing zero responses to image noise (context images). And the problem is formularized with an objective function composed of surrogate loss and consistency loss. As a result, the closed-form solution provides an efficient multivariate operator that could consider other negative context images. Compared with the state-of-the-art baseline methods (background subtraction, robust phase correlation, etc.), our SBCC method exhibits significant performance improvement (accuracy and robustness) on the synthetic dataset and several challenging experimental PIV cases. Besides, our implementation with experimental details (\url{https://github.com/yongleex/SBCC}) is also available for interested researchers.
CLApr 5, 2024
Chinese Tiny LLM: Pretraining a Chinese-Centric Large Language ModelXinrun Du, Zhouliang Yu, Songyang Gao et al.
In this study, we introduce CT-LLM, a 2B large language model (LLM) that illustrates a pivotal shift towards prioritizing the Chinese language in developing LLMs. Uniquely initiated from scratch, CT-LLM diverges from the conventional methodology by primarily incorporating Chinese textual data, utilizing an extensive corpus of 1,200 billion tokens, including 800 billion Chinese tokens, 300 billion English tokens, and 100 billion code tokens. This strategic composition facilitates the model's exceptional proficiency in understanding and processing Chinese, a capability further enhanced through alignment techniques. Demonstrating remarkable performance on the CHC-Bench, CT-LLM excels in Chinese language tasks, and showcases its adeptness in English through SFT. This research challenges the prevailing paradigm of training LLMs predominantly on English corpora and then adapting them to other languages, broadening the horizons for LLM training methodologies. By open-sourcing the full process of training a Chinese LLM, including a detailed data processing procedure with the obtained Massive Appropriate Pretraining Chinese Corpus (MAP-CC), a well-chosen multidisciplinary Chinese Hard Case Benchmark (CHC-Bench), and the 2B-size Chinese Tiny LLM (CT-LLM), we aim to foster further exploration and innovation in both academia and industry, paving the way for more inclusive and versatile language models.
9.8COMP-PHApr 10
Differentiable free energy surface: a variational approach to directly observing rare events using generative deep-learning modelsShuo-Hui Li, Chen Chen, Yao-Wen Zhang et al.
Rare events are central to the evolution of complex many-body systems, characterized as key transitional configurations on the free energy surface (FES). Conventional methods require adequate sampling of rare event transitions to obtain the FES, which is computationally very demanding. Here we introduce the variational free energy surface (VaFES), a dataset-free framework that directly models FESs using tractable-density generative models. Rare events can then be immediately identified from the FES with their configurations generated directly via one-shot sampling of generative models. By extending a coarse-grained collective variable (CV) into its reversible equivalent, VaFES constructs a latent space of intermediate representation in which the CVs explicitly occupy a subset of dimensions. This latent-space construction preserves the physical interpretability and transparent controllability of the CVs by design, while accommodating arbitrary CV formulations. The reversibility makes the system energy exactly accessible, enabling variational optimization of the FES without pre-generated simulation data. A single optimization yields a continuous, differentiable FES together with one-shot generation of rare-event configurations. Our method can reproduce the exact analytical solution for the bistable dimer potential and identify a chignolin native folded state in close alignment with the experimental NMR structure. Our approach thus establishes a scalable, systematic framework for advancing the study of complex statistical systems.
STAT-MECHApr 29, 2024
Deep generative modelling of canonical ensemble with differentiable thermal propertiesShuo-Hui Li, Yao-Wen Zhang, Ding Pan
We propose a variational modelling method with differentiable temperature for canonical ensembles. Using a deep generative model, the free energy is estimated and minimized simultaneously in a continuous temperature range. At optimal, this generative model is a Boltzmann distribution with temperature dependence. The training process requires no dataset, and works with arbitrary explicit density generative models. We applied our method to study the phase transitions (PT) in the Ising and XY models, and showed that the direct-sampling simulation of our model is as accurate as the Markov Chain Monte Carlo (MCMC) simulation, but more efficient. Moreover, our method can give thermodynamic quantities as differentiable functions of temperature akin to an analytical solution. The free energy aligns closely with the exact one to the second-order derivative, so this inclusion of temperature dependence enables the otherwise biased variational model to capture the subtle thermal effects at the PTs. These findings shed light on the direct simulation of physical systems using deep generative models