CL · Nov 6, 2023

Safurai-Csharp: Harnessing Synthetic Data to improve language-specific Code LLM

Davide Cifarelli, Leonardo Boiardi, Alessandro Puppo, Leon Jovanovic

arXiv:2311.03243v1 · 2 citations · h-index: 5 · Has Code

Originality: Synthesis-oriented

AI Analysis

Safurai-Csharp gives C# developers a specialized tool to streamline workflows and aid code learning, though the contribution appears incremental because it builds on existing models and techniques.

The paper tackles the problem of generating, completing, and debugging C# code by introducing Safurai-Csharp, an open-source model based on CodeLlama 34B and fine-tuned with EvolInstruct, achieving 56.33% on the Manual MultiPL-E benchmark.

This paper introduces Safurai-Csharp, an open-source model designed to specialize in the generation, completion, and debugging of C# code. Safurai-Csharp is built upon the CodeLlama 34B model and leverages the EvolInstruct technique to create a refined and expanded dataset for its fine-tuning process. Its score of 56.33% on the Manual MultiPL-E benchmark (Zero-Shot, Pass@1) signals its capacity to streamline developers' workflows and aid code learning. It shows promise in setting a new standard among open-source C# LLMs and hopes to inspire more inclusive and wide-ranging development in the field of language-specific LLMs.
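
To make the reported metric concrete, here is a minimal sketch of how a Pass@1 score is typically computed. It assumes the commonly used unbiased pass@k estimator; the abstract does not say which evaluation harness was used, and the function names and per-problem counts below are hypothetical, chosen only for illustration.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate for one problem.

    n: completions sampled for the problem
    c: completions that passed all unit tests
    k: sample budget being evaluated (k <= n)
    """
    if k > n:
        raise ValueError("k must not exceed the number of samples n")
    if n - c < k:
        # Fewer than k failures exist, so any k samples include a pass.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


def benchmark_score(results, k: int = 1) -> float:
    """Average pass@k over (n, c) pairs, one pair per benchmark problem."""
    return sum(pass_at_k(n, c, k) for n, c in results) / len(results)


# Hypothetical results: one sample per problem, two of four problems solved.
example_results = [(1, 1), (1, 0), (1, 1), (1, 0)]
print(f"pass@1 = {benchmark_score(example_results):.2%}")
```

With a single sample per problem, Pass@1 reduces to the fraction of problems whose one completion passes its tests, so a score of 56.33% would mean that slightly more than half of the benchmark problems were solved on the first attempt.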

