CL · Sep 20, 2023

Safurai 001: New Qualitative Approach for Code LLM Evaluation

Davide Cifarelli, Leonardo Boiardi, Alessandro Puppo

arXiv:2309.11385v1 · h-index: 1 · Has Code

Originality: Incremental advance

AI Analysis

This work addresses the need for better evaluation metrics for coding LLMs, but the advance is incremental because it builds on existing models and benchmarks.

The paper tackles the problem of evaluating code LLMs by introducing Safurai-001, a model that competes with existing ones such as WizardCoder and GPT-3.5, together with a GPT-4-based multi-parameter evaluation benchmark. It reports improvements such as outperforming GPT-3.5 by 1.58% and WizardCoder by 18.78% in Code Readability.

This paper presents Safurai-001, a new Large Language Model (LLM) with significant potential in the domain of coding assistance. Driven by recent advancements in coding LLMs, Safurai-001 competes in performance with the latest models like WizardCoder [Xu et al., 2023], PanguCoder [Shen et al., 2023] and Phi-1 [Gunasekar et al., 2023] but aims to deliver a more conversational interaction. By capitalizing on progress in data engineering (including the latest data transformation and prompt engineering techniques) and instruction tuning, this new model promises to stand toe-to-toe with recent closed- and open-source developments. Recognizing the need for an effective evaluation metric for coding LLMs, this paper also introduces GPT4-based MultiParameters, an evaluation benchmark that harnesses varied parameters to give comprehensive insight into the models' functioning and performance. Our assessment shows that Safurai-001 can outperform GPT-3.5 by 1.58% and WizardCoder by 18.78% in the Code Readability parameter, among others.
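The abstract does not say how GPT4-based MultiParameters turns judge ratings into statements such as "outperforms GPT-3.5 by 1.58% in Code Readability". The sketch below is one possible reading, not the authors' method. It assumes the GPT-4 judge returns a numeric score per parameter for each problem, and that the reported percentages are relative differences between mean scores. The function names and every parameter name except Code Readability are hypothetical.

```python
from statistics import mean

# ASSUMPTION: each model's results are a list of per-problem dicts mapping a
# parameter name (e.g. "code_readability") to a numeric judge score.


def parameter_means(per_problem: list[dict[str, float]]) -> dict[str, float]:
    """Average each parameter's score over all evaluated problems."""
    params = per_problem[0].keys()
    return {p: mean(row[p] for row in per_problem) for p in params}


def relative_gain(a: float, b: float) -> float:
    """Percentage by which score a exceeds score b, relative to b (assumed definition)."""
    return (a - b) / b * 100.0


def compare(model_a: list[dict[str, float]],
            model_b: list[dict[str, float]]) -> dict[str, float]:
    """Per-parameter relative gain of model_a over model_b."""
    ma = parameter_means(model_a)
    mb = parameter_means(model_b)
    return {p: relative_gain(ma[p], mb[p]) for p in ma}


if __name__ == "__main__":
    # Toy scores on a 1-10 scale, invented purely for illustration.
    a = [{"code_readability": 8.0, "correctness": 7.0},
         {"code_readability": 9.0, "correctness": 6.0}]
    b = [{"code_readability": 8.0, "correctness": 8.0},
         {"code_readability": 8.0, "correctness": 6.0}]
    print(compare(a, b))  # code_readability -> 6.25
```

Reporting one figure per parameter, rather than a single aggregate, matches the abstract's emphasis on per-parameter results such as Code Readability. The paper may instead use absolute score differences or another aggregation.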
