AI · CL · LG · Dec 13, 2023

PromptBench: A Unified Library for Evaluation of Large Language Models

Kaijie Zhu, Qinlin Zhao, Hao Chen, Jindong Wang, Xing Xie

arXiv:2312.07910v3 · 26.270 citations · h-index: 14 · Has Code

Originality: Synthesis-oriented

AI Analysis

The library supports research with an open, flexible codebase for creating benchmarks, deploying downstream applications, and designing evaluation protocols for LLMs. The contribution is incremental, since it builds on existing evaluation frameworks.

The authors introduce PromptBench, a unified library for evaluating large language models (LLMs), with the aim of assessing performance and mitigating security risks. It provides tools for prompt construction, prompt engineering, dataset and model loading, adversarial prompt attacks, dynamic evaluation protocols, and analysis.

The evaluation of large language models (LLMs) is crucial to assess their performance and mitigate potential security risks. In this paper, we introduce PromptBench, a unified library to evaluate LLMs. It consists of several key components that are easily used and extended by researchers: prompt construction, prompt engineering, dataset and model loading, adversarial prompt attack, dynamic evaluation protocols, and analysis tools. PromptBench is designed to be an open, general, and flexible codebase for research purposes that can facilitate original study in creating new benchmarks, deploying downstream applications, and designing new evaluation protocols. The code is available at: https://github.com/microsoft/promptbench and will be continuously supported.
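To make the listed components concrete, here is a minimal, library-free sketch of how prompt construction, dataset loading, a model call, an adversarial prompt attack, and a small analysis step might fit together. It does not use PromptBench's API: every name below (`DATASET`, `PROMPTS`, `toy_model`, `swap_attack`, `evaluate`, `robustness_report`) is a hypothetical stand-in, the "model" is a keyword matcher rather than an LLM, and the character-swap attack is a simplified illustration chosen for this example, not necessarily one the library implements.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical toy dataset of (sentence, label) pairs; a real run would load a benchmark.
DATASET: List[Tuple[str, str]] = [
    ("a wonderful, moving film", "positive"),
    ("dull and far too long", "negative"),
    ("great cast and a great script", "positive"),
    ("a boring mess", "negative"),
]

# Prompt construction: instruction templates with a {content} slot.
PROMPTS: List[str] = [
    "Classify the sentiment as positive or negative: {content}",
    "Is the following review positive or negative? {content}",
]


def toy_model(prompt: str) -> str:
    """Stand-in for an LLM call (assumption): a deliberately brittle keyword matcher."""
    text = prompt.lower()
    # The toy model only "understands" the task if both label words appear intact.
    if "positive" not in text or "negative" not in text:
        return "unknown"
    if any(word in text for word in ("wonderful", "great")):
        return "positive"
    return "negative"


def swap_attack(template: str) -> str:
    """Character-level perturbation: swap the first two characters of each long word.

    Words containing the {content} slot are left untouched so the template stays valid.
    """
    words = []
    for word in template.split(" "):
        if len(word) > 4 and "{" not in word:
            word = word[1] + word[0] + word[2:]
        words.append(word)
    return " ".join(words)


def evaluate(model: Callable[[str], str], template: str,
             dataset: List[Tuple[str, str]]) -> float:
    """Return classification accuracy of `model` on `dataset` under one prompt template."""
    correct = 0
    for content, label in dataset:
        prediction = model(template.format(content=content))
        correct += prediction == label
    return correct / len(dataset)


def robustness_report(model: Callable[[str], str], prompts: List[str],
                      dataset: List[Tuple[str, str]]) -> Dict[str, Dict[str, float]]:
    """Analysis step: compare clean and attacked accuracy for every prompt."""
    report = {}
    for template in prompts:
        clean = evaluate(model, template, dataset)
        attacked = evaluate(model, swap_attack(template), dataset)
        report[template] = {"clean": clean, "attacked": attacked,
                            "drop": clean - attacked}
    return report


if __name__ == "__main__":
    for template, scores in robustness_report(toy_model, PROMPTS, DATASET).items():
        print(template, scores)
```

The sketch keeps the model behind a plain `Callable[[str], str]` so that a real LLM client could be substituted without changing the evaluation loop. Dynamic evaluation protocols are not covered here.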
