Libra-Leaderboard: Towards Responsible AI through a Balanced Leaderboard of Safety and Capability
The work addresses the need for responsible AI development by giving researchers and practitioners a tool for evaluating LLMs more holistically, though the contribution is incremental because it builds on existing evaluation methods.
The paper tackles the problem of evaluating large language models (LLMs) by introducing Libra-Leaderboard, a framework that ranks 26 mainstream LLMs from 14 organizations based on a balanced assessment of performance and safety, revealing critical safety challenges in state-of-the-art models.
We introduce Libra-Leaderboard, a comprehensive framework designed to rank LLMs through a balanced evaluation of performance and safety. Combining a dynamic leaderboard with an interactive LLM arena, Libra-Leaderboard encourages the joint optimization of capability and safety. Unlike traditional approaches that average performance and safety metrics, Libra-Leaderboard uses a distance-to-optimal-score method to calculate the overall rankings. This approach incentivizes models to balance the two dimensions rather than excel in one at the expense of the other. In the first release, Libra-Leaderboard evaluates 26 mainstream LLMs from 14 leading organizations, identifying critical safety challenges even in state-of-the-art models.
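To illustrate why a distance-to-optimal score rewards balance where a plain average does not, here is a minimal sketch. It rests on assumptions that the text above does not state: both scores are normalized to [0, 1], the optimal point is (1, 1), the distance is Euclidean, and the result is rescaled so that 1 is best. The function names `distance_to_optimal` and `mean_score` are hypothetical and are not taken from the Libra-Leaderboard code.

```python
import math


def distance_to_optimal(performance: float, safety: float) -> float:
    """Score a model by its closeness to the assumed ideal point (1, 1).

    Assumption: both inputs are already normalized to [0, 1].
    The Euclidean distance to (1, 1) is divided by sqrt(2), the largest
    possible distance, and subtracted from 1 so that higher is better.
    """
    for value in (performance, safety):
        if not 0.0 <= value <= 1.0:
            raise ValueError("scores must lie in [0, 1]")
    distance = math.hypot(1.0 - performance, 1.0 - safety)
    return 1.0 - distance / math.sqrt(2.0)


def mean_score(performance: float, safety: float) -> float:
    """Plain average of the two scores, shown for comparison."""
    return (performance + safety) / 2.0


# Two hypothetical models with the same average but different balance.
balanced = distance_to_optimal(0.7, 0.7)
lopsided = distance_to_optimal(1.0, 0.4)

print(f"mean:     balanced={mean_score(0.7, 0.7):.3f}  lopsided={mean_score(1.0, 0.4):.3f}")
print(f"distance: balanced={balanced:.3f}  lopsided={lopsided:.3f}")
```

Under these assumptions the two hypothetical models tie on the plain average, but the balanced one ranks higher on the distance-based score, because a large shortfall in one dimension is penalized more than the same total shortfall spread across both.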