Diff-eRank: A Novel Rank-Based Metric for Evaluating Large Language Models
Diff-eRank gives researchers and practitioners working with LLMs a new evaluation tool, although the contribution is incremental: it builds on established principles from information theory and geometry.
The paper tackles the problem of evaluating large language models (LLMs) by introducing Diff-eRank, a novel rank-based metric that quantifies how efficiently a model eliminates redundant information during training. Experiments show that Diff-eRank increases with model size and correlates well with conventional metrics such as loss and accuracy.
Large Language Models (LLMs) have transformed natural language processing and extended their powerful capabilities to multi-modal domains. As LLMs continue to advance, it is crucial to develop diverse and appropriate metrics for their evaluation. In this paper, we introduce a novel rank-based metric, Diff-eRank, grounded in information theory and geometry principles. Diff-eRank assesses LLMs by analyzing their hidden representations, providing a quantitative measure of how efficiently they eliminate redundant information during training. We demonstrate the applicability of Diff-eRank in both single-modal (e.g., language) and multi-modal settings. For language models, our results show that Diff-eRank increases with model size and correlates well with conventional metrics such as loss and accuracy. In the multi-modal context, we propose an alignment evaluation method based on eRank (effective rank) and use it to verify that contemporary multi-modal LLMs exhibit strong alignment performance. Our code is publicly available at https://github.com/waltonfuture/Diff-eRank.
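The abstract does not spell out how eRank or Diff-eRank is computed, so the following is only a minimal sketch under stated assumptions, not the authors' code from the linked repository. It assumes the standard effective rank of Roy and Vetterli (the exponential of the Shannon entropy of the normalized singular values). It also assumes that Diff-eRank is the eRank of an untrained model's hidden-state covariance minus that of the trained model, and that token vectors are centered and unit-normalized before the covariance is formed. The helper names `effective_rank`, `representation_erank`, and `diff_erank` are hypothetical.

```python
import numpy as np


def effective_rank(matrix: np.ndarray) -> float:
    """Effective rank (Roy & Vetterli): exp of the Shannon entropy of the
    singular values normalized to sum to one."""
    s = np.linalg.svd(matrix, compute_uv=False)
    s = s[s > s.max() * 1e-12]  # drop numerically zero singular values
    p = s / s.sum()             # singular values as a probability distribution
    entropy = -np.sum(p * np.log(p))
    return float(np.exp(entropy))


def representation_erank(hidden: np.ndarray) -> float:
    """eRank of the covariance of one sequence's hidden states.

    hidden: array of shape (num_tokens, hidden_dim).
    Assumption: token vectors are centered by the sequence mean and
    scaled to unit L2 norm before the covariance is formed.
    """
    z = hidden - hidden.mean(axis=0, keepdims=True)
    norms = np.linalg.norm(z, axis=1, keepdims=True)
    z = z / np.maximum(norms, 1e-12)   # guard against zero-length rows
    cov = z.T @ z / z.shape[0]         # shape (hidden_dim, hidden_dim)
    return effective_rank(cov)


def diff_erank(hidden_untrained: np.ndarray, hidden_trained: np.ndarray) -> float:
    """Hypothetical helper: eRank before training minus eRank after training.
    A larger value suggests training removed more redundant directions."""
    return representation_erank(hidden_untrained) - representation_erank(hidden_trained)


# Toy usage with synthetic "hidden states" (not real model outputs):
rng = np.random.default_rng(0)
untrained = rng.standard_normal((256, 64))                               # near-isotropic
trained = rng.standard_normal((256, 4)) @ rng.standard_normal((4, 64))  # rank-4 structure
print(diff_erank(untrained, trained))  # positive: "trained" states span fewer directions
```

In a real evaluation, the two arrays would presumably come from the same input text passed through a randomly initialized and a trained checkpoint of the same architecture. Under this sign convention, a model that compresses its representations more strongly during training yields a larger Diff-eRank.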