CL · LG · PE · Apr 6, 2024

PhyloLM: Inferring the Phylogeny of Large Language Models and Predicting their Performances in Benchmarks

arXiv:2404.04671v4 · 15 citations · h-index: 55 · Has Code · ICLR
Originality: Incremental advance
AI Analysis

PhyloLM gives researchers and developers a tool for assessing how LLMs relate to one another and for predicting their performance without access to transparent training data. The advance is incremental: it applies existing phylogenetic methods to a new domain.

The authors address the problem of understanding relationships among Large Language Models (LLMs) and predicting their performance by introducing PhyloLM, a method that adapts phylogenetic algorithms to analyze the similarity of LLM outputs. The resulting phylogenetic distance metric captured known relationships across 156 models (111 open-source and 45 closed) and predicted benchmark performance, offering a cost-effective tool for evaluating LLM capabilities.

This paper introduces PhyloLM, a method adapting phylogenetic algorithms to Large Language Models (LLMs) to explore whether and how they relate to each other and to predict their performance characteristics. Our method calculates a phylogenetic distance metric based on the similarity of LLMs' outputs. The resulting metric is then used to construct dendrograms, which satisfactorily capture known relationships across a set of 111 open-source and 45 closed models. Furthermore, our phylogenetic distance predicts performance in standard benchmarks, thus demonstrating its functional validity and paving the way for a time- and cost-effective estimation of LLM capabilities. To sum up, by translating population genetic concepts to machine learning, we propose and validate a tool to evaluate LLM development, relationships and capabilities, even in the absence of transparent training information.
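
To make the idea of an output-similarity distance concrete, here is a minimal sketch in Python. It is not the authors' implementation. The abstract does not give the formula, so the sketch assumes a Nei-style genetic distance borrowed from population genetics: each prompt plays the role of a gene, each sampled next token the role of an allele, and the distance is the negative log of a normalized overlap of token frequencies. The model names, prompts, and sampled tokens are all hypothetical.

```python
import math
from collections import Counter


def token_frequencies(samples):
    """Turn {prompt: [sampled tokens]} into {prompt: {token: relative frequency}}."""
    freqs = {}
    for prompt, tokens in samples.items():
        counts = Counter(tokens)
        total = sum(counts.values())
        freqs[prompt] = {tok: n / total for tok, n in counts.items()}
    return freqs


def similarity(freq_a, freq_b):
    """Normalized overlap of two models' token frequencies over their shared prompts."""
    prompts = freq_a.keys() & freq_b.keys()
    cross = sum(
        p * freq_b[g].get(tok, 0.0) for g in prompts for tok, p in freq_a[g].items()
    )
    norm_a = sum(p * p for g in prompts for p in freq_a[g].values())
    norm_b = sum(p * p for g in prompts for p in freq_b[g].values())
    if norm_a == 0 or norm_b == 0:
        return 0.0  # no shared prompts, so nothing to compare
    return cross / math.sqrt(norm_a * norm_b)


def distance(freq_a, freq_b):
    """Assumed Nei-style distance: -log(similarity); infinite when outputs never overlap."""
    s = similarity(freq_a, freq_b)
    return math.inf if s == 0 else -math.log(s)


# Hypothetical data: four sampled next tokens per prompt for three made-up models.
raw = {
    "model_a": {
        "The sky is": ["blue", "blue", "blue", "clear"],
        "2 + 2 =": ["4", "4", "4", "4"],
    },
    "model_a_finetune": {
        "The sky is": ["blue", "blue", "clear", "clear"],
        "2 + 2 =": ["4", "4", "4", "4"],
    },
    "model_b": {
        "The sky is": ["falling", "grey", "grey", "grey"],
        "2 + 2 =": ["4", "5", "4", "5"],
    },
}

freqs = {name: token_frequencies(s) for name, s in raw.items()}
names = sorted(freqs)
# Pairwise distance matrix, stored as a dict keyed by (model, model).
dist = {(a, b): distance(freqs[a], freqs[b]) for a in names for b in names}

for a in names:
    print(a, [round(dist[(a, b)], 3) for b in names])
```

In this toy data the hypothetical fine-tune ends up much closer to its base model than to the unrelated model. A pairwise matrix like `dist` is the kind of input a tree-building algorithm (for example neighbor joining or hierarchical clustering) would take to produce a dendrogram; that step is omitted here.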

