zELO: ELO-inspired Training Method for Rerankers and Embedding Models
This work addresses the need for efficient and versatile retrieval systems across multiple domains, offering an incremental improvement through a novel training approach.
The paper tackles the problem of improving retrieval performance by introducing zELO, a training method based on the Thurstone model. The resulting state-of-the-art reranker models achieve the highest scores in domains such as finance and legal, outperforming proprietary models on metrics such as NDCG@10 and Recall.
We introduce a novel training methodology named zELO, which optimizes retrieval performance based on the observation that ranking tasks are statistically equivalent to a Thurstone model. Using the zELO method, we train a suite of state-of-the-art open-weight reranker models, zerank-1 and zerank-1-small, on unsupervised data. These models achieve the highest retrieval scores in multiple domains, including finance, legal, code, and STEM, outperforming closed-source proprietary rerankers on both NDCG@10 and Recall. They also generalize well, maintaining their zero-shot performance on out-of-domain and private customer datasets. The training data comprised 112,000 queries with 100 documents per query, and the models were trained end-to-end from unannotated queries and documents in fewer than 10,000 H100-hours.
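To make the Thurstone framing concrete: in a Thurstone Case V model each document i has a latent relevance score s_i, and the probability that document i is preferred over document j is Phi(s_i - s_j), where Phi is the standard normal CDF (the noise scale is absorbed into the scores). The sketch below fits such scores for a single query from a matrix of pairwise preference fractions by maximum-likelihood gradient ascent. It is an illustrative assumption, not the authors' pipeline: the function names, the small L2 penalty (added so scores stay finite when one document wins every comparison), and the learning-rate and step settings are all hypothetical choices.

```python
import math
from typing import List


def normal_pdf(x: float) -> float:
    """Standard normal density phi(x)."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def normal_cdf(x: float) -> float:
    """Standard normal CDF Phi(x); erfc keeps precision in the lower tail."""
    return 0.5 * math.erfc(-x / math.sqrt(2.0))


def fit_thurstone(prefs: List[List[float]], steps: int = 500,
                  lr: float = 0.1, l2: float = 0.01) -> List[float]:
    """Fit Thurstone Case V scores (hypothetical helper).

    prefs[i][j] is the fraction of pairwise judgments preferring
    document i over document j, so prefs[i][j] + prefs[j][i] == 1.
    Maximizes sum_{i != j} prefs[i][j] * log Phi(s_i - s_j)
    minus an L2 penalty (l2 / 2) * sum_i s_i**2.
    """
    n = len(prefs)
    s = [0.0] * n
    for _ in range(steps):
        # Gradient of the L2 penalty.
        grad = [-l2 * si for si in s]
        for i in range(n):
            for j in range(n):
                w = prefs[i][j]
                if i == j or w == 0.0:
                    continue
                d = s[i] - s[j]
                # d/dd log Phi(d) = phi(d) / Phi(d); d depends on +s_i and -s_j.
                g = w * normal_pdf(d) / max(normal_cdf(d), 1e-300)
                grad[i] += g
                grad[j] -= g
        s = [si + lr * gi for si, gi in zip(s, grad)]
        # Scores are only identified up to a shift, so re-center to mean zero.
        mean = sum(s) / n
        s = [si - mean for si in s]
    return s


if __name__ == "__main__":
    # Toy preferences for three documents of one query.
    prefs = [
        [0.0, 0.8, 0.9],
        [0.2, 0.0, 0.7],
        [0.1, 0.3, 0.0],
    ]
    scores = fit_thurstone(prefs)
    print([round(x, 2) for x in scores])
```

Because log Phi is concave, the penalized log-likelihood is concave in the scores, so plain gradient ascent with a small step size converges. One plausible use, again an assumption here, is to treat the fitted per-query scores as pointwise targets when training a reranker.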