LG | IT | Jan 22, 2025

Multi-Objective Hyperparameter Selection via Hypothesis Testing on Reliability Graphs

arXiv:2501.13018v2 | 1 citation | h-index: 4
Originality: Incremental advance
AI Analysis

The paper addresses the need for reliable, cost-effective hyperparameter selection in machine learning, particularly for LLMs. The advance is incremental: it extends prior hypothesis-testing methods for hyperparameter selection by incorporating structured knowledge.

The paper tackles the problem of selecting hyperparameters, such as prompt templates in LLMs, so as to balance reliability and cost. It introduces RG-PT, a framework that keeps formal false discovery rate (FDR) guarantees while incorporating structured knowledge through a reliability graph. In the reported experiments, RG-PT significantly outperforms existing methods such as LTT and PT.
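To make the notion of an FDR guarantee concrete, here is a minimal sketch of a generic testing-based selection step. It is not the RG-PT procedure, whose details are not given on this page. It assumes losses bounded in [0, 1], uses a Hoeffding-bound p-value for the null hypothesis that a candidate's true risk exceeds a target `alpha`, and applies the Benjamini-Hochberg procedure at level `delta`. The function names, the example risks, and the values of `n`, `alpha`, and `delta` are all illustrative assumptions.

```python
import math


def hoeffding_p_value(empirical_risk, n, alpha):
    """p-value for H0: true risk > alpha, assuming losses bounded in [0, 1]."""
    gap = max(alpha - empirical_risk, 0.0)
    return math.exp(-2.0 * n * gap * gap)


def benjamini_hochberg(p_values, delta):
    """Return the sorted indices rejected by Benjamini-Hochberg at level delta."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    k_max = 0
    # Find the largest rank k with p_(k) <= delta * k / m.
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= delta * rank / m:
            k_max = rank
    return sorted(order[:k_max])


# Hypothetical held-out risks for four candidate hyperparameters.
empirical_risks = [0.05, 0.08, 0.18, 0.30]
p_values = [hoeffding_p_value(r, n=500, alpha=0.2) for r in empirical_risks]

# Candidates whose null is rejected are declared reliable.
reliable = benjamini_hochberg(p_values, delta=0.1)
```

Benjamini-Hochberg controls the FDR under independence or positive dependence of the p-values; under arbitrary dependence a more conservative correction, such as Benjamini-Yekutieli, would be needed.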

The selection of hyperparameters, such as prompt templates in large language models (LLMs), must often strike a balance between reliability and cost. In many cases, structural relationships between the expected reliability levels of the hyperparameters can be inferred from prior information and held-out data -- e.g., longer prompt templates may be more detailed and thus more reliable. However, existing hyperparameter selection methods either do not provide formal reliability guarantees or are unable to incorporate structured knowledge in the hyperparameter space. This paper introduces reliability graph-based Pareto testing (RG-PT), a novel multi-objective hyperparameter selection framework that maintains formal reliability guarantees in terms of false discovery rate (FDR), while accounting for known relationships among hyperparameters via a directed acyclic graph. Edges in the graph reflect expected reliability and cost trade-offs among hyperparameters, which are inferred via the Bradley-Terry (BT) ranking model from prior information and held-out data. Experimental evaluations demonstrate that RG-PT significantly outperforms existing methods such as learn-then-test (LTT) and Pareto testing (PT) through a more efficient exploration of the hyperparameter space.
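The abstract states that graph edges are inferred with the Bradley-Terry (BT) ranking model. The sketch below shows one standard way to fit BT scores from pairwise comparison counts, using the classical minorization-maximization update, and then to derive directed edges from them. How the paper builds its comparisons and graph is not specified here, so the `wins` matrix, the `threshold` parameter, and the edge rule are hypothetical.

```python
def fit_bradley_terry(wins, iters=200):
    """Fit Bradley-Terry scores with the classical MM update.

    wins[i][j] is the number of comparisons in which item i beat item j.
    Assumes every item has at least one win and one loss.
    """
    n = len(wins)
    scores = [1.0 / n] * n
    for _ in range(iters):
        updated = []
        for i in range(n):
            total_wins = sum(wins[i][j] for j in range(n) if j != i)
            denom = sum(
                (wins[i][j] + wins[j][i]) / (scores[i] + scores[j])
                for j in range(n)
                if j != i
            )
            updated.append(total_wins / denom if denom > 0 else scores[i])
        norm = sum(updated)
        scores = [s / norm for s in updated]  # normalize so scores sum to 1
    return scores


def edges_from_scores(scores, threshold=0.5):
    """Add edge i -> j when the BT probability that i beats j exceeds threshold.

    With threshold >= 0.5, edges always point from a higher score to a lower
    one, so the resulting graph is acyclic.
    """
    n = len(scores)
    return [
        (i, j)
        for i in range(n)
        for j in range(n)
        if i != j and scores[i] / (scores[i] + scores[j]) > threshold
    ]


# Hypothetical counts: "item i was judged more reliable than item j".
wins = [
    [0, 7, 8],
    [3, 0, 6],
    [2, 4, 0],
]
scores = fit_bradley_terry(wins)
edges = edges_from_scores(scores, threshold=0.5)
```

Raising `threshold` above 0.5 keeps only the edges for which the fitted model is more confident about the ordering, which yields a sparser graph.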

Code Implementations: 1 repo
