AI · Nov 24, 2025

HuggingR$^{4}$: A Progressive Reasoning Framework for Discovering Optimal Model Companions

Shaoyin Ma, Chenggong Hu, Huiqiong Wang, Li Sun, Mingli Song, Jie Song

arXiv:2511.18715v2 · 1 citation

Originality: Highly original

AI Analysis

The paper addresses a critical scalability bottleneck for LLM agents that draw on open model repositories, and it offers a novel solution with substantial performance gains.

The paper tackles the problem of selecting appropriate AI models from large repositories (such as HuggingFace, with over 2M models) based on natural language requests. It proposes HuggingR$^4$, a progressive reasoning framework that achieves 92.03% workability and 82.46% reasonability, outperforming state-of-the-art baselines by 26.51% and 33.25%, respectively, while reducing token consumption by $6.9\times$.

Building effective LLM agents increasingly requires selecting appropriate AI models as tools from large open repositories (e.g., HuggingFace with > 2M models) based on natural language requests. Unlike invoking a fixed set of API tools, repository-scale model selection must handle massive, evolving candidates with incomplete metadata. Existing approaches incorporate full model descriptions into prompts, resulting in prompt bloat, excessive token costs, and limited scalability. To address these issues, we propose HuggingR$^4$, the first framework to recast model selection as an iterative reasoning process rather than one-shot retrieval. By synergistically integrating Reasoning, Retrieval, Refinement, and Reflection, HuggingR$^4$ progressively decomposes user intent, retrieves candidates through multi-round deliberation, refines selections via fine-grained analysis, and validates results through reflection. To facilitate rigorous evaluation, we introduce a large-scale benchmark comprising 14,399 diverse user requests across 37 task categories. Experiments demonstrate that HuggingR$^4$ achieves 92.03% workability and 82.46% reasonability, outperforming current state-of-the-art baselines by 26.51% and 33.25%, respectively, while reducing token consumption by $6.9\times$.
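The four stages named in the abstract can be illustrated with a toy loop. The sketch below is not the authors' implementation: the catalog, the keyword tables, and every function name (`reason`, `retrieve`, `refine`, `reflect`, `select_model`) are hypothetical, and simple word overlap stands in for the LLM calls a real system would make. It only shows the control flow of an iterative selection that reads full descriptions for a shortlist rather than for the whole repository.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ModelCard:
    name: str
    task: str
    description: str


# Hypothetical toy catalog standing in for a real model repository.
CATALOG = [
    ModelCard("toy/sentiment-en", "text-classification",
              "english sentiment analysis for product reviews"),
    ModelCard("toy/sentiment-multi", "text-classification",
              "multilingual sentiment analysis"),
    ModelCard("toy/summarizer", "summarization", "english news summarization"),
]

# Assumed keyword tables; a real system would use a language model here.
TASK_KEYWORDS = {
    "text-classification": {"sentiment", "classify"},
    "summarization": {"summarize", "summary"},
}
HARD_CONSTRAINTS = {"multilingual", "english"}


def reason(request: str) -> Optional[str]:
    """Reasoning: infer a task category from the request wording."""
    words = set(request.lower().split())
    for task, keywords in TASK_KEYWORDS.items():
        if words & keywords:
            return task
    return None


def retrieve(task: str) -> List[ModelCard]:
    """Retrieval: narrow the catalog to candidates for the inferred task."""
    return [card for card in CATALOG if card.task == task]


def refine(request: str, candidates: List[ModelCard]) -> Optional[ModelCard]:
    """Refinement: rank the shortlist by word overlap with its descriptions."""
    words = set(request.lower().split())
    return max(candidates,
               key=lambda c: len(words & set(c.description.split())),
               default=None)


def reflect(request: str, choice: ModelCard) -> bool:
    """Reflection: check that requested hard constraints appear in the choice."""
    required = set(request.lower().split()) & HARD_CONSTRAINTS
    return required <= set(choice.description.split())


def select_model(request: str, max_rounds: int = 3) -> Optional[str]:
    """Iterate retrieve/refine/reflect, dropping rejected candidates each round."""
    task = reason(request)
    if task is None:
        return None
    rejected = set()
    for _ in range(max_rounds):
        candidates = [c for c in retrieve(task) if c.name not in rejected]
        choice = refine(request, candidates)
        if choice is None:
            return None
        if reflect(request, choice):
            return choice.name
        rejected.add(choice.name)
    return None
```

For the request `"classify product reviews sentiment multilingual"`, the first round prefers `toy/sentiment-en` on word overlap, the reflection step rejects it because the description lacks "multilingual", and the second round returns `toy/sentiment-multi`. A one-shot retrieval would have stopped at the first answer.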

