Optimal Query Allocation in Extractive QA with LLMs: A Learning-to-Defer Framework with Theoretical Guarantees
This work addresses the computational inefficiency of extractive QA in resource-constrained environments, offering a scalable solution with theoretical guarantees.
The paper tackles the inefficiency of Large Language Models in extractive question answering by proposing a Learning-to-Defer framework that allocates queries to specialized experts. Experiments on SQuADv1, SQuADv2, and TriviaQA show improved answer reliability and significantly lower computational overhead.
Large Language Models excel in generative tasks but exhibit inefficiencies in structured text selection, particularly in extractive question answering (EQA). This challenge is magnified in resource-constrained environments, where deploying multiple specialized models for different tasks is impractical. We propose a Learning-to-Defer framework that allocates queries to specialized experts, ensuring high-confidence predictions while optimizing computational efficiency. Our approach combines a principled allocation strategy, which balances performance and cost, with theoretical guarantees on optimal deferral. Empirical evaluations on SQuADv1, SQuADv2, and TriviaQA demonstrate that our method enhances answer reliability while significantly reducing computational overhead, making it well-suited for scalable and efficient EQA deployment.
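
To make the idea of cost-aware query allocation concrete, the following is a minimal sketch, not the paper's actual method. It assumes each agent has an estimated per-query error and a fixed consultation cost, and it routes the query to the agent with the lowest sum of the two. The names `Agent`, `consult_cost`, and `allocate`, as well as all numeric values, are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Agent:
    """A hypothetical model or expert that a query can be routed to."""
    name: str
    consult_cost: float  # assumed fixed cost of querying this agent


def allocate(est_error, agents):
    """Return the index of the agent minimizing estimated error plus cost.

    est_error[i] is an assumed estimate of agent i's error on the query.
    """
    scores = [err + agent.consult_cost for err, agent in zip(est_error, agents)]
    return min(range(len(agents)), key=scores.__getitem__)


# Illustrative agents and costs (made-up values).
agents = [
    Agent("small_qa_model", 0.0),
    Agent("span_expert", 0.1),
    Agent("llm", 0.3),
]

# An easy query: every agent is accurate, so the cheapest one is chosen.
easy = allocate([0.05, 0.04, 0.03], agents)

# A hard query: only the costly agent is accurate enough to justify its cost.
hard = allocate([0.60, 0.45, 0.10], agents)

print(agents[easy].name, agents[hard].name)
```

In this toy setting the rule keeps cheap agents on queries where they are already reliable and defers to a more expensive one only when the expected gain in accuracy outweighs its cost. In a learned system the error estimates would presumably come from a trained component rather than being supplied by hand.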