Selecting Better Samples from Pre-trained LLMs: A Case Study on Question Generation
This work addresses the challenge of selecting better LLM outputs in real-world deployments, where the model is a black box and no human-written references are available. The contribution is incremental, since it builds on existing sampling methods.
The paper tackles the problem of selecting high-quality outputs from multiple stochastic samples generated by a pre-trained LLM, using question generation as a case study. Through automatic and human evaluations, it shows that the proposed prompt-based approach selects higher-quality questions than greedy generation.
Large Language Models (LLMs) have in recent years demonstrated impressive prowess in natural language generation. A common practice to improve generation diversity is to sample multiple outputs from the model. However, there is no simple and robust way to select the best output from these stochastic samples. As a case study framed in the context of question generation, we propose two prompt-based approaches to selecting high-quality questions from a set of LLM-generated candidates. Our method works under two constraints: 1) a black-box (non-modifiable) question generation model and 2) lack of access to human-annotated references. Both are realistic limitations for real-world deployment of LLMs. With automatic as well as human evaluations, we empirically demonstrate that our approach can effectively select questions of higher quality than greedy generation.
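
The abstract does not spell out the two prompt-based approaches, so the following is only a minimal sketch of the general sample-then-select pattern it describes: generate several candidates, score each one without a reference, and keep the highest-scoring one. The names `select_best` and `toy_score` are hypothetical, and the word-overlap scorer is a stand-in chosen so the example runs offline; it is not the scoring used in the paper, where the score would presumably come from prompting an LLM.

```python
from typing import Callable, Sequence


def select_best(
    context: str,
    candidates: Sequence[str],
    score_fn: Callable[[str, str], float],
) -> str:
    """Return the candidate that score_fn rates highest for this context.

    The generator is treated as a black box: only its sampled outputs
    are needed, and no human-written reference is consulted.
    """
    if not candidates:
        raise ValueError("candidates must be non-empty")
    # max() returns the first maximal item, so ties resolve deterministically.
    return max(candidates, key=lambda q: score_fn(context, q))


def toy_score(context: str, question: str) -> float:
    """Stand-in scorer (an assumption, not the paper's method).

    Returns the fraction of question words that also appear in the context.
    A prompt-based scorer would replace this function.
    """
    context_words = set(context.lower().split())
    words = question.lower().rstrip("?").split()
    if not words:
        return 0.0
    return sum(w in context_words for w in words) / len(words)


# Hypothetical candidates standing in for stochastic samples from an LLM.
context = "The Eiffel Tower was completed in 1889 in Paris."
candidates = [
    "What is the capital of Spain?",
    "When was the Eiffel Tower completed?",
    "Who are you?",
]
best = select_best(context, candidates, toy_score)
```

Because the scorer is passed in as a plain callable, swapping the toy heuristic for a prompt-based one would not require touching the selection logic.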