WebGLM: Towards An Efficient Web-Enhanced Question Answering System with Human Preferences
This work addresses the need for cost-effective and accurate web-enhanced QA systems, though it appears incremental, since it builds on and improves existing methods such as WebGPT.
The authors tackled the problem of building an efficient web-enhanced question-answering system by augmenting a pre-trained large language model with web search and retrieval. The resulting system, WebGLM, outperforms the similar-sized WebGPT (13B) and performs comparably to the much larger WebGPT (175B) in human evaluation.
We present WebGLM, a web-enhanced question-answering system based on the General Language Model (GLM). Its goal is to augment a pre-trained large language model (LLM) with web search and retrieval capabilities while remaining efficient enough for real-world deployment. To achieve this, we develop WebGLM with strategies for an LLM-augmented retriever, a bootstrapped generator, and a human preference-aware scorer. Specifically, we identify and address the limitations of WebGPT (OpenAI), which gives WebGLM advantages in accuracy, efficiency, and cost-effectiveness. In addition, we propose systematic criteria for evaluating web-enhanced QA systems. We conduct multi-dimensional human evaluation and quantitative ablation studies, which suggest that the proposed WebGLM designs outperform existing systems. WebGLM with the 10-billion-parameter GLM (10B) is shown to perform better than the similar-sized WebGPT (13B) and even comparably to WebGPT (175B) in human evaluation. The code, demo, and data are available at https://github.com/THUDM/WebGLM.
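The abstract names three components (a retriever, a generator, and a scorer) but does not specify their interfaces here. The sketch below is only an illustration, under assumptions, of how such a retrieve-generate-score pipeline could be composed: retrieve references for a question, generate candidate answers from them, and return the candidate the scorer rates highest. Every name in it (`Reference`, `answer`, `toy_retrieve`, `toy_generate`, `toy_score`, `CORPUS`) is hypothetical and is not taken from the WebGLM code base. The toy stand-ins use plain word overlap; they are not WebGLM's LLM-augmented retriever, bootstrapped generator, or human preference-aware scorer.

```python
import re
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Reference:
    url: str
    text: str


# Hypothetical stage signatures; assumptions, not WebGLM's actual API.
Retriever = Callable[[str], List[Reference]]
Generator = Callable[[str, List[Reference]], List[str]]
Scorer = Callable[[str, str], float]


def answer(question: str, retrieve: Retriever, generate: Generator, score: Scorer) -> str:
    """Retrieve references, generate candidates, return the highest-scoring one."""
    refs = retrieve(question)
    candidates = generate(question, refs)
    if not candidates:
        raise ValueError("generator produced no candidate answers")
    # max() keeps the first candidate on ties.
    return max(candidates, key=lambda c: score(question, c))


def tokens(text: str) -> set:
    # Lowercase alphanumeric words, so punctuation does not block matches.
    return set(re.findall(r"[a-z0-9]+", text.lower()))


# Toy in-memory "web" (hypothetical placeholder URLs and text).
CORPUS = [
    Reference("https://example.com/a", "GLM is a general language model."),
    Reference("https://example.com/b", "WebGPT answers questions using web search."),
]


def toy_retrieve(question: str) -> List[Reference]:
    # Stand-in retriever: keep documents sharing at least one word with the question.
    q = tokens(question)
    return [r for r in CORPUS if q & tokens(r.text)]


def toy_generate(question: str, refs: List[Reference]) -> List[str]:
    # Stand-in generator: one candidate per reference, with a numbered citation.
    return [f"{r.text} [{i + 1}]" for i, r in enumerate(refs)]


def toy_score(question: str, candidate: str) -> float:
    # Stand-in for a learned preference scorer: word overlap with the question.
    return float(len(tokens(question) & tokens(candidate)))


if __name__ == "__main__":
    q = "What is WebGPT and how does it search the web?"
    print(answer(q, toy_retrieve, toy_generate, toy_score))
```

Passing the stages in as callables keeps the composition independent of any particular model, so each toy function could be swapped for a real component without changing `answer`.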