LocalRQA: From Generating Data to Locally Training, Testing, and Deploying Retrieval-Augmented QA Systems
The paper addresses the need for researchers and developers to build and customize QA systems locally, though its contribution is incremental because it builds on existing toolkits and methods.
The paper tackles the problem of customizing retrieval-augmented QA systems by proposing LocalRQA, an open-source toolkit for local training, testing, and deployment, and finds that 7B models trained with it perform comparably to systems built on OpenAI's text-ada-002 and GPT-4-turbo.
Retrieval-augmented question-answering systems combine retrieval techniques with large language models to provide answers that are more accurate and informative. Many existing toolkits allow users to quickly build such systems using off-the-shelf models, but they offer little support for researchers and developers who want to customize the model training, testing, and deployment process. We propose LocalRQA, an open-source toolkit that features a wide selection of model training algorithms, evaluation methods, and deployment tools curated from the latest research. As a showcase, we build QA systems using online documentation obtained from Databricks and Faire's websites. We find that 7B models trained and deployed using LocalRQA reach performance similar to that of systems built with OpenAI's text-ada-002 and GPT-4-turbo.
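To make the retrieve-then-generate idea concrete, here is a minimal sketch in plain Python. It is not LocalRQA's API: the function names (`retrieve`, `build_prompt`, `answer`), the bag-of-words cosine scoring, the prompt wording, and the two toy documents are all assumptions made for illustration, and the `generate` argument stands in for whatever language model a real system would call.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(question, documents, k=1):
    """Return the k documents most similar to the question.

    Assumption: a toy lexical retriever; a real system would use a
    trained embedding model instead.
    """
    q = Counter(tokenize(question))
    scored = [(cosine(q, Counter(tokenize(d))), d) for d in documents]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [d for _, d in scored[:k]]


def build_prompt(question, passages):
    """Place the retrieved passages in front of the question."""
    context = "\n".join(f"- {p}" for p in passages)
    return (
        "Answer using only the context.\n"
        f"Context:\n{context}\n"
        f"Question: {question}\nAnswer:"
    )


def answer(question, documents, generate, k=1):
    """Retrieve supporting passages, then hand the prompt to a generator."""
    passages = retrieve(question, documents, k)
    return generate(build_prompt(question, passages))


# Hypothetical toy corpus, not taken from the Databricks or Faire data.
docs = [
    "Clusters can be resized from the compute page.",
    "Invoices are emailed at the start of each month.",
]


def echo(prompt):
    """Placeholder generator that returns the prompt unchanged."""
    return prompt
```

Calling `answer("When are invoices emailed?", docs, echo)` returns the prompt that a real model would receive, with the invoice passage as its only context. Swapping `retrieve` for a trained retriever and `echo` for a trained generator gives the two components that a toolkit of this kind would let users customize.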