SemBench: A Benchmark for Semantic Query Processing Engines
This provides a benchmark for researchers and developers working on semantic query processing engines, but it is incremental as it builds on existing systems and concepts.
The authors tackled the lack of a benchmark for semantic query processing engines, which use LLMs to extend SQL with natural language operators for multimodal data, and they evaluated it on four systems to reveal current strengths and weaknesses.
We present a benchmark targeting a novel class of systems: semantic query processing engines. Those systems rely inherently on generative and reasoning capabilities of state-of-the-art large language models (LLMs). They extend SQL with semantic operators, configured by natural language instructions, that are evaluated via LLMs and enable users to perform various operations on multimodal data. Our benchmark introduces diversity across three key dimensions: scenarios, modalities, and operators. Included are scenarios ranging from movie review analysis to medical question-answering. Within these scenarios, we cover different data modalities, including images, audio, and text. Finally, the queries involve a diverse set of operators, including semantic filters, joins, mappings, ranking, and classification operators. We evaluated our benchmark on three academic systems (LOTUS, Palimpzest, and ThalamusDB) and one industrial system, Google BigQuery. Although these results reflect a snapshot of systems under continuous development, our study offers crucial insights into their current strengths and weaknesses, illuminating promising directions for future research.