CL AI DB ET IR · Sep 29, 2025

Multilingual Text-to-SQL: Benchmarking the Limits of Language Models with Collaborative Language Agents

Khanh Trinh Pham, Thu Huong Nguyen, Jun Jo, Quoc Viet Hung Nguyen, Thanh Tam Nguyen

arXiv:2509.24405v1 · 13.99 citations · h-index: 16 · Has Code · ADC

Originality: Incremental advance

AI Analysis

The paper addresses the multilingual gap in text-to-SQL for database access, but its contribution is incremental, showing limited improvement over existing methods.

The paper tackles the problem of multilingual text-to-SQL by introducing MultiSpider 2.0, a benchmark extending Spider 2.0 to eight languages, and finds that state-of-the-art LLMs achieve only 4% execution accuracy with intrinsic reasoning, while a collaboration-driven language agents baseline improves it to 15%.
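Execution accuracy, the metric behind the 4% and 15% figures, counts a predicted query as correct when executing it returns the same result as the gold query. The following is a minimal sketch, assuming a SQLite database and an order-insensitive multiset comparison of result rows. The helper names `run_query`, `results_match`, and `execution_accuracy` are hypothetical, and the benchmark's official evaluation may compare results differently (for example, respecting `ORDER BY` or handling non-SQLite dialects).

```python
import sqlite3
from collections import Counter


def run_query(conn, sql):
    """Execute a query and return its rows, or None if execution fails."""
    try:
        return conn.execute(sql).fetchall()
    except sqlite3.Error:
        return None


def results_match(pred_rows, gold_rows):
    """Order-insensitive multiset comparison of result rows (an assumption)."""
    if pred_rows is None or gold_rows is None:
        return False
    return Counter(pred_rows) == Counter(gold_rows)


def execution_accuracy(conn, pairs):
    """Fraction of (predicted_sql, gold_sql) pairs whose results match."""
    if not pairs:
        return 0.0
    hits = sum(
        results_match(run_query(conn, pred), run_query(conn, gold))
        for pred, gold in pairs
    )
    return hits / len(pairs)


# Toy database and predictions (illustrative data, not from MultiSpider 2.0).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE city (name TEXT, country TEXT, population INTEGER)")
conn.executemany(
    "INSERT INTO city VALUES (?, ?, ?)",
    [("Berlin", "DE", 3600000), ("Hamburg", "DE", 1800000), ("Lyon", "FR", 500000)],
)
pairs = [
    # Same rows in a different order: counted as correct.
    ("SELECT name FROM city WHERE country = 'DE'",
     "SELECT name FROM city WHERE country = 'DE' ORDER BY name DESC"),
    # Different rows: counted as wrong.
    ("SELECT name FROM city WHERE population > 1000000",
     "SELECT name FROM city WHERE population > 2000000"),
    # Predicted query fails to execute (misspelled column): counted as wrong.
    ("SELECT nme FROM city", "SELECT name FROM city"),
]
print(execution_accuracy(conn, pairs))  # prints 0.3333333333333333
```

Treating a failed predicted query as simply incorrect, rather than raising, keeps a single malformed prediction from aborting the whole evaluation run.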

Text-to-SQL enables natural access to databases, yet most benchmarks are English-only, limiting multilingual progress. We introduce MultiSpider 2.0, extending Spider 2.0 to eight languages (English, German, French, Spanish, Portuguese, Japanese, Chinese, Vietnamese). It preserves Spider 2.0's structural difficulty while adding linguistic and dialectal variability, demanding deeper reasoning for complex SQL. On this benchmark, state-of-the-art LLMs (such as DeepSeek-R1 and OpenAI o1) reach only 4% execution accuracy when relying on intrinsic reasoning, versus 60% on MultiSpider 1.0. Therefore, we provide a collaboration-driven language agents baseline that iteratively refines queries, improving accuracy to 15%. These results reveal a substantial multilingual gap and motivate methods that are robust across languages and ready for real-world enterprise deployment. Our benchmark is available at https://github.com/phkhanhtrinh23/Multilingual_Text_to_SQL.
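The abstract describes the baseline only as collaborating language agents that iteratively refine queries. The sketch below is not the authors' implementation; it illustrates one common execution-feedback pattern under stated assumptions: a hypothetical generator proposes SQL, the database returns any execution error as feedback, and a hypothetical refiner revises the query until it runs or a round budget (`max_rounds`, an assumed parameter) is exhausted. The toy agents `toy_generate` and `toy_refine` stand in for LLM calls.

```python
import sqlite3
from typing import Callable, Optional

# Hypothetical agent interfaces; real agents would wrap LLM calls.
GenerateFn = Callable[[str, str], str]          # (question, schema) -> SQL
RefineFn = Callable[[str, str, str, str], str]  # (question, schema, sql, feedback) -> SQL


def execution_feedback(conn: sqlite3.Connection, sql: str) -> Optional[str]:
    """Return None if the query executes, otherwise the database error message."""
    try:
        conn.execute(sql).fetchall()
        return None
    except sqlite3.Error as exc:
        return str(exc)


def iterative_refine(conn, question, schema, generate: GenerateFn,
                     refine: RefineFn, max_rounds: int = 3) -> str:
    """Generate SQL, then refine it on execution errors for up to max_rounds rounds."""
    sql = generate(question, schema)
    for _ in range(max_rounds):
        feedback = execution_feedback(conn, sql)
        if feedback is None:
            return sql  # query runs; stop refining
        sql = refine(question, schema, sql, feedback)
    return sql  # budget exhausted: return the latest candidate


# Toy stand-ins for the LLM agents (purely illustrative).
def toy_generate(question: str, schema: str) -> str:
    return "SELECT nme FROM city WHERE country = 'DE'"  # deliberate typo


def toy_refine(question: str, schema: str, sql: str, feedback: str) -> str:
    # Patch the column named in SQLite's "no such column" error.
    if "no such column: nme" in feedback:
        return sql.replace("nme", "name")
    return sql


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE city (name TEXT, country TEXT)")
schema = "city(name TEXT, country TEXT)"
question = "Welche Städte liegen in Deutschland?"  # German: "Which cities are in Germany?"
print(iterative_refine(conn, question, schema, toy_generate, toy_refine))
# prints SELECT name FROM city WHERE country = 'DE'
```

Execution success is only a weak signal: a query can run cleanly yet return the wrong rows, so a practical refiner would likely also need semantic feedback beyond error messages.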

