CL · AI · Oct 7, 2021

mRAT-SQL+GAP: A Portuguese Text-to-SQL Transformer

arXiv:2110.03546v2 · 23 citations · Has Code
Originality: Synthesis-oriented
AI Analysis

This work addresses the lack of text-to-SQL resources for Portuguese. Its contribution is incremental, since it adapts existing tools to a new language.

The authors tackled the problem of translating Portuguese natural language questions to SQL queries by adapting the RAT-SQL+GAP system with a multilingual BART model and a translated Spider dataset, achieving 83% of the baseline performance on the Portuguese test dataset.

The translation of natural language questions into SQL queries has attracted growing attention, particularly in connection with transformers and similar language models. A large number of techniques are geared towards English; in this work, we therefore investigate translation to SQL when input questions are given in Portuguese. To do so, we adapted state-of-the-art tools and resources: we changed the RAT-SQL+GAP system to rely on a multilingual BART model (we also report tests with other language models), and we produced a translated version of the Spider dataset. Our experiments expose interesting phenomena that arise when non-English languages are targeted; in particular, it is better to train with the original and translated training datasets together, even if a single target language is desired. The multilingual BART model fine-tuned with this double-size training dataset (English and Portuguese) achieved 83% of the baseline when making inferences on the Portuguese test dataset. This investigation can help other researchers produce Machine Learning results in languages other than English. Our multilingual-ready version of RAT-SQL+GAP and the data are open-sourced as mRAT-SQL+GAP at: https://github.com/C4AI/gap-text2sql
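The abstract's main training observation, that fine-tuning on the original and translated datasets together works better even when only Portuguese is targeted, comes down to a data-preparation step. The sketch below is a minimal illustration that assumes Spider-style JSON records (`db_id`, `question`, `query`) and a translated file that changes only the question text. The function names and the `lang` field are hypothetical and do not come from the mRAT-SQL+GAP repository.

```python
import json
import random
from typing import Any, Dict, List

# A Spider-style example: {"db_id": ..., "question": ..., "query": ..., ...}
Example = Dict[str, Any]


def load_examples(path: str) -> List[Example]:
    """Load a Spider-style JSON file, which holds a list of example objects."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def build_bilingual_train_set(
    english: List[Example], portuguese: List[Example], seed: int = 0
) -> List[Example]:
    """Merge English and Portuguese examples into one double-size training set.

    Assumption: the Portuguese file translates only the "question" field and
    keeps "db_id" and "query" identical to the English original. The "lang"
    tag added here is hypothetical and used only for bookkeeping.
    """
    if len(english) != len(portuguese):
        raise ValueError("expected one translated question per English example")
    # Copy each record so the caller's lists are not mutated.
    combined = [{**ex, "lang": "en"} for ex in english]
    combined += [{**ex, "lang": "pt"} for ex in portuguese]
    # Shuffle so mini-batches mix both languages instead of seeing them in blocks.
    random.Random(seed).shuffle(combined)
    return combined


if __name__ == "__main__":
    # Toy records in Spider's format; real runs would use load_examples(...).
    en = [{"db_id": "concert_singer", "question": "How many singers do we have?",
           "query": "SELECT count(*) FROM singer"}]
    pt = [{"db_id": "concert_singer", "question": "Quantos cantores nós temos?",
           "query": "SELECT count(*) FROM singer"}]
    train = build_bilingual_train_set(en, pt)
    print(len(train), sorted(ex["lang"] for ex in train))
```

In this sketch, the examples are shuffled rather than concatenated so that each mini-batch contains both languages. The abstract does not say whether the original pipeline orders the data this way.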

Code Implementations: 2 repos