CL · Feb 21, 2024

From Text to CQL: Bridging Natural Language and Corpus Search Engine

arXiv:2402.13740v1 · 3 citations · h-index: 21
Originality: Incremental advance
AI Analysis

The paper addresses a notable challenge for researchers and practitioners by reducing the manual effort and expertise required to construct CQL queries for text corpus analysis.

This paper tackles the problem of automating the translation of natural language into Corpus Query Language (CQL) for linguistic research, presenting the first text-to-CQL task with a curated dataset and LLM-based methods that demonstrate efficacy in generating accurate queries.

Natural Language Processing (NLP) technologies have revolutionized the way we interact with information systems, with a significant focus on converting natural language queries into formal query languages such as SQL. However, less emphasis has been placed on the Corpus Query Language (CQL), a critical tool for linguistic research and detailed analysis within text corpora. Constructing CQL queries by hand is complex, time-intensive, and demands considerable expertise, which poses a notable challenge for both researchers and practitioners. This paper presents the first text-to-CQL task, which aims to automate the translation of natural language into CQL. We present a comprehensive framework for this task, including a specifically curated large-scale dataset and methodologies that leverage large language models (LLMs) for the text-to-CQL task. In addition, we establish evaluation metrics to assess the syntactic and semantic accuracy of the generated queries. We develop LLM-based conversion approaches and conduct detailed experiments. The results demonstrate the efficacy of our methods and provide insights into the complexities of the text-to-CQL task.
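
To make the task concrete, the sketch below shows what a text-to-CQL pair and a very rough syntactic check might look like. Everything in it is an assumption made for illustration: the example pair is invented rather than taken from the paper's dataset, the helper names (is_plausible_cql, normalize, exact_match) are hypothetical, and the checks cover only simple sequences of token patterns such as [tag="JJ"] [lemma="house"]. They are not the evaluation metrics the paper defines.

```python
import re

# Hypothetical example pair; it is NOT drawn from the paper's dataset.
EXAMPLE = {
    "text": "an adjective immediately followed by the lemma 'house'",
    "cql": '[tag="JJ"] [lemma="house"]',
}

# One attribute test, e.g. tag="JJ" or word!="the" (assumed simple subset of CQL).
_ATTR = r'\w+\s*!?=\s*"(?:[^"\\]|\\.)*"'
# One token pattern: brackets around zero or more tests joined by & or |.
_TOKEN = re.compile(rf'\[\s*(?:{_ATTR}(?:\s*[&|]\s*{_ATTR})*)?\s*\]')


def is_plausible_cql(query: str) -> bool:
    """Rough syntactic check: the query is a sequence of token patterns."""
    query = query.strip()
    if not query:
        return False
    pos = 0
    while pos < len(query):
        match = _TOKEN.match(query, pos)
        if match is None:
            return False
        pos = match.end()
        # Skip whitespace between consecutive token patterns.
        while pos < len(query) and query[pos].isspace():
            pos += 1
    return True


def normalize(query: str) -> str:
    """Drop whitespace outside quoted values so formatting differences are ignored."""
    parts = re.split(r'("(?:[^"\\]|\\.)*")', query)
    return "".join(
        part if part.startswith('"') else re.sub(r"\s+", "", part)
        for part in parts
    )


def exact_match(predicted: str, reference: str) -> bool:
    """Compare a generated query with a reference after normalization."""
    return normalize(predicted) == normalize(reference)
```

A string comparison like exact_match only captures surface agreement. Two differently written queries can retrieve the same corpus hits, which is one reason the abstract distinguishes syntactic from semantic accuracy.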

