Kurt Stockinger

DB
h-index24
21papers
1,387citations
Novelty46%
AI Score51

21 Papers

DBJun 7, 2023
ScienceBenchmark: A Complex Real-World Benchmark for Evaluating Natural Language to SQL Systems

Yi Zhang, Jan Deriu, George Katsogiannis-Meimarakis et al.

Natural Language to SQL systems (NL-to-SQL) have recently shown a significant increase in accuracy for natural language to SQL query translation. This improvement is due to the emergence of transformer-based language models, and the popularity of the Spider benchmark - the de-facto standard for evaluating NL-to-SQL systems. The top NL-to-SQL systems reach accuracies of up to 85\%. However, Spider mainly contains simple databases with few tables, columns, and entries, which does not reflect a realistic setting. Moreover, complex real-world databases with domain-specific content have little to no training data available in the form of NL/SQL-pairs leading to poor performance of existing NL-to-SQL systems. In this paper, we introduce ScienceBenchmark, a new complex NL-to-SQL benchmark for three real-world, highly domain-specific databases. For this new benchmark, SQL experts and domain experts created high-quality NL/SQL-pairs for each domain. To garner more data, we extended the small amount of human-generated data with synthetic data generated using GPT-3. We show that our benchmark is highly challenging, as the top performing systems on Spider achieve a very low performance on our benchmark. Thus, the challenge is many-fold: creating NL-to-SQL systems for highly complex domains with a small amount of hand-made training data augmented with synthetic data. To our knowledge, ScienceBenchmark is the first NL-to-SQL benchmark designed with complex real-world scientific databases, containing challenging training and test data carefully validated by domain experts.

CLJul 3, 2023
Data-Driven Information Extraction and Enrichment of Molecular Profiling Data for Cancer Cell Lines

Ellery Smith, Rahel Paloots, Dimitris Giagkos et al.

With the proliferation of research means and computational methodologies, published biomedical literature is growing exponentially in numbers and volume. Cancer cell lines are frequently used models in biological and medical research that are currently applied for a wide range of purposes, from studies of cellular mechanisms to drug development, which has led to a wealth of related data and publications. Sifting through large quantities of text to gather relevant information on the cell lines of interest is tedious and extremely slow when performed by humans. Hence, novel computational information extraction and correlation mechanisms are required to boost meaningful knowledge extraction. In this work, we present the design, implementation and application of a novel data extraction and exploration system. This system extracts deep semantic relations between textual entities from scientific literature to enrich existing structured clinical data in the domain of cancer cell lines. We introduce a new public data exploration portal, which enables automatic linking of genomic copy number variants plots with ranked, related entities such as affected genes. Each relation is accompanied by literature-derived evidences, allowing for deep, yet rapid, literature search, using existing structured data as a springboard. Our system is publicly available on the web at https://cancercelllines.org

CLSep 28, 2023
Spider4SPARQL: A Complex Benchmark for Evaluating Knowledge Graph Question Answering Systems

Catherine Kosten, Philippe Cudré-Mauroux, Kurt Stockinger

With the recent spike in the number and availability of Large Language Models (LLMs), it has become increasingly important to provide large and realistic benchmarks for evaluating Knowledge Graph Question Answering (KGQA) systems. So far the majority of benchmarks rely on pattern-based SPARQL query generation approaches. The subsequent natural language (NL) question generation is conducted through crowdsourcing or other automated methods, such as rule-based paraphrasing or NL question templates. Although some of these datasets are of considerable size, their pitfall lies in their pattern-based generation approaches, which do not always generalize well to the vague and linguistically diverse questions asked by humans in real-world contexts. In this paper, we introduce Spider4SPARQL - a new SPARQL benchmark dataset featuring 9,693 previously existing manually generated NL questions and 4,721 unique, novel, and complex SPARQL queries of varying complexity. In addition to the NL/SPARQL pairs, we also provide their corresponding 166 knowledge graphs and ontologies, which cover 138 different domains. Our complex benchmark enables novel ways of evaluating the strengths and weaknesses of modern KGQA systems. We evaluate the system with state-of-the-art KGQA systems as well as LLMs, which achieve only up to 45\% execution accuracy, demonstrating that Spider4SPARQL is a challenging benchmark for future research.

DBNov 8, 2024Code
SM3-Text-to-Query: Synthetic Multi-Model Medical Text-to-Query Benchmark

Sithursan Sivasubramaniam, Cedric Osei-Akoto, Yi Zhang et al.

Electronic health records (EHRs) are stored in various database systems with different database models on heterogeneous storage architectures, such as relational databases, document stores, or graph databases. These different database models have a big impact on query complexity and performance. While this has been a known fact in database research, its implications for the growing number of Text-to-Query systems have surprisingly not been investigated so far. In this paper, we present SM3-Text-to-Query, the first multi-model medical Text-to-Query benchmark based on synthetic patient data from Synthea, following the SNOMED-CT taxonomy -- a widely used knowledge graph ontology covering medical terminology. SM3-Text-to-Query provides data representations for relational databases (PostgreSQL), document stores (MongoDB), and graph databases (Neo4j and GraphDB (RDF)), allowing the evaluation across four popular query languages, namely SQL, MQL, Cypher, and SPARQL. We systematically and manually develop 408 template questions, which we augment to construct a benchmark of 10K diverse natural language question/query pairs for these four query languages (40K pairs overall). On our dataset, we evaluate several common in-context-learning (ICL) approaches for a set of representative closed and open-source LLMs. Our evaluation sheds light on the trade-offs between database models and query languages for different ICL strategies and LLMs. Last, SM3-Text-to-Query is easily extendable to additional query languages or real, standard-based patient databases.

LGDec 15, 2025
Quanvolutional Neural Networks for Spectrum Peak-Finding

Lukas Bischof, Rudolf M. Füchslin, Kurt Stockinger et al.

The analysis of spectra, such as Nuclear Magnetic Resonance (NMR) spectra, for the comprehensive characterization of peaks is a challenging task for both experts and machines, especially with complex molecules. This process, also known as deconvolution, involves identifying and quantifying the peaks in the spectrum. Machine learning techniques have shown promising results in automating this process. With the advent of quantum computing, there is potential to further enhance these techniques. In this work, inspired by the success of classical Convolutional Neural Networks (CNNs), we explore the use of Quanvolutional Neural Networks (QuanvNNs) for the multi-task peak finding problem, involving both peak counting and position estimation. We implement a simple and interpretable QuanvNN architecture that can be directly compared to its classical CNN counterpart, and evaluate its performance on a synthetic NMR-inspired dataset. Our results demonstrate that QuanvNNs outperform classical CNNs on challenging spectra, achieving an 11\% improvement in F1 score and a 30\% reduction in mean absolute error for peak position estimation. Additionally, QuanvNNs appear to exhibit better convergence stability for harder problems.

DBFeb 13, 2024
Evaluating the Data Model Robustness of Text-to-SQL Systems Based on Real User Queries

Jonathan Fürst, Catherine Kosten, Farhad Nooralahzadeh et al.

Text-to-SQL systems (also known as NL-to-SQL systems) have become an increasingly popular solution for bridging the gap between user capabilities and SQL-based data access. These systems translate user requests in natural language to valid SQL statements for a specific database. Recent Text-to-SQL systems have benefited from the rapid improvement of transformer-based language models. However, while Text-to-SQL systems that incorporate such models continuously reach new high scores on -- often synthetic -- benchmark datasets, a systematic exploration of their robustness towards different data models in a real-world, realistic scenario is notably missing. This paper provides the first in-depth evaluation of the data model robustness of Text-to-SQL systems in practice based on a multi-year international project focused on Text-to-SQL interfaces. Our evaluation is based on a real-world deployment of FootballDB, a system that was deployed over a 9 month period in the context of the FIFA World Cup 2022, during which about 6K natural language questions were asked and executed. All of our data is based on real user questions that were asked live to the system. We manually labeled and translated a subset of these questions for three different data models. For each data model, we explore the performance of representative Text-to-SQL systems and language models. We further quantify the impact of training data size, pre-, and post-processing steps as well as language model inference time. Our comprehensive evaluation sheds light on the design choices of real-world Text-to-SQL systems and their impact on moving from research prototypes to real deployments. Last, we provide a new benchmark dataset to the community, which is the first to enable the evaluation of different data models for the same dataset and is substantially more challenging than most previous datasets in terms of query complexity.

AIDec 24, 2024
Explainable Multi-Modal Data Exploration in Natural Language via LLM Agent

Farhad Nooralahzadeh, Yi Zhang, Jonathan Furst et al.

International enterprises, organizations, or hospitals collect large amounts of multi-modal data stored in databases, text documents, images, and videos. While there has been recent progress in the separate fields of multi-modal data exploration as well as in database systems that automatically translate natural language questions to database query languages, the research challenge of querying database systems combined with other unstructured modalities such as images in natural language is widely unexplored. In this paper, we propose XMODE - a system that enables explainable, multi-modal data exploration in natural language. Our approach is based on the following research contributions: (1) Our system is inspired by a real-world use case that enables users to explore multi-modal information systems. (2) XMODE leverages a LLM-based agentic AI framework to decompose a natural language question into subtasks such as text-to-SQL generation and image analysis. (3) Experimental results on multi-modal datasets over relational data and images demonstrate that our system outperforms state-of-the-art multi-modal exploration systems, excelling not only in accuracy but also in various performance metrics such as query latency, API costs, planning efficiency, and explanation quality, thanks to the more effective utilization of the reasoning capabilities of LLMs.

DBDec 19, 2025
Query Carefully: Detecting the Unanswerables in Text-to-SQL Tasks

Jasmin Saxer, Isabella Maria Aigner, Luise Linzmeier et al.

Text-to-SQL systems allow non-SQL experts to interact with relational databases using natural language. However, their tendency to generate executable SQL for ambiguous, out-of-scope, or unanswerable queries introduces a hidden risk, as outputs may be misinterpreted as correct. This risk is especially serious in biomedical contexts, where precision is critical. We therefore present Query Carefully, a pipeline that integrates LLM-based SQL generation with explicit detection and handling of unanswerable inputs. Building on the OncoMX component of ScienceBenchmark, we construct OncoMX-NAQ (No-Answer Questions), a set of 80 no-answer questions spanning 8 categories (non-SQL, out-of-schema/domain, and multiple ambiguity types). Our approach employs llama3.3:70b with schema-aware prompts, explicit No-Answer Rules (NAR), and few-shot examples drawn from both answerable and unanswerable questions. We evaluate SQL exact match, result accuracy, and unanswerable-detection accuracy. On the OncoMX dev split, few-shot prompting with answerable examples increases result accuracy, and adding unanswerable examples does not degrade performance. On OncoMX-NAQ, balanced prompting achieves the highest unanswerable-detection accuracy (0.8), with near-perfect results for structurally defined categories (non-SQL, missing columns, out-of-domain) but persistent challenges for missing-value queries (0.5) and column ambiguity (0.3). A lightweight user interface surfaces interim SQL, execution results, and abstentions, supporting transparent and reliable text-to-SQL in biomedical applications.

69.7CVApr 10
Arbitration Failure, Not Perceptual Blindness: How Vision-Language Models Resolve Visual-Linguistic Conflicts

Farhad Nooralahzadeh, Omid Rohanian, Yi Zhang et al.

When a Vision-Language Model (VLM) sees a blue banana and answers "yellow", is the problem of perception or arbitration? We explore the question in ten VLMs with various sizes and reveal an Encoding--Grounding Dissociation: models that fail to report what they see (and thus provide a wrong answer) still encode the visual evidence as strongly as models that provide the correct answer. Using Multimodal Arbitration Crossover (MAC) analysis with layer-by-layer Logit Lens probing, we track the competition between visual and prior signals across every layer of each model. We show that visual attributes can be linearly decodable from early layers (AUC > 0.86). The accuracy remains nearly identical for both successful and failed samples. However, the gap in the final-layer logit -- not the strength of encoding -- better predicts grounding outcomes with a correlation of . After having studied when VLMs base their answers on image clues rather than prior knowledge, we want to understand the causal relationships. We establish causality through full-sequence activation patching. The standard last-token interventions in LLM interpretability do not affect VLMs. In contrast, replacing the full token sequence at layers identified by MAC alters 60 to 84% of outputs. Partial-token decomposition shows that image tokens carry almost all of the causal impact, while text tokens have none. Scaling addresses the remaining architectural differences to achieve perfect retention. Moving from diagnosis to intervention, we show that training-free activation steering -- both linear and sparse autoencoder-guided -- in early layers can improve visual grounding by up to +3.8% with degrading performance in some setups. Overall, these findings lead to a clear conclusion: VLMs already see well, but the challenge is acting on what they see. Targeted interventions can help to bridge this gap.

IROct 9, 2025
VersionRAG: Version-Aware Retrieval-Augmented Generation for Evolving Documents

Daniel Huwiler, Kurt Stockinger, Jonathan Fürst

Retrieval-Augmented Generation (RAG) systems fail when documents evolve through versioning-a ubiquitous characteristic of technical documentation. Existing approaches achieve only 58-64% accuracy on version-sensitive questions, retrieving semantically similar content without temporal validity checks. We present VersionRAG, a version-aware RAG framework that explicitly models document evolution through a hierarchical graph structure capturing version sequences, content boundaries, and changes between document states. During retrieval, VersionRAG routes queries through specialized paths based on intent classification, enabling precise version-aware filtering and change tracking. On our VersionQA benchmark-100 manually curated questions across 34 versioned technical documents-VersionRAG achieves 90% accuracy, outperforming naive RAG (58%) and GraphRAG (64%). VersionRAG reaches 60% accuracy on implicit change detection where baselines fail (0-10%), demonstrating its ability to track undocumented modifications. Additionally, VersionRAG requires 97% fewer tokens during indexing than GraphRAG, making it practical for large-scale deployment. Our work establishes versioned document QA as a distinct task and provides both a solution and benchmark for future research.

DBJun 4, 2025
TransClean: Finding False Positives in Multi-Source Entity Matching under Real-World Conditions via Transitive Consistency

Fernando de Meer Pardo, Branka Hadji Misheva, Martin Braschler et al.

We present TransClean, a method for detecting false positive predictions of entity matching algorithms under real-world conditions characterized by large-scale, noisy, and unlabeled multi-source datasets that undergo distributional shifts. TransClean is explicitly designed to operate with multiple data sources in an efficient, robust and fast manner while accounting for edge cases and requiring limited manual labeling. TransClean leverages the Transitive Consistency of a matching, a measure of the consistency of a pairwise matching model f_theta on the matching it produces G_f_theta, based both on its predictions on directly evaluated record pairs and its predictions on implied record pairs. TransClean iteratively modifies a matching through gradually removing false positive matches while removing as few true positive matches as possible. In each of these steps, the estimation of the Transitive Consistency is exclusively done through model evaluations and produces quantities that can be used as proxies of the amounts of true and false positives in the matching while not requiring any manual labeling, producing an estimate of the quality of the matching and indicating which record groups are likely to contain false positives. In our experiments, we compare combining TransClean with a naively trained pairwise matching model (DistilBERT) and with a state-of-the-art end-to-end matching method (CLER) and illustrate the flexibility of TransClean in being able to detect most of the false positives of either setup across a variety of datasets. Our experiments show that TransClean induces an average +24.42 F1 score improvement for entity matching in a multi-source setting when compared to traditional pair-wise matching algorithms.

DBNov 7, 2024
GenJoin: Conditional Generative Plan-to-Plan Query Optimizer that Learns from Subplan Hints

Pavel Sulimov, Claude Lehmann, Kurt Stockinger

Query optimization has become a research area where classical algorithms are being challenged by machine learning algorithms. At the same time, recent trends in learned query optimizers have shown that it is prudent to take advantage of decades of database research and augment classical query optimizers by shrinking the plan search space through different types of hints (e.g. by specifying the join type, scan type or the order of joins) rather than completely replacing the classical query optimizer with machine learning models. It is especially relevant for cases when classical optimizers cannot fully enumerate all logical and physical plans and, as an alternative, need to rely on less robust approaches like genetic algorithms. However, even symbiotically learned query optimizers are hampered by the need for vast amounts of training data, slow plan generation during inference and unstable results across various workload conditions. In this paper, we present GenJoin - a novel learned query optimizer that considers the query optimization problem as a generative task and is capable of learning from a random set of subplan hints to produce query plans that outperform the classical optimizer. GenJoin is the first learned query optimizer that significantly and consistently outperforms PostgreSQL as well as state-of-the-art methods on two well-known real-world benchmarks across a variety of workloads using rigorous machine learning evaluations.

DBJun 21, 2024
GraLMatch: Matching Groups of Entities with Graphs and Language Models

Fernando De Meer Pardo, Claude Lehmann, Dennis Gehrig et al.

In this paper, we present an end-to-end multi-source Entity Matching problem, which we call entity group matching, where the goal is to assign to the same group, records originating from multiple data sources but representing the same real-world entity. We focus on the effects of transitively matched records, i.e. the records connected by paths in the graph G = (V,E) whose nodes and edges represent the records and whether they are a match or not. We present a real-world instance of this problem, where the challenge is to match records of companies and financial securities originating from different data providers. We also introduce two new multi-source benchmark datasets that present similar matching challenges as real-world records. A distinctive characteristic of these records is that they are regularly updated following real-world events, but updates are not applied uniformly across data sources. This phenomenon makes the matching of certain groups of records only possible through the use of transitive information. In our experiments, we illustrate how considering transitively matched records is challenging since a limited amount of false positive pairwise match predictions can throw off the group assignment of large quantities of records. Thus, we propose GraLMatch, a method that can partially detect and remove false positive pairwise predictions through graph-based properties. Finally, we showcase how fine-tuning a Transformer-based model (DistilBERT) on a reduced number of labeled samples yields a better final entity group matching than training on more samples and/or incorporating fine-tuning optimizations, illustrating how precision becomes the deciding factor in the entity group matching of large volumes of records.

CLJun 5, 2024
StatBot.Swiss: Bilingual Open Data Exploration in Natural Language

Farhad Nooralahzadeh, Yi Zhang, Ellery Smith et al.

The potential for improvements brought by Large Language Models (LLMs) in Text-to-SQL systems is mostly assessed on monolingual English datasets. However, LLMs' performance for other languages remains vastly unexplored. In this work, we release the StatBot.Swiss dataset, the first bilingual benchmark for evaluating Text-to-SQL systems based on real-world applications. The StatBot.Swiss dataset contains 455 natural language/SQL-pairs over 35 big databases with varying level of complexity for both English and German. We evaluate the performance of state-of-the-art LLMs such as GPT-3.5-Turbo and mixtral-8x7b-instruct for the Text-to-SQL translation task using an in-context learning approach. Our experimental analysis illustrates that current LLMs struggle to generalize well in generating SQL queries on our novel bilingual dataset.

DBJul 22, 2021
Multiple Query Optimization using a Hybrid Approach of Classical and Quantum Computing

Tobias Fankhauser, Marc E. Solèr, Rudolf M. Füchslin et al.

Quantum computing promises to solve difficult optimization problems in chemistry, physics and mathematics more efficiently than classical computers, but requires fault-tolerant quantum computers with millions of qubits. To overcome errors introduced by today's quantum computers, hybrid algorithms combining classical and quantum computers are used. In this paper we tackle the multiple query optimization problem (MQO) which is an important NP-hard problem in the area of data-intensive problems. We propose a novel hybrid classical-quantum algorithm to solve the MQO on a gate-based quantum computer. We perform a detailed experimental evaluation of our algorithm and compare its performance against a competing approach that employs a quantum annealer -- another type of quantum computer. Our experimental results demonstrate that our algorithm currently can only handle small problem sizes due to the limited number of qubits available on a gate-based quantum computer compared to a quantum computer based on quantum annealing. However, our algorithm shows a qubit efficiency of close to 99% which is almost a factor of 2 higher compared to the state of the art implementation. Finally, we analyze how our algorithm scales with larger problem sizes and conclude that our approach shows promising results for near-term quantum computers.

LGApr 9, 2021
INODE: Building an End-to-End Data Exploration System in Practice [Extended Vision]

Sihem Amer-Yahia, Georgia Koutrika, Frederic Bastian et al.

A full-fledged data exploration system must combine different access modalities with a powerful concept of guiding the user in the exploration process, by being reactive and anticipative both for data discovery and for data linking. Such systems are a real opportunity for our community to cater to users with different domain and data science expertise. We introduce INODE -- an end-to-end data exploration system -- that leverages, on the one hand, Machine Learning and, on the other hand, semantics for the purpose of Data Management (DM). Our vision is to develop a classic unified, comprehensive platform that provides extensive access to open datasets, and we demonstrate it in three significant use cases in the fields of Cancer Biomarker Reearch, Research and Innovation Policy Making, and Astrophysics. INODE offers sustainable services in (a) data modeling and linking, (b) integrated query processing using natural language, (c) guidance, and (d) data exploration through visualization, thus facilitating the user in discovering new insights. We demonstrate that our system is uniquely accessible to a wide range of users from larger scientific communities to the public. Finally, we briefly illustrate how this work paves the way for new research opportunities in DM.

DBMay 29, 2020
ValueNet: A Natural Language-to-SQL System that Learns from Database Information

Ursin Brunner, Kurt Stockinger

Building natural language (NL) interfaces for databases has been a long-standing challenge for several decades. The major advantage of these so-called NL-to-SQL systems is that end-users can query complex databases without the need to know SQL or the underlying database schema. Due to significant advancements in machine learning, the recent focus of research has been on neural networks to tackle this challenge on complex datasets like Spider. Several recent NL-to-SQL systems achieve promising results on this dataset. However, none of the published systems, that provide either the source code or executable binaries, extract and incorporate values from the user questions for generating SQL statements. Thus, the practical use of these systems in a real-world scenario has not been sufficiently demonstrated yet. In this paper we propose ValueNet light and ValueNet -- two end-to-end NL-to-SQL systems that incorporate values using the challenging Spider dataset. The main idea of our approach is to use not only metadata information from the underlying database but also information on the base data as input for our neural network architecture. In particular, we propose a novel architecture sketch to extract values from a user question and come up with possible value candidates which are not explicitly mentioned in the question. We then use a neural model based on an encoder-decoder architecture to synthesize the SQL query. Finally, we evaluate our model on the Spider challenge using the Execution Accuracy metric, a more difficult metric than used by most participants of the challenge. Our experimental evaluation demonstrates that ValueNet light and ValueNet reach state-of-the-art results of 67% and 62% accuracy, respectively, for translating from NL to SQL whilst incorporating values.

AIApr 16, 2020
A Methodology for Creating Question Answering Corpora Using Inverse Data Annotation

Jan Deriu, Katsiaryna Mlynchyk, Philippe Schläpfer et al.

In this paper, we introduce a novel methodology to efficiently construct a corpus for question answering over structured data. For this, we introduce an intermediate representation that is based on the logical query plan in a database called Operation Trees (OT). This representation allows us to invert the annotation process without losing flexibility in the types of queries that we generate. Furthermore, it allows for fine-grained alignment of query tokens to OT operations. In our method, we randomly generate OTs from a context-free grammar. Afterwards, annotators have to write the appropriate natural language question that is represented by the OT. Finally, the annotators assign the tokens to the OT operations. We apply the method to create a new corpus OTTA (Operation Trees and Token Assignment), a large semantic parsing corpus for evaluating natural language interfaces to databases. We compare OTTA to Spider and LC-QuaD 2.0 and show that our methodology more than triples the annotation speed while maintaining the complexity of the queries. Finally, we train a state-of-the-art semantic parsing model on our data and show that our corpus is a challenging dataset and that the token alignment can be leveraged to increase the performance significantly.

DBNov 26, 2019
Join Query Optimization with Deep Reinforcement Learning Algorithms

Jonas Heitz, Kurt Stockinger

Join query optimization is a complex task and is central to the performance of query processing. In fact it belongs to the class of NP-hard problems. Traditional query optimizers use dynamic programming (DP) methods combined with a set of rules and restrictions to avoid exhaustive enumeration of all possible join orders. However, DP methods are very resource intensive. Moreover, given simplifying assumptions of attribute independence, traditional query optimizers rely on erroneous cost estimations, which can lead to suboptimal query plans. Recent success of deep reinforcement learning (DRL) creates new opportunities for the field of query optimization to tackle the above-mentioned problems. In this paper, we present our DRL-based Fully Observed Optimizer (FOOP) which is a generic query optimization framework that enables plugging in different machine learning algorithms. The main idea of FOOP is to use a data-adaptive learning query optimizer that avoids exhaustive enumerations of join orders and is thus significantly faster than traditional approaches based on dynamic programming. In particular, we evaluate various DRL-algorithms and show that Proximal Policy Optimization significantly outperforms Q-learning based algorithms. Finally we demonstrate how ensemble learning techniques combined with DRL can further improve the query optimizer.

DBJun 21, 2019
A Comparative Survey of Recent Natural Language Interfaces for Databases

Katrin Affolter, Kurt Stockinger, Abraham Bernstein

Over the last few years natural language interfaces (NLI) for databases have gained significant traction both in academia and industry. These systems use very different approaches as described in recent survey papers. However, these systems have not been systematically compared against a set of benchmark questions in order to rigorously evaluate their functionalities and expressive power. In this paper, we give an overview over 24 recently developed NLIs for databases. Each of the systems is evaluated using a curated list of ten sample questions to show their strengths and weaknesses. We categorize the NLIs into four groups based on the methodology they are using: keyword-, pattern-, parsing-, and grammar-based NLI. Overall, we learned that keyword-based systems are enough to answer simple questions. To solve more complex questions involving subqueries, the system needs to apply some sort of parsing to identify structural dependencies. Grammar-based systems are overall the most powerful ones, but are highly dependent on their manually designed rules. In addition to providing a systematic analysis of the major systems, we derive lessons learned that are vital for designing NLIs that can answer a wide range of user questions.

DBJun 5, 2019
VoIDext: Vocabulary and Patterns for Enhancing Interoperable Datasets with Virtual Links

Tarcisio Mendes de Farias, Kurt Stockinger, Christophe Dessimoz

Semantic heterogeneity remains a problem when interoperating with data from sources of different scopes and knowledge domains. Causes for this challenge are context-specific requirements (i.e. no "one model fits all"), different data modelling decisions, domain-specific purposes, and technical constraints. Moreover, even if the problem of semantic heterogeneity among different RDF publishers and knowledge domains is solved, querying and accessing the data of distributed RDF datasets on the Web is not straightforward. This is because of the complex and fastidious process needed to understand how these datasets can be related or linked, and consequently, queried. To address this issue, we propose to extend the existing Vocabulary of Interlinked Datasets (VoID) by introducing new terms such as the Virtual Link Set concept and data model patterns. A virtual link is a connection between resources such as literals and IRIs (Internationalized Resource Identifier) with some commonality where each of these resources is from a different RDF dataset. The links are required in order to understand how to semantically relate datasets. In addition, we describe several benefits of using virtual links to improve interoperability between heterogenous and independent datasets. Finally, we exemplify and apply our approach to multiple world-wide used RDF datasets.