Manuel Rigger

DB
h-index3
9papers
299citations
Novelty56%
AI Score49

9 Papers

DBApr 15
NeurBench: A Benchmark Suite for Learned Database Components with Drift Modeling

Zhanhao Zhao, Haotian Gao, Naili Xing et al.

Learned database components, which deeply integrate machine learning into their design, have been extensively studied in recent years. Given the dynamism of databases, where data and workloads continuously drift, it is crucial for learned database components to remain effective and efficient in the face of data and workload drift. Robustness, therefore, is a key factor in assessing their practical applicability. Although recent works examine learned database components under specific drift, they fail to enable systematic performance evaluations across a broad range of drift or under customized drift as needed. This paper presents NeurBench, a new benchmark suite that supports evaluating learned database components under measurable and controllable data and workload drift. We quantify diverse types of drift by introducing a key concept called the drift factor. Building on this formulation, we propose a drift-aware data and workload generation framework that effectively simulates real-world drift while preserving inherent correlations. Experimental results demonstrate the effectiveness of NeurBench in generating realistic data and workload drift, while providing insights into the performance of representative learned database components under different drift scenarios.

DBMar 17
Dialect-Agnostic SQL Parsing via LLM-Based Segmentation

Junwen An, Kabilan Mahathevan, Manuel Rigger

SQL is a widely adopted language for querying data, which has led to the development of various SQL analysis and rewriting tools. However, due to the diversity of SQL dialects, such tools often fail when encountering unrecognized dialect-specific syntax. While Large Language Models (LLMs) have shown promise in understanding SQL queries, their inherent limitations in handling hierarchical structures and hallucination risks limit their direct applicability in parsing. To address these limitations, we propose SQLFlex, a novel query rewriting framework that integrates grammar-based parsing with LLM-based segmentation to parse diverse SQL dialects robustly. Our core idea is to decompose hierarchical parsing to sequential segmentation tasks, which better aligns with the strength of LLMs and improves output reliability through validation checks. Specifically, SQLFlex uses clause-level segmentation and expression-level segmentation as two strategies that decompose elements on different levels of a query. We extensively evaluated SQLFlex on both real-world use cases and in a standalone evaluation. In SQL linting, SQLFlex outperforms SQLFluff in ANSI mode by 63.68% in F1 score while matching its dialect-specific mode performance. In test-case reduction, SQLFlex outperforms SQLess by up to 10 times in simplification rate. In the standalone evaluation, it parses 91.55% to 100% of queries across eight distinct dialects, outperforming all baseline parsers. We believe SQLFlex can serve as a foundation for many query analysis and rewriting use cases.

IVMay 8
CAMAL: Improving Attention Alignment and Faithfulness with Segmentation Masks

Rajdeep Singh Hundal, Yan Xiao, Jin Song Dong et al.

Many vision datasets now provide segmentation masks in addition to annotated images to support a wide range of tasks. In this work, we propose Class Activation Map Attention Learning (CAMAL), an efficient and scalable method that utilizes segmentation masks to improve attention alignment and faithfulness in vision models. Specifically, attention alignment refers to the degree to which a model's attention aligns with ground-truth discriminative regions, while attention faithfulness refers to the degree to which a model's attention influences its decision. Improving both attention alignment and faithfulness is essential for ensuring that model attention is both spatially accurate and causally meaningful. To improve attention alignment and faithfulness in vision models, CAMAL first extracts the model's attention for each image during training and then compares the attention to ground-truth discriminative regions obtained from the corresponding segmentation masks. CAMAL then acts as an auxiliary regularizer, encouraging attention that aligns with ground-truth discriminative regions, while suppressing attention elsewhere. We evaluated CAMAL across two learning paradigms -- Deep Learning (DL) and Deep Reinforcement Learning (DRL) -- and observed consistent, significant improvements in both attention alignment and faithfulness. In particular, CAMAL yields statistically significant gains in attention alignment across all settings, and improves attention faithfulness by over 35% compared to recent work. Moreover, we show that improved attention alignment and faithfulness enhance explainability, while yielding improved or comparable generalization performance without increasing inference cost. These findings demonstrate that the spatial information contained within segmentation masks can be effectively leveraged to guide model attention across learning tasks.

SEMar 28, 2025
On the Mistaken Assumption of Interchangeable Deep Reinforcement Learning Implementations

Rajdeep Singh Hundal, Yan Xiao, Xiaochun Cao et al.

Deep Reinforcement Learning (DRL) is a paradigm of artificial intelligence where an agent uses a neural network to learn which actions to take in a given environment. DRL has recently gained traction from being able to solve complex environments like driving simulators, 3D robotic control, and multiplayer-online-battle-arena video games. Numerous implementations of the state-of-the-art algorithms responsible for training these agents, like the Deep Q-Network (DQN) and Proximal Policy Optimization (PPO) algorithms, currently exist. However, studies make the mistake of assuming implementations of the same algorithm to be consistent and thus, interchangeable. In this paper, through a differential testing lens, we present the results of studying the extent of implementation inconsistencies, their effect on the implementations' performance, as well as their impact on the conclusions of prior studies under the assumption of interchangeable implementations. The outcomes of our differential tests showed significant discrepancies between the tested algorithm implementations, indicating that they are not interchangeable. In particular, out of the five PPO implementations tested on 56 games, three implementations achieved superhuman performance for 50% of their total trials while the other two implementations only achieved superhuman performance for less than 15% of their total trials. As part of a meticulous manual analysis of the implementations' source code, we analyzed implementation discrepancies and determined that code-level inconsistencies primarily caused these discrepancies. Lastly, we replicated a study and showed that this assumption of implementation interchangeability was sufficient to flip experiment outcomes. Therefore, this calls for a shift in how implementations are being used.

SEJul 16, 2020
Detecting Optimization Bugs in Database Engines via Non-Optimizing Reference Engine Construction

Manuel Rigger, Zhendong Su

Database Management Systems (DBMS) are used ubiquitously. To efficiently access data, they apply sophisticated optimizations. Incorrect optimizations can result in logic bugs, which cause a query to compute an incorrect result set. We propose Non-Optimizing Reference Engine Construction (NoREC), a fully-automatic approach to detect optimization bugs in DBMS. Conceptually, this approach aims to evaluate a query by an optimizing and a non-optimizing version of a DBMS, to then detect differences in their returned result set, which would indicate a bug in the DBMS. Obtaining a non-optimizing version of a DBMS is challenging, because DBMS typically provide limited control over optimizations. Our core insight is that a given, potentially randomly-generated optimized query can be rewritten to one that the DBMS cannot optimize. Evaluating this unoptimized query effectively corresponds to a non-optimizing reference engine executing the original query. We evaluated NoREC in an extensive testing campaign on four widely-used DBMS, namely PostgreSQL, MariaDB, SQLite, and CockroachDB. We found 159 previously unknown bugs in the latest versions of these systems, 141 of which have been fixed by the developers. Of these, 51 were optimization bugs, while the remaining were error and crash bugs. Our results suggest that NoREC is effective, general and requires little implementation effort, which makes the technique widely applicable in practice.

DBJan 13, 2020
Testing Database Engines via Pivoted Query Synthesis

Manuel Rigger, Zhendong Su

Relational databases are used ubiquitously. They are managed by database management systems (DBMS), which allow inserting, modifying, and querying data using a domain-specific language called Structured Query Language (SQL). Popular DBMS have been extensively tested by fuzzers, which have been successful in finding crash bugs. However, approaches to finding logic bugs, such as when a DBMS computes an incorrect result set, have remained mostly untackled. Differential testing is an effective technique to test systems that support a common language by comparing the outputs of these systems. However, this technique is ineffective for DBMS, because each DBMS typically supports its own SQL dialect. To this end, we devised a novel and general approach that we have termed Pivoted Query Synthesis. The core idea of this approach is to automatically generate queries for which we ensure that they fetch a specific, randomly selected row, called the pivot row. If the DBMS fails to fetch the pivot row, the likely cause is a bug in the DBMS. We tested our approach on three widely-used and mature DBMS, namely SQLite, MySQL, and PostgreSQL. In total, we reported 123 bugs in these DBMS, 99 of which have been fixed or verified, demonstrating that the approach is highly effective and general. We expect that the wide applicability and simplicity of our approach will enable the improvement of robustness of many DBMS.

PLJul 1, 2019
Understanding GCC Builtins to Develop Better Tools

Manuel Rigger, Stefan Marr, Bram Adams et al.

C programs can use compiler builtins to provide functionality that the C language lacks. On Linux, GCC provides several thousands of builtins that are also supported by other mature compilers, such as Clang and ICC. Maintainers of other tools lack guidance on whether and which builtins should be implemented to support popular projects. To assist tool developers who want to support GCC builtins, we analyzed builtin use in 4,913 C projects from GitHub. We found that 37% of these projects relied on at least one builtin. Supporting an increasing proportion of projects requires support of an exponentially increasing number of builtins; however, implementing only 10 builtins already covers over 30% of the projects. Since we found that many builtins in our corpus remained unused, the effort needed to support 90% of the projects is moderate, requiring about 110 builtins to be implemented. For each project, we analyzed the evolution of builtin use over time and found that the majority of projects mostly added builtins. This suggests that builtins are not a legacy feature and must be supported in future tools. Systematic testing of builtin support in existing tools revealed that many lacked support for builtins either partially or completely; we also discovered incorrect implementations in various tools, including the formally verified CompCert compiler.

SEAug 2, 2018
Debugging Native Extensions of Dynamic Languages

Jacob Kreindl, Manuel Rigger, Hanspeter Mössenböck

Many dynamic programming languages such as Ruby and Python enable developers to use so called native extensions, code implemented in typically statically compiled languages like C and C++. However, debuggers for these dynamic languages usually lack support for also debugging these native extensions. GraalVM can execute programs implemented in various dynamic programming languages and, by using the LLVM-IR interpreter Sulong, also their native extensions. We added support for source-level debugging to Sulong based on GraalVM's debugging framework by associating run-time debug information from the LLVM-IR level to the original program code. As a result, developers can now use GraalVM to debug source code written in multiple LLVM-based programming languages as well as programs implemented in various dynamic languages that invoke it in a common debugger front-end.

CRJun 23, 2018
Context-aware Failure-oblivious Computing as a Means of Preventing Buffer Overflows

Manuel Rigger, Daniel Pekarek, Hanspeter Mössenböck

In languages like C, buffer overflows are widespread. A common mitigation technique is to use tools that detect them during execution and abort the program to prevent the leakage of data or the diversion of control flow. However, for server applications, it would be desirable to prevent such errors while maintaining availability of the system. To this end, we present an approach to handle buffer overflows without aborting the program. This approach involves implementing a continuation logic in library functions based on an introspection function that allows querying the size of a buffer. We demonstrate that introspection can be implemented in popular bug-finding and bug-mitigation tools such as LLVM's AddressSanitizer, SoftBound, and Intel-MPX-based bounds checking. We evaluated our approach in a case study of real-world bugs and show that for tools that explicitly track bounds data, introspection results in a low performance overhead.