cs.DB · Computer Science

Databases

Database systems, query processing, data management

100.0 · DB · Apr 3 · Code

DeepEye-SQL: A Software-Engineering-Inspired Text-to-SQL Framework

Boyan Li, Chong Chen, Zhujun Xue et al.

This addresses the need for more reliable and verifiable Text-to-SQL systems for database users, representing a paradigm shift rather than an incremental improvement.

99.7 · DB · May 31 · Code

APEX-SQL: Talking to the data via Agentic Exploration for Text-to-SQL

Bowen Cao, Weibin Liao, Yushi Sun et al.

For enterprise Text-to-SQL applications, this work addresses semantic ambiguity and scalability issues in large databases by shifting from static schema to dynamic data exploration.

99.5 · DB · Mar 20 · Code

ReViSQL: Achieving Human-Level Text-to-SQL

Yuxuan Zhu, Tengjun Jin, Yoojin Choi et al.

This work addresses the critical challenge of improving Text-to-SQL accuracy for database research and data analytics applications, representing a significant advancement rather than an incremental improvement.

99.2 · DB · Mar 25 · Code

KRONE: Hierarchical and Modular Log Anomaly Detection

Lei Ma, Jinyang Liu, Tieying Zhang et al.

This addresses the challenge of detecting system failures and security risks in logs for industries like cloud computing, offering a novel approach that improves accuracy and efficiency over prior methods.

98.7 · SI · Apr 20

Topology-Aware LLM-Driven Social Simulation: A Unified Framework for Efficient and Realistic Agent Dynamics

Yuwei Xu, Shulun Zhang, Yingli Zhou et al.

For researchers in social simulation, TopoSim addresses the inefficiency and unrealistic dynamics of existing LLM-based frameworks by leveraging network topology.

98.6 · IR · Mar 19

BubbleRAG: Evidence-Driven Retrieval-Augmented Generation for Black-Box Knowledge Graphs

Duyi Pan, Tianao Lou, Xin Li et al.

This addresses recall and precision limitations in graph-based RAG for black-box knowledge graphs, offering a plug-and-play solution for knowledge-intensive tasks.

98.6 · LG · May 28 · Code

ParisKV: Fast and Drift-Robust KV-Cache Retrieval for Long-Context LLMs

Yanlin Qi, Xinhang Chen, Huiqiang Jiang et al. · Harvard, Microsoft Research

This work provides a significant improvement in the efficiency and scalability of long-context LLM inference for developers and researchers working with large language models.

99.0 · DB · Apr 28 · Code

Large Language Model-Enhanced Relational Operators: Taxonomy, Benchmark, and Analysis

Yunxiang Su, Tianjing Zeng, Zhongjun Ding et al.

For researchers and practitioners integrating LLMs into relational data processing, this work provides a standardized taxonomy and benchmark to systematically evaluate and compare LROs, addressing the lack of unified definitions and evaluation.

98.7 · DB · Mar 20

SEAR: Schema-Based Evaluation and Routing for LLM Gateways

Zecheng Zhang, Han Zheng, Yue Xu

This work addresses the need for fine-grained, interpretable quality assessment and efficient routing in production LLM gateways, offering a practical solution for organizations managing multiple LLM providers.

98.5 · DB · Mar 12

Sema: A High-performance System for LLM-based Semantic Query Processing

Kangkang Qi, Dongyang Xie, Wenbo Li et al.

This work addresses performance and usability issues for data analysts and engineers working with LLM-based semantic queries, representing a novel method rather than an incremental improvement.

98.2 · DB · Mar 29

Enzyme: Incremental View Maintenance for Data Engineering

Ritwik Yadav, Supun Abeysinghe, Min Yang et al.

For data engineers managing large-scale ETL pipelines, Enzyme reduces operational overhead by automating incremental view maintenance, addressing a long-standing bottleneck in industrial database systems.

98.0 · DB · Apr 17

DPC: Training-Free Text-to-SQL Candidate Selection via Dual-Paradigm Consistency

Boyan Li, Ou Ocean Kun Hei, Yue Yu et al.

For text-to-SQL systems, DPC provides a training-free method to improve selection accuracy without execution oracles, outperforming existing approaches.

97.7 · DB · Apr 13 · Code

NL2SQLBench: A Modular Benchmarking Framework for LLM-Enabled NL2SQL Solutions

Shizheng Hou, Wenqi Pei, Nuo Chen et al.

For NL2SQL researchers and practitioners, this framework provides a standardized, modular evaluation to identify bottlenecks and guide future improvements.

98.4 · DS · Mar 11

Frequency Moments in Noisy Streaming and Distributed Data under Mismatch Ambiguity

Kaiwen Liu, Qin Zhang

This addresses challenges in statistical estimation over noisy data in streaming and distributed systems, representing an incremental advance over prior work in noiseless settings.

96.8 · RO · Mar 18

HeiSD: Hybrid Speculative Decoding for Embodied Vision-Language-Action Models with Kinematic Awareness

Zihao Zheng, Zhihao Mao, Sicheng Tian et al.

This work addresses inference efficiency for robot control systems using VLA models, offering a hybrid approach that is incremental but provides concrete speed improvements.

97.5 · DB · May 29 · Code

NGDBench: Towards Neural Graph Data Management

Yufei Li, Yisen Gao, Jiaxuan Xiong et al.

This benchmark addresses the critical need for more robust and intelligent data management systems for organizations dealing with heterogeneous, evolving, and imperfect real-world data.

97.2 · DB · Mar 24

Why Database Manuals Are Not Enough: Efficient and Reliable Configuration Tuning for DBMSs via Code-Driven LLM Agents

Xinyi Zhang, Tiantian Chen, Zhentao Han et al.

This addresses the challenge of optimizing DBMS performance for users and administrators by automating configuration tuning, representing a novel method rather than an incremental improvement.

97.0 · DB · Apr 26

SEMA-SQL: Beyond Traditional Relational Querying with Large Language Models

Yin Lin, Tianjing Zeng, Zhongjun Ding et al.

For users needing to query databases beyond standard SQL, SEMA-SQL bridges the gap between text-to-SQL and semantic operator systems by automating query generation, optimization, and execution.

97.1 · DS · Mar 25 · Code

AutoCSF: Provably Space-Efficient Indexing of Skewed Key-Value Workloads via Filter-Augmented Compressed Static Functions

David Torres Ramos, Vihan Lakshman, Chen Luo et al.

This addresses a critical challenge in data-intensive domains like computational genomics, where skewed distributions dominate, offering a principled solution with theoretical guarantees.

95.8 · CL · Apr 8 · Code

SQLStructEval: Structural Evaluation of LLM Text-to-SQL Generation

Yixi Zhou, Fan Zhang, Zhiqiao Guo et al.

This addresses structural evaluation, an overlooked dimension of LLM-based program generation systems; the contribution is incremental but important for reliability.