DBMay 27
Are Diffusion Language Models Good Database Analysts?Peixian Ma, Xialie Zhuang, Jiantao Tan et al.
Recent advancements in large language models (LLMs) have significantly improved Natural Language to SQL (NL2SQL) tasks, yet most NL2SQL systems continue to rely on the autoregressive (AR) paradigm. The highly structured nature of SQL makes AR models susceptible to sequential error propagation due to their rigid left-to-right decoding process. Diffusion Language Models~(DLMs) have recently emerged as a promising alternative, replacing unidirectional decoding with iterative denoising to enable global sequence refinement. Nevertheless, the adoption of DLMs in NL2SQL is constrained by a fragmented ecosystem and the absence of a standardized evaluation framework, which obscures their true capabilities and impedes fair comparison with AR baselines. In this paper, we propose a unified evaluation framework that standardizes both generation and execution environments across various DLM architectures. To further improve the performance of DLMs-based NL2SQL systems, we propose \texttt{SQL-D1}, a novel agentic framework that integrates database-aware context engineering, test-time scaling and interactive optimization. Through extensive empirical studies on scaling properties, post-training stability, and primary failure modes, we demonstrate that DLMs offer distinct advantages in structural robustness and facilitate flexible trade-offs between efficiency and accuracy. By distilling these insights into structured takeaways, our work provides a systematic understanding of DLMs-based NL2SQL and lays the foundation for future database analysis agents.
CYJan 29, 2023
Learning Analytics from Spoken Discussion Dialogs in Flipped ClassroomHang Su, Borislav Dzodzo, Changlun Li et al.
The flipped classroom is a new pedagogical strategy that has been gaining increasing importance recently. Spoken discussion dialog commonly occurs in flipped classroom, which embeds rich information indicating processes and progression of students' learning. This study focuses on learning analytics from spoken discussion dialog in the flipped classroom, which aims to collect and analyze the discussion dialogs in flipped classroom in order to get to know group learning processes and outcomes. We have recently transformed a course using the flipped classroom strategy, where students watched video-recorded lectures at home prior to group-based problem-solving discussions in class. The in-class group discussions were recorded throughout the semester and then transcribed manually. After features are extracted from the dialogs by multiple tools and customized processing techniques, we performed statistical analyses to explore the indicators that are related to the group learning outcomes from face-to-face discussion dialogs in the flipped classroom. Then, machine learning algorithms are applied to the indicators in order to predict the group learning outcome as High, Mid or Low. The best prediction accuracy reaches 78.9%, which demonstrates the feasibility of achieving automatic learning outcome prediction from group discussion dialog in flipped classroom.
AIApr 3
TABQAWORLD: Optimizing Multimodal Reasoning for Multi-Turn Table Question AnsweringTung Sum Thomas Kwok, Xinyu Wang, Xiaofeng Lin et al.
Multimodal reasoning has emerged as a powerful framework for enhancing reasoning capabilities of reasoning models. While multi-turn table reasoning methods have improved reasoning accuracy through tool use and reward modeling, they rely on fixed text serialization for table state readouts. This introduces representation errors in table encoding that significantly accumulate over multiple turns. Such accumulation is alleviated by tabular grounding methods in the expense of inference compute and cost, rendering real world deployment impractical. To address this, we introduce TABQAWORLD, a table reasoning framework that jointly optimizes tabular action through representation and estimation. For representation, TABQAWORLD employs an action-conditioned multimodal selection policy, which dynamically switches between visual and textual representations to maximize table state readout reliability. For estimation, TABQAWORLD optimizes stepwise reasoning trajectory through table metadata including dimension, data types and key values, safely planning trajectory and compressing low-complexity actions to reduce conversation turns and latency. Designed as a training-free framework, empirical evaluations show that TABQAWORLD achieves state-of-the-art performance with 4.87% accuracy improvements over baselines, with 5.42% accuracy gain and 33.35% inference latency reduction over static settings, establishing a new standard for reliable and efficient table reasoning.
MAMar 24, 2025Code
Will LLMs be Professional at Fund Investment? DeepFund: A Live Arena PerspectiveChanglun Li, Yao Shi, Yuyu Luo et al.
Large Language Models (LLMs) have demonstrated impressive capabilities across various domains, but their effectiveness in financial decision-making remains inadequately evaluated. Current benchmarks primarily assess LLMs' understanding on financial documents rather than the ability to manage assets or dig out trading opportunities in dynamic market conditions. Despite the release of new benchmarks for evaluating diversified tasks on the financial domain, we identified four major problems in these benchmarks, which are data leakage, navel-gazing, over-intervention, and maintenance-hard. To pave the research gap, we introduce DeepFund, a comprehensive arena platform for evaluating LLM-based trading strategies in a live environment. Our approach implements a multi-agent framework where they serve as multiple key roles that realize the real-world investment decision processes. Moreover, we provide a web interface that visualizes LLMs' performance with fund investment metrics across different market conditions, enabling detailed comparative analysis. Through DeepFund, we aim to provide a more realistic and fair assessment on LLM's capabilities in fund investment, offering diversified insights and revealing their potential applications in real-world financial markets. Our code is publicly available at https://github.com/HKUSTDial/DeepFund.
CLDec 26, 2024
SketchFill: Sketch-Guided Code Generation for Imputing Derived Missing ValuesYunfan Zhang, Changlun Li, Yuyu Luo et al.
Missing value is a critical issue in data science, significantly impacting the reliability of analyses and predictions. Missing value imputation (MVI) is a longstanding problem because it highly relies on domain knowledge. Large language models (LLMs) have emerged as a promising tool for data cleaning, including MVI for tabular data, offering advanced capabilities for understanding and generating content. However, despite their promise, existing LLM techniques such as in-context learning and Chain-of-Thought (CoT) often fall short in guiding LLMs to perform complex reasoning for MVI, particularly when imputing derived missing values, which require mathematical formulas and data relationships across rows and columns. This gap underscores the need for further advancements in LLM methodologies to enhance their reasoning capabilities for more reliable imputation outcomes. To fill this gap, we propose SketchFill, a novel sketch-based method to guide LLMs in generating accurate formulas to impute missing numerical values. Our experimental results demonstrate that SketchFill significantly outperforms state-of-the-art methods, achieving 56.2% higher accuracy than CoT-based methods and 78.8% higher accuracy than MetaGPT. This sets a new standard for automated data cleaning and advances the field of MVI for numerical values.