Yonghong Yu

IR
h-index: 1
4 papers
29 citations
Novelty: 44%
AI Score: 37

4 Papers

AI Jan 26
Beyond Text-to-SQL: Can LLMs Really Debug Enterprise ETL SQL?

Jing Ye, Yiwen Duan, Yonghong Yu et al.

SQL is central to enterprise data engineering, yet generating fully correct SQL code in a single attempt remains difficult, even for experienced developers and advanced text-to-SQL LLMs, often requiring multiple debugging iterations. We introduce OurBench, the first benchmark for enterprise-level SQL reasoning and debugging. Our benchmark is built on two key innovations: (1) an automated construction workflow that uses reverse engineering to systematically inject realistic bugs into large-scale SQL code, enabling scalable and diverse benchmark generation; and (2) an execution-free evaluation framework tailored to enterprise settings, providing fast, accurate, and resource-efficient assessment. OurBench comprises 469 OurBenchSyn queries featuring syntax errors with explicit error messages, and 516 OurBenchSem queries targeting semantic errors in which the code fails to meet user intent. The queries are highly complex, averaging over 140 lines and featuring deep and wide abstract syntax trees. Evaluation of nearly 30 LLMs reveals a substantial performance gap: the best-performing model, Claude-4-Sonnet, achieves only 36.46 percent accuracy on OurBenchSyn and 32.17 percent on OurBenchSem, while most models score below 20 percent. We further explore four solution strategies, identify key challenges, and outline promising directions for enterprise SQL debugging with LLMs.

CL Nov 11, 2024
PDC & DM-SFT: A Road for LLM SQL Bug-Fix Enhancing

Yiwen Duan, Yonghong Yu, Xiaoming Zhao et al.

Code Large Language Models (Code LLMs), such as Code Llama and DeepSeek-Coder, have demonstrated exceptional performance in code generation tasks. However, most existing models focus on generating correct code and often struggle with bug repair. We introduce a suite of methods to enhance LLMs' SQL bug-fixing abilities. The methods consist mainly of two parts: Progressive Dataset Construction (PDC) from scratch and Dynamic Mask Supervised Fine-tuning (DM-SFT). PDC proposes two data expansion methods, one breadth-first and one depth-first. DM-SFT introduces an efficient supervised learning approach for bug fixing that effectively reduces the total number of training steps and mitigates the "disorientation" in SQL code bug-fixing training. In our evaluation, the code LLMs trained with the two methods outperform the current best-performing models, which are much larger.

IR Sep 11, 2020
TRec: Sequential Recommender Based On Latent Item Trend Information

Ye Tao, Can Wang, Lina Yao et al.

Recommendation systems play an important role in online web applications. Sequential recommenders further model users' short-term preferences by exploiting information from the latest user-item interaction history. Most sequential recommendation methods neglect the importance of ever-changing item popularity. Our model builds on the intuition that items with the most user interactions may have been popular in the past but could have gone out of fashion in recent days. To this end, this paper proposes a novel sequential recommendation approach dubbed TRec. TRec learns item trend information from implicit user interaction history and incorporates it into next-item recommendation tasks. A self-attention mechanism is then used to learn better node representations. Our model is trained via pairwise rank-based optimization. We conduct extensive experiments with seven baseline methods on four benchmark datasets. The empirical results show that our approach outperforms other state-of-the-art methods while maintaining a very low runtime cost. Our study demonstrates the importance of item trend information in recommendation system design, and the method's efficiency makes it practical in real-world scenarios.
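As a rough illustration of what "pairwise rank-based optimization" can look like, the sketch below computes a BPR-style loss for one (interacted item, non-interacted item) pair. The abstract does not say which pairwise objective TRec uses, so the choice of BPR and the function name `pairwise_rank_loss` are assumptions made here for illustration only.

```python
import math


def pairwise_rank_loss(pos_score: float, neg_score: float) -> float:
    """BPR-style loss for one pair: -log(sigmoid(pos_score - neg_score)).

    Assumption: this is a generic pairwise ranking objective, not
    necessarily the exact loss used by TRec.
    """
    diff = pos_score - neg_score
    # -log(sigmoid(d)) equals log(1 + exp(-d)); branch on the sign of d
    # so that exp() is never called with a large positive argument.
    if diff >= 0:
        return math.log1p(math.exp(-diff))
    return -diff + math.log1p(math.exp(diff))


# The loss shrinks as the interacted item is scored further above
# the non-interacted one.
well_ranked = pairwise_rank_loss(pos_score=2.0, neg_score=0.0)
badly_ranked = pairwise_rank_loss(pos_score=0.0, neg_score=2.0)
```

Minimizing this quantity over many sampled pairs pushes the model to score items a user interacted with above items the user did not, which is the general idea behind pairwise ranking objectives.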

IR May 5, 2014
Attributes Coupling based Item Enhanced Matrix Factorization Technique for Recommender Systems

Yonghong Yu, Can Wang, Yang Gao

Recommender systems have attracted a great deal of attention because they help users alleviate the information overload problem. Matrix factorization is one of the most widely employed collaborative filtering techniques in recommender systems research due to its effectiveness and efficiency in dealing with very large user-item rating matrices. Recently, based on the intuition that additional information provides useful insights for matrix factorization, several recommendation algorithms have utilized additional information to improve the performance of matrix factorization methods. However, the majority focus on the cold-start user problem and ignore the cold-start item problem. In addition, there are few suitable similarity measures for these content-enhanced matrix factorization approaches to compute the similarity between categorical items. In this paper, we propose an attributes coupling based item enhanced matrix factorization method that incorporates item attribute information into matrix factorization and adapts the Coupled Object Similarity to capture the relationship between items. Item attribute information is formed as an item relationship regularization term to regularize the process of matrix factorization. Specifically, the similarity between items is measured by the Coupled Object Similarity, which considers the coupling between items. Experimental results on two real data sets show that our proposed method outperforms state-of-the-art recommendation algorithms and can effectively cope with the cold-start item problem when more item attribute information is available.
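To make the idea of an item relationship regularization term concrete, here is a minimal sketch of a matrix factorization objective with one added. The abstract does not give the exact formula, so the regularizer used here (similarity-weighted squared distance between item latent vectors) is an assumed, commonly used form. The similarity values are plain placeholders rather than the Coupled Object Similarity, and the names `objective`, `lam`, and `alpha` are hypothetical.

```python
def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def objective(ratings, P, Q, sim, lam=0.1, alpha=0.1):
    """Regularized matrix factorization objective (illustrative form).

    ratings: dict mapping (user, item) -> observed rating
    P, Q:    lists of user and item latent vectors
    sim:     dict mapping (item_i, item_j) -> similarity weight
             (placeholder values; the paper uses Coupled Object Similarity)
    """
    # Squared error on observed ratings only.
    fit = sum((r - dot(P[u], Q[i])) ** 2 for (u, i), r in ratings.items())
    # Standard L2 penalty on all latent vectors.
    l2 = sum(dot(p, p) for p in P) + sum(dot(q, q) for q in Q)
    # Assumed item relationship term: similar items are pulled together.
    item_reg = sum(s * sq_dist(Q[i], Q[j]) for (i, j), s in sim.items())
    return 0.5 * fit + 0.5 * lam * l2 + 0.5 * alpha * item_reg


# Tiny example: one user, two items, one observed rating.
P = [[1.0, 0.0]]
Q = [[1.0, 0.0], [0.0, 1.0]]
ratings = {(0, 0): 1.0}
sim = {(0, 1): 1.0}
value = objective(ratings, P, Q, sim)
```

Because the extra term depends only on item attributes (through the similarity weights), an item with no ratings still receives a gradient toward its similar items, which is one way such a regularizer can help with the cold-start item problem.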