Fuxin Jiang

LG
h-index11
9papers
126citations
Novelty54%
AI Score57

9 Papers

74.9AIMay 26
MUSE-Autoskill: Self-Evolving Agents via Skill Creation, Memory, Management, and Evaluation

Huawei Lin, Peng Li, Jie Song et al.

Large language model (LLM) agents rely on reusable skills to solve complex tasks. However, existing skill creation approaches treat skills as isolated and static artifacts, limiting their reusability, reliability, and long-term improvement. We propose MUSE-Autoskill Agent (Memory-Utilizing Skill Evolution), a skill-centric agent framework that lets agents continuously improve their task-solving capability by creating, reusing, and refining skills under a unified lifecycle (creation, memory, management, evaluation, and refinement). Our framework enables agents to create skills on demand, store and reuse them across tasks, organize and select them efficiently, and evaluate them through unit tests and runtime feedback for continuous refinement. We further introduce skill-level memory that accumulates experience for each skill across tasks, enabling more effective reuse and adaptation over time. Experiments on SkillsBench provide initial evidence that lifecycle-managed skills can improve task success, efficiency, reuse, and cross-agent transfer, highlighting the importance of treating skills as long-lived, experience-aware, and testable assets.

CLMar 4, 2025Code
OmniSQL: Synthesizing High-quality Text-to-SQL Data at Scale

Haoyang Li, Shang Wu, Xiaokang Zhang et al.

Text-to-SQL, the task of translating natural language questions into SQL queries, plays a crucial role in enabling non-experts to interact with databases. While recent advancements in large language models (LLMs) have significantly enhanced text-to-SQL performance, existing approaches face notable limitations in real-world text-to-SQL applications. Prompting-based methods often depend on closed-source LLMs, which are expensive, raise privacy concerns, and lack customization. Fine-tuning-based methods, on the other hand, suffer from poor generalizability due to the limited coverage of publicly available training data. To overcome these challenges, we propose a novel and scalable text-to-SQL data synthesis framework for automatically synthesizing large-scale, high-quality, and diverse datasets without extensive human intervention. Using this framework, we introduce SynSQL-2.5M, the first million-scale text-to-SQL dataset, containing 2.5 million samples spanning over 16,000 synthetic databases. Each sample includes a database, SQL query, natural language question, and chain-of-thought (CoT) solution. Leveraging SynSQL-2.5M, we develop OmniSQL, a powerful open-source text-to-SQL model available in three sizes: 7B, 14B, and 32B. Extensive evaluations across nine datasets demonstrate that OmniSQL achieves state-of-the-art performance, matching or surpassing leading closed-source and open-source LLMs, including GPT-4o and DeepSeek-V3, despite its smaller size. We release all code, datasets, and models to support further research.

LGNov 26, 2024Code
Disentangled Interpretable Representation for Efficient Long-term Time Series Forecasting

Yuang Zhao, Tianyu Li, Jiadong Chen et al.

Industry 5.0 introduces new challenges for Long-term Time Series Forecasting (LTSF), characterized by high-dimensional, high-resolution data and high-stakes application scenarios. Against this backdrop, developing efficient and interpretable models for LTSF becomes a key challenge. Existing deep learning and linear models often suffer from excessive parameter complexity and lack intuitive interpretability. To address these issues, we propose DiPE-Linear, a Disentangled interpretable Parameter-Efficient Linear network. DiPE-Linear incorporates three temporal components: Static Frequential Attention (SFA), Static Temporal Attention (STA), and Independent Frequential Mapping (IFM). These components alternate between learning in the frequency and time domains to achieve disentangled interpretability. The decomposed model structure reduces parameter complexity from quadratic in fully connected networks (FCs) to linear and computational complexity from quadratic to log-linear. Additionally, a Low-Rank Weight Sharing policy enhances the model's ability to handle multivariate series. Despite operating within a subspace of FCs with limited expressive capacity, DiPE-Linear demonstrates comparable or superior performance to both FCs and nonlinear models across multiple open-source and real-world LTSF datasets, validating the effectiveness of its sophisticatedly designed structure. The combination of efficiency, accuracy, and interpretability makes DiPE-Linear a strong candidate for advancing LTSF in both research and real-world applications. The source code is available at https://github.com/wintertee/DiPE-Linear.

83.9DBMar 30
Can Large Language Models be a Cardinality Estimator? An Empirical study

Liangzu Liu, Yiyan Wang, Yinjun Wu et al.

Cardinality estimation (CardEst) still remains a challenging problem for DBMS. Recent years have witnessed the success of ML-based cardinality estimators in outperforming traditional methods. However, these solutions suffer from poor generalizability to new data or query distribution, inability to handle complex queries, and substantial data preparation overhead, thus preventing their wide adoption in the real-world DBMS. Some recent efforts have been dedicated to addressing some but not all of these issues. We notice that the recent emerging Large Language Models (LLMs) have shown their remarkable generalizability to unseen tasks, capabilities to understand complex programs, and power to perform data-efficient fine-tuning. In light of this, we propose to leverage LLMs to mitigate the above issues. Specifically, we carefully craft prompts, and subsequently perform fine-tuning and self-correction during inference with LLMs for CardEst task. We then extensively evaluate LLMs' in-distribution and out-of-distribution generalizability, feasibility to support complex queries, and training data efficiency during fine-tuning LLMs on pre-training datasets. The results suggest that LLMs outperform the state-of-the-art in almost all settings, thus indicating their potential for the CardEst task. We further measure the end-to-end query execution time in DBMS by using the estimated cardinalities of LLMs in some practical settings, which suggests that the inference overhead of LLMs can be outweighed by the benefits brought by LLMs for CardEst.

LGJul 17, 2025Code
Fremer: Lightweight and Effective Frequency Transformer for Workload Forecasting in Cloud Services

Jiadong Chen, Hengyu Ye, Fuxin Jiang et al.

Workload forecasting is pivotal in cloud service applications, such as auto-scaling and scheduling, with profound implications for operational efficiency. Although Transformer-based forecasting models have demonstrated remarkable success in general tasks, their computational efficiency often falls short of the stringent requirements in large-scale cloud environments. Given that most workload series exhibit complicated periodic patterns, addressing these challenges in the frequency domain offers substantial advantages. To this end, we propose Fremer, an efficient and effective deep forecasting model. Fremer fulfills three critical requirements: it demonstrates superior efficiency, outperforming most Transformer-based forecasting models; it achieves exceptional accuracy, surpassing all state-of-the-art (SOTA) models in workload forecasting; and it exhibits robust performance for multi-period series. Furthermore, we collect and open-source four high-quality, open-source workload datasets derived from ByteDance's cloud services, encompassing workload data from thousands of computing instances. Extensive experiments on both our proprietary datasets and public benchmarks demonstrate that Fremer consistently outperforms baseline models, achieving average improvements of 5.5% in MSE, 4.7% in MAE, and 8.6% in SMAPE over SOTA models, while simultaneously reducing parameter scale and computational costs. Additionally, in a proactive auto-scaling test based on Kubernetes, Fremer improves average latency by 18.78% and reduces resource consumption by 2.35%, underscoring its practical efficacy in real-world applications.

LGApr 8, 2024
ATFNet: Adaptive Time-Frequency Ensembled Network for Long-term Time Series Forecasting

Hengyu Ye, Jiadong Chen, Shijin Gong et al.

The intricate nature of time series data analysis benefits greatly from the distinct advantages offered by time and frequency domain representations. While the time domain is superior in representing local dependencies, particularly in non-periodic series, the frequency domain excels in capturing global dependencies, making it ideal for series with evident periodic patterns. To capitalize on both of these strengths, we propose ATFNet, an innovative framework that combines a time domain module and a frequency domain module to concurrently capture local and global dependencies in time series data. Specifically, we introduce Dominant Harmonic Series Energy Weighting, a novel mechanism for dynamically adjusting the weights between the two modules based on the periodicity of the input time series. In the frequency domain module, we enhance the traditional Discrete Fourier Transform (DFT) with our Extended DFT, designed to address the challenge of discrete frequency misalignment. Additionally, our Complex-valued Spectrum Attention mechanism offers a novel approach to discern the intricate relationships between different frequency combinations. Extensive experiments across multiple real-world datasets demonstrate that our ATFNet framework outperforms current state-of-the-art methods in long-term time series forecasting.

AIFeb 1
Reasoning and Tool-use Compete in Agentic RL:From Quantifying Interference to Disentangled Tuning

Yu Li, Mingyang Yi, Xiuyu Li et al.

Agentic Reinforcement Learning (ARL) focuses on training large language models (LLMs) to interleave reasoning with external tool execution to solve complex tasks. Most existing ARL methods train a single shared model parameters to support both reasoning and tool use behaviors, implicitly assuming that joint training leads to improved overall agent performance. Despite its widespread adoption, this assumption has rarely been examined empirically. In this paper, we systematically investigate this assumption by introducing a Linear Effect Attribution System(LEAS), which provides quantitative evidence of interference between reasoning and tool-use behaviors. Through an in-depth analysis, we show that these two capabilities often induce misaligned gradient directions, leading to training interference that undermines the effectiveness of joint optimization and challenges the prevailing ARL paradigm. To address this issue, we propose Disentangled Action Reasoning Tuning(DART), a simple and efficient framework that explicitly decouples parameter updates for reasoning and tool-use via separate low-rank adaptation modules. Experimental results show that DART consistently outperforms baseline methods with averaged 6.35 percent improvements and achieves performance comparable to multi-agent systems that explicitly separate tool-use and reasoning using a single model.

LGAug 18, 2025
Online Ensemble Transformer for Accurate Cloud Workload Forecasting in Predictive Auto-Scaling

Jiadong Chen, Xiao He, Hengyu Ye et al.

In the swiftly evolving domain of cloud computing, the advent of serverless systems underscores the crucial need for predictive auto-scaling systems. This necessity arises to ensure optimal resource allocation and maintain operational efficiency in inherently volatile environments. At the core of a predictive auto-scaling system is the workload forecasting model. Existing forecasting models struggle to quickly adapt to the dynamics in online workload streams and have difficulty capturing the complex periodicity brought by fine-grained, high-frequency forecasting tasks. Addressing this, we propose a novel online ensemble model, E3Former, for online workload forecasting in large-scale predictive auto-scaling. Our model synergizes the predictive capabilities of multiple subnetworks to surmount the limitations of single-model approaches, thus ensuring superior accuracy and robustness. Remarkably, it accomplishes this with a minimal increase in computational overhead, adhering to the lean operational ethos of serverless systems. Through extensive experimentation on real-world workload datasets, we establish the efficacy of our ensemble model. In online forecasting tasks, the proposed method reduces forecast error by an average of 10%, and its effectiveness is further demonstrated through a predictive auto-scaling test in the real-life online system. Currently, our method has been deployed within ByteDance's Intelligent Horizontal Pod Auto-scaling (IHPA) platform, which supports the stable operation of over 30 applications, such as Douyin E-Comerce, TouTiao, and Volcano Engine. The predictive auto-scaling capacity reaching over 600,000 CPU cores. On the basis of essentially ensuring service quality, the predictive auto-scaling system can reduce resource utilization by over 40%.

SPDec 7, 2020
A Novel Hybrid Framework for Hourly PM2.5 Concentration Forecasting Using CEEMDAN and Deep Temporal Convolutional Neural Network

Fuxin Jiang, Chengyuan Zhang, Shaolong Sun et al.

For hourly PM2.5 concentration prediction, accurately capturing the data patterns of external factors that affect PM2.5 concentration changes, and constructing a forecasting model is one of efficient means to improve forecasting accuracy. In this study, a novel hybrid forecasting model based on complete ensemble empirical mode decomposition with adaptive noise (CEEMDAN) and deep temporal convolutional neural network (DeepTCN) is developed to predict PM2.5 concentration, by modelling the data patterns of historical pollutant concentrations data, meteorological data, and discrete time variables' data. Taking PM2.5 concentration of Beijing as the sample, experimental results showed that the forecasting accuracy of the proposed CEEMDAN-DeepTCN model is verified to be the highest when compared with the time series model, artificial neural network, and the popular deep learning models. The new model has improved the capability to model the PM2.5-related factor data patterns, and can be used as a promising tool for forecasting PM2.5 concentrations.