Shan Zhong

LG
h-index33
9papers
315citations
Novelty51%
AI Score48

9 Papers

66.6NAMay 24
IV-Net: A neural network for elliptic PDEs with random and highly varying coefficients

Shan Zhong, George Biros

We introduce a novel neural operator architecture designed to approximate solutions of linear elliptic partial differential equations with high-contrast, spatially varying coefficients. The network, termed the Iterated V-shaped Net (IV-Net), realizes a mapping from the input coefficients and righthand side to the corresponding solution field. The architecture of IV-Net is informed by, and closely resembles, a V-cycle multigrid solver. The IV-Net model is parameterized via convolutional layers defined in the physical domain. For coercive problems with highly heterogeneous coefficients, the proposed network exhibits superior performance relative to a proper orthogonal decomposition (POD) approach and several existing neural operator architectures. For low-frequency oscillatory Helmholtz problems with smooth coefficients, its performance is similar to that of a Fourier neural operator. We analyze the approximation error and convergence behavior of IV-Net, its data efficiency, and its dependence on the underlying discretization mesh. Furthermore, we demonstrate the practical effectiveness of the architecture through a series of numerical experiments, including applications to uncertainty quantification, inverse problems, and prediction of quantities of interest.

IRAug 3, 2023
Seasonality Based Reranking of E-commerce Autocomplete Using Natural Language Queries

Prateek Verma, Shan Zhong, Xiaoyu Liu et al.

Query autocomplete (QAC) also known as typeahead, suggests list of complete queries as user types prefix in the search box. It is one of the key features of modern search engines specially in e-commerce. One of the goals of typeahead is to suggest relevant queries to users which are seasonally important. In this paper we propose a neural network based natural language processing (NLP) algorithm to incorporate seasonality as a signal and present end to end evaluation of the QAC ranking model. Incorporating seasonality into autocomplete ranking model can improve autocomplete relevance and business metric.

LGFeb 5
Distributional Reinforcement Learning with Diffusion Bridge Critics

Shutong Ding, Yimiao Zhou, Ke Hu et al.

Recent advances in diffusion-based reinforcement learning (RL) methods have demonstrated promising results in a wide range of continuous control tasks. However, existing works in this field focus on the application of diffusion policies while leaving the diffusion critics unexplored. In fact, since policy optimization fundamentally relies on the critic, accurate value estimation is far more important than policy expressiveness. Furthermore, given the stochasticity of most reinforcement learning tasks, it has been confirmed that the critic is more appropriately depicted with a distributional model. Motivated by these points, we propose a novel distributional RL method with Diffusion Bridge Critics (DBC). DBC directly models the inverse cumulative distribution function (CDF) of the Q value. This allows us to accurately capture the value distribution and prevents it from collapsing into a trivial Gaussian distribution owing to the strong distribution-matching capability of the diffusion bridge. Moreover, we further derive an analytic integral formula to address discretization errors in DBC, which is essential in value estimation. To our knowledge, DBC is the first work to employ the diffusion bridge model as the critic. Notably, DBC is also a plug-and-play component and can be integrated into most existing RL frameworks. Experimental results on MuJoCo robot control benchmarks demonstrate the superiority of DBC compared with previous distributional critic models.

LGDec 29, 2023
Differentially Private Low-Rank Adaptation of Large Language Model Using Federated Learning

Xiao-Yang Liu, Rongyi Zhu, Daochen Zha et al.

The surge in interest and application of large language models (LLMs) has sparked a drive to fine-tune these models to suit specific applications, such as finance and medical science. However, concerns regarding data privacy have emerged, especially when multiple stakeholders aim to collaboratively enhance LLMs using sensitive data. In this scenario, federated learning becomes a natural choice, allowing decentralized fine-tuning without exposing raw data to central servers. Motivated by this, we investigate how data privacy can be ensured in LLM fine-tuning through practical federated learning approaches, enabling secure contributions from multiple parties to enhance LLMs. Yet, challenges arise: 1) despite avoiding raw data exposure, there is a risk of inferring sensitive information from model outputs, and 2) federated learning for LLMs incurs notable communication overhead. To address these challenges, this article introduces DP-LoRA, a novel federated learning algorithm tailored for LLMs. DP-LoRA preserves data privacy by employing a Gaussian mechanism that adds noise in weight updates, maintaining individual data privacy while facilitating collaborative model training. Moreover, DP-LoRA optimizes communication efficiency via low-rank adaptation, minimizing the transmission of updated weights during distributed training. The experimental results across medical, financial, and general datasets using various LLMs demonstrate that DP-LoRA effectively ensures strict privacy constraints while minimizing communication overhead.

CLNov 9, 2024
Clustering Algorithms and RAG Enhancing Semi-Supervised Text Classification with Large LLMs

Shan Zhong, Jiahao Zeng, Yongxin Yu et al.

This paper proposes a Clustering, Labeling, then Augmenting framework that significantly enhances performance in Semi-Supervised Text Classification (SSTC) tasks, effectively addressing the challenge of vast datasets with limited labeled examples. Unlike traditional SSTC approaches that rely on a predefined small set of labeled data to generate pseudo-labels for the unlabeled data, this framework innovatively employs clustering to select representative "landmarks" for labeling. These landmarks subsequently act as intermediaries in an ensemble of augmentation techniques, including Retrieval-Augmented Generation (RAG), Large Language Model (LLMs)-based rewriting, and synonym substitution, to generate synthetic labeled data without making pseudo-labels for the unlabeled data. Empirical results show that even in complex text document classification scenarios involving over 100 categories, our method achieves state-of-the-art accuracies of 95.41% on the Reuters dataset and 82.43% on the Web of Science dataset. Our approach significantly reduces the reliance on human labeling efforts and the associated expenses, while simultaneously ensuring high data quality and minimizing privacy risks. The finetuning results further show the efficiency of fine-tuning LLMs for text classification tasks, highlighting a robust solution for leveraging limited labeled data.

LGMay 24, 2025
GenPO: Generative Diffusion Models Meet On-Policy Reinforcement Learning

Shutong Ding, Ke Hu, Shan Zhong et al.

Recent advances in reinforcement learning (RL) have demonstrated the powerful exploration capabilities and multimodality of generative diffusion-based policies. While substantial progress has been made in offline RL and off-policy RL settings, integrating diffusion policies into on-policy frameworks like PPO remains underexplored. This gap is particularly significant given the widespread use of large-scale parallel GPU-accelerated simulators, such as IsaacLab, which are optimized for on-policy RL algorithms and enable rapid training of complex robotic tasks. A key challenge lies in computing state-action log-likelihoods under diffusion policies, which is straightforward for Gaussian policies but intractable for flow-based models due to irreversible forward-reverse processes and discretization errors (e.g., Euler-Maruyama approximations). To bridge this gap, we propose GenPO, a generative policy optimization framework that leverages exact diffusion inversion to construct invertible action mappings. GenPO introduces a novel doubled dummy action mechanism that enables invertibility via alternating updates, resolving log-likelihood computation barriers. Furthermore, we also use the action log-likelihood for unbiased entropy and KL divergence estimation, enabling KL-adaptive learning rates and entropy regularization in on-policy updates. Extensive experiments on eight IsaacLab benchmarks, including legged locomotion (Ant, Humanoid, Anymal-D, Unitree H1, Go2), dexterous manipulation (Shadow Hand), aerial control (Quadcopter), and robotic arm tasks (Franka), demonstrate GenPO's superiority over existing RL baselines. Notably, GenPO is the first method to successfully integrate diffusion policies into on-policy RL, unlocking their potential for large-scale parallelized training and real-world robotic deployment.

LGOct 26, 2025
FlowCritic: Bridging Value Estimation with Flow Matching in Reinforcement Learning

Shan Zhong, Shutong Ding, He Diao et al.

Reliable value estimation serves as the cornerstone of reinforcement learning (RL) by evaluating long-term returns and guiding policy improvement, significantly influencing the convergence speed and final performance. Existing works improve the reliability of value function estimation via multi-critic ensembles and distributional RL, yet the former merely combines multi point estimation without capturing distributional information, whereas the latter relies on discretization or quantile regression, limiting the expressiveness of complex value distributions. Inspired by flow matching's success in generative modeling, we propose a generative paradigm for value estimation, named FlowCritic. Departing from conventional regression for deterministic value prediction, FlowCritic leverages flow matching to model value distributions and generate samples for value estimation.

MLAug 24, 2021
S&P 500 Stock Price Prediction Using Technical, Fundamental and Text Data

Shan Zhong, David B. Hitchcock

We summarized both common and novel predictive models used for stock price prediction and combined them with technical indices, fundamental characteristics and text-based sentiment data to predict S&P stock prices. A 66.18% accuracy in S&P 500 index directional prediction and 62.09% accuracy in individual stock directional prediction was achieved by combining different machine learning models such as Random Forest and LSTM together into state-of-the-art ensemble models. The data we use contains weekly historical prices, finance reports, and text information from news items associated with 518 different common stocks issued by current and former S&P 500 large-cap companies, from January 1, 2000 to December 31, 2019. Our study's innovation includes utilizing deep language models to categorize and infer financial news item sentiment; fusing different models containing different combinations of variables and stocks to jointly make predictions; and overcoming the insufficient data problem for machine learning models in time series by using data across different stocks.

LGNov 19, 2018
Practical Deep Reinforcement Learning Approach for Stock Trading

Xiao-Yang Liu, Zhuoran Xiong, Shan Zhong et al.

Stock trading strategy plays a crucial role in investment companies. However, it is challenging to obtain optimal strategy in the complex and dynamic stock market. We explore the potential of deep reinforcement learning to optimize stock trading strategy and thus maximize investment return. 30 stocks are selected as our trading stocks and their daily prices are used as the training and trading market environment. We train a deep reinforcement learning agent and obtain an adaptive trading strategy. The agent's performance is evaluated and compared with Dow Jones Industrial Average and the traditional min-variance portfolio allocation strategy. The proposed deep reinforcement learning approach is shown to outperform the two baselines in terms of both the Sharpe ratio and cumulative returns.