Yun Lu

AI
h-index14
9papers
52citations
Novelty58%
AI Score53

9 Papers

IRMar 3
Proactive Guiding Strategy for Item-side Fairness in Interactive Recommendation

Chongjun Xia, Xiaoyu Shi, Hong Xie et al.

Item-side fairness is crucial for ensuring the fair exposure of long-tail items in interactive recommender systems. Existing approaches promote the exposure of long-tail items by directly incorporating them into recommended results. This causes misalignment between user preferences and the recommended long-tail items, which hinders long-term user engagement and reduces the effectiveness of recommendations. We aim for a proactive fairness-guiding strategy, which actively guides user preferences toward long-tail items while preserving user satisfaction during the interactive recommendation process. To this end, we propose HRL4PFG, an interactive recommendation framework that leverages hierarchical reinforcement learning to guide user preferences toward long-tail items progressively. HRL4PFG operates through a macro-level process that generates fairness-guided targets based on multi-step feedback, and a micro-level process that fine-tunes recommendations in real time according to both these targets and evolving user preferences. Extensive experiments show that HRL4PFG improves cumulative interaction rewards and maximum user interaction length by a larger margin when compared with state-of-the-art methods in interactive recommendation environments.

LGMar 4
Fairness Begins with State: Purifying Latent Preferences for Hierarchical Reinforcement Learning in Interactive Recommendation

Yun Lu, Xiaoyu Shi, Hong Xie et al.

Interactive recommender systems (IRS) are increasingly optimized with Reinforcement Learning (RL) to capture the sequential nature of user-system dynamics. However, existing fairness-aware methods often suffer from a fundamental oversight: they assume the observed user state is a faithful representation of true preferences. In reality, implicit feedback is contaminated by popularity-driven noise and exposure bias, creating a distorted state that misleads the RL agent. We argue that the persistent conflict between accuracy and fairness is not merely a reward-shaping issue, but a state estimation failure. In this work, we propose \textbf{DSRM-HRL}, a framework that reformulates fairness-aware recommendation as a latent state purification problem followed by decoupled hierarchical decision-making. We introduce a Denoising State Representation Module (DSRM) based on diffusion models to recover the low-entropy latent preference manifold from high-entropy, noisy interaction histories. Built upon this purified state, a Hierarchical Reinforcement Learning (HRL) agent is employed to decouple conflicting objectives: a high-level policy regulates long-term fairness trajectories, while a low-level policy optimizes short-term engagement under these dynamic constraints. Extensive experiments on high-fidelity simulators (KuaiRec, KuaiRand) demonstrate that DSRM-HRL effectively breaks the "rich-get-richer" feedback loop, achieving a superior Pareto frontier between recommendation utility and exposure equity.

ROMar 14
SmoothVLA: Aligning Vision-Language-Action Models with Physical Constraints via Intrinsic Smoothness Optimization

Jiashun Li, Xiaoyu Shi, Hong Xie et al.

Vision-Language-Action (VLA) models have emerged as a powerful paradigm for robotic manipulation. However, existing post-training methods face a dilemma between stability and exploration: Supervised Fine-Tuning (SFT) is constrained by demonstration quality and lacks generalization, whereas Reinforcement Learning (RL) improves exploration but often induces erratic, jittery trajectories that violate physical constraints. To bridge this gap, we propose SmoothVLA, a novel reinforcement learning fine-tuning framework that synergistically optimizes task performance and motion smoothness. The technical core is a physics-informed hybrid reward function that integrates binary sparse task rewards with a continuous dense term derived from trajectory jerk. Crucially, this reward is intrinsic, that computing directly from policy rollouts, without requiring extrinsic environment feedback or laborious reward engineering. Leveraging the Group Relative Policy Optimization (GRPO), SmoothVLA establishes trajectory smoothness as an explicit optimization prior, guiding the model toward physically feasible and stable control. Extensive experiments on the LIBERO benchmark demonstrate that SmoothVLA outperforms standard RL by 13.8\% in smoothness and significantly surpasses SFT in generalization across diverse tasks. Our work offers a scalable approach to aligning VLA models with physical-world constraints through intrinsic reward optimization.

AINov 20, 2025
Revisiting Fairness-aware Interactive Recommendation: Item Lifecycle as a Control Knob

Yun Lu, Xiaoyu Shi, Hong Xie et al.

This paper revisits fairness-aware interactive recommendation (e.g., TikTok, KuaiShou) by introducing a novel control knob, i.e., the lifecycle of items. We make threefold contributions. First, we conduct a comprehensive empirical analysis and uncover that item lifecycles in short-video platforms follow a compressed three-phase pattern, i.e., rapid growth, transient stability, and sharp decay, which significantly deviates from the classical four-stage model (introduction, growth, maturity, decline). Second, we introduce LHRL, a lifecycle-aware hierarchical reinforcement learning framework that dynamically harmonizes fairness and accuracy by leveraging phase-specific exposure dynamics. LHRL consists of two key components: (1) PhaseFormer, a lightweight encoder combining STL decomposition and attention mechanisms for robust phase detection; (2) a two-level HRL agent, where the high-level policy imposes phase-aware fairness constraints, and the low-level policy optimizes immediate user engagement. This decoupled optimization allows for effective reconciliation between long-term equity and short-term utility. Third, experiments on multiple real-world interactive recommendation datasets demonstrate that LHRL significantly improves both fairness and user engagement. Furthermore, the integration of lifecycle-aware rewards into existing RL-based models consistently yields performance gains, highlighting the generalizability and practical value of our approach.

AISep 15, 2025
Empowering Clinical Trial Design through AI: A Randomized Evaluation of PowerGPT

Yiwen Lu, Lu Li, Dazheng Zhang et al.

Sample size calculations for power analysis are critical for clinical research and trial design, yet their complexity and reliance on statistical expertise create barriers for many researchers. We introduce PowerGPT, an AI-powered system integrating large language models (LLMs) with statistical engines to automate test selection and sample size estimation in trial design. In a randomized trial to evaluate its effectiveness, PowerGPT significantly improved task completion rates (99.3% vs. 88.9% for test selection, 99.3% vs. 77.8% for sample size calculation) and accuracy (94.1% vs. 55.4% in sample size estimation, p < 0.001), while reducing average completion time (4.0 vs. 9.3 minutes, p < 0.001). These gains were consistent across various statistical tests and benefited both statisticians and non-statisticians as well as bridging expertise gaps. Already under deployment across multiple institutions, PowerGPT represents a scalable AI-driven approach that enhances accessibility, efficiency, and accuracy in statistical power analysis for clinical research.

DLAug 21, 2025
Is Artificial Intelligence Reshaping the Landscape of the International Academic Community of Geosciences?

Liang Li, Yuntian Li, Wenxin Zhao et al.

Through bibliometric analysis and topic modeling, we find that artificial intelligence (AI) is positively transforming geosciences research, with a notable increase in AI-related scientific output in recent years. We are encouraged to observe that earth scientists from developing countries have gained better visibility in the recent AI for Science (AI4S) paradigm and that AI is also improving the landscape of international collaboration in geoscience-related research.

CRSep 3, 2023
The Normal Distributions Indistinguishability Spectrum and its Application to Privacy-Preserving Machine Learning

Yun Lu, Malik Magdon-Ismail, Yu Wei et al.

Differential Privacy (DP) (and its variants) is the most common method for machine learning (ML) on privacy-sensitive data. In big data analytics, one often uses randomized sketching/aggregation algorithms to make processing high-dimensional data tractable. Intuitively, such ML algorithms should provide some inherent privacy, yet most existing DP mechanisms do not leverage or under-utilize this inherent randomness, resulting in potentially redundant noising. The motivating question of our work is: (How) can we improve the utility of DP mechanisms for randomized ML queries, by leveraging the randomness of the query itself? Towards a (positive) answer, our key contribution is (proving) what we call the NDIS theorem, a theoretical result with several practical implications. In a nutshell, NDIS is a closed-form analytic computation for the (varepsilon,delta)-indistinguishability-spectrum (IS) of two arbitrary normal distributions N1 and N2, i.e., the optimal delta (for any given varepsilon) such that N1 and N2 are (varepsilon,delta)-close according to the DP distance. The importance of the NDIS theorem lies in that (1) it yields efficient estimators for IS, and (2) it allows us to analyze DP-mechanism with normally-distributed outputs, as well as more general mechanisms by leveraging their behavior on large inputs. We apply the NDIS theorem to derive DP mechanisms for queries with normally-distributed outputs--i.e., Gaussian Random Projections (RP)--and for more general queries--i.e., Ordinary Least Squares (OLS). Compared to existing techniques, our new DP mechanisms achieve superior privacy/utility trade-offs by leveraging the randomness of the underlying algorithms. We then apply the NDIS theorem to a data-driven DP notion--in particular relative DP introduced by Lu et al. [S&P 2024]. Our method identifies the range of (varepsilon,delta) for which no additional noising is needed.

CVFeb 11, 2020
A Novel and Efficient Tumor Detection Framework for Pancreatic Cancer via CT Images

Zhengdong Zhang, Shuai Li, Ziyang Wang et al.

As Deep Convolutional Neural Networks (DCNNs) have shown robust performance and results in medical image analysis, a number of deep-learning-based tumor detection methods were developed in recent years. Nowadays, the automatic detection of pancreatic tumors using contrast-enhanced Computed Tomography (CT) is widely applied for the diagnosis and staging of pancreatic cancer. Traditional hand-crafted methods only extract low-level features. Normal convolutional neural networks, however, fail to make full use of effective context information, which causes inferior detection results. In this paper, a novel and efficient pancreatic tumor detection framework aiming at fully exploiting the context information at multiple scales is designed. More specifically, the contribution of the proposed method mainly consists of three components: Augmented Feature Pyramid networks, Self-adaptive Feature Fusion and a Dependencies Computation (DC) Module. A bottom-up path augmentation to fully extract and propagate low-level accurate localization information is established firstly. Then, the Self-adaptive Feature Fusion can encode much richer context information at multiple scales based on the proposed regions. Finally, the DC Module is specifically designed to capture the interaction information between proposals and surrounding tissues. Experimental results achieve competitive performance in detection with the AUC of 0.9455, which outperforms other state-of-the-art methods to our best of knowledge, demonstrating the proposed framework can detect the tumor of pancreatic cancer efficiently and accurately.

CRMay 15, 2018
How Private Are Commonly-Used Voting Rules?

Ao Liu, Yun Lu, Lirong Xia et al.

Differential privacy has been widely applied to provide privacy guarantees by adding random noise to the function output. However, it inevitably fails in many high-stakes voting scenarios, where voting rules are required to be deterministic. In this work, we present the first framework for answering the question: "How private are commonly-used voting rules?" Our answers are two-fold. First, we show that deterministic voting rules provide sufficient privacy in the sense of distributional differential privacy (DDP). We show that assuming the adversarial observer has uncertainty about individual votes, even publishing the histogram of votes achieves good DDP. Second, we introduce the notion of exact privacy to compare the privacy preserved in various commonly-studied voting rules, and obtain dichotomy theorems of exact DDP within a large subset of voting rules called generalized scoring rules.