Wenqi Cai

LG
h-index 9
7 papers
42 citations
Novelty 43%
AI Score 49

7 Papers

LG Mar 25, 2022
Quasi-Newton Iteration in Deterministic Policy Gradient

Arash Bahari Kordabad, Hossein Nejatbakhsh Esfahani, Wenqi Cai et al.

This paper presents a model-free approximation for the Hessian of the performance of deterministic policies to use in the context of Reinforcement Learning based on Quasi-Newton steps in the policy parameters. We show that the approximate Hessian converges to the exact Hessian at the optimal policy, and allows for a superlinear convergence in the learning, provided that the policy parametrization is rich. The natural policy gradient method can be interpreted as a particular case of the proposed method. We analytically verify the formulation in a simple linear case and compare the convergence of the proposed method with the natural policy gradient in a nonlinear example.

RO Apr 13
Safe Human-to-Humanoid Motion Imitation Using Control Barrier Functions

Wenqi Cai, John Abanes, Nikolaos Evangeliou et al.

Ensuring operational safety is critical for human-to-humanoid motion imitation. This paper presents a vision-based framework that enables a humanoid robot to imitate human movements while avoiding collisions. Human skeletal keypoints are captured by a single camera and converted into joint angles for motion retargeting. Safety is enforced through a Control Barrier Function (CBF) layer formulated as a Quadratic Program (QP), which filters imitation commands to prevent both self-collisions and human-robot collisions. Simulation results validate the effectiveness of the proposed framework for real-time collision-aware motion imitation.
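The CBF-QP filtering step can be illustrated with a minimal sketch. It is not the paper's formulation: it assumes a hypothetical 2D point with single-integrator dynamics (x_dot = u), one circular obstacle, and the barrier h(x) = ||x - x_obs||^2 - r^2. With a single affine constraint, the QP "stay as close as possible to the nominal command subject to grad_h . u + alpha * h >= 0" has a closed-form solution, so no QP solver is needed here. The names `cbf_filter`, `x_obs`, `radius`, and `alpha` are illustrative.

```python
import numpy as np

def cbf_filter(u_nom, x, x_obs, radius, alpha=1.0):
    """Minimally modify u_nom so that grad_h(x) . u + alpha * h(x) >= 0.

    Assumes x_dot = u and x != x_obs (otherwise the gradient of h is zero).
    """
    diff = x - x_obs
    h = diff @ diff - radius**2   # h >= 0 means the point is outside the obstacle
    a = 2.0 * diff                # gradient of h with respect to x
    b = -alpha * h                # constraint reads a . u >= b
    slack = a @ u_nom - b
    if slack >= 0.0:
        return u_nom.copy()       # nominal command already satisfies the constraint
    # Otherwise project u_nom onto the boundary of the half-space a . u >= b.
    return u_nom - slack / (a @ a) * a

# A command that drives the point straight at the obstacle gets slowed down
# along the approach direction, while its tangential component is untouched.
safe_u = cbf_filter(np.array([-2.0, 0.5]), np.array([2.0, 0.0]),
                    np.array([0.0, 0.0]), radius=1.0)
```

A humanoid with several self-collision and human-robot constraints would need a general QP solver, since the closed-form projection only holds for one constraint.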

RO Mar 30
Cost-Matching Model Predictive Control for Efficient Reinforcement Learning in Humanoid Locomotion

Wenqi Cai, Kyriakos G. Vamvoudakis, Sébastien Gros et al.

In this paper, we propose a cost-matching approach for optimal humanoid locomotion within a Model Predictive Control (MPC)-based Reinforcement Learning (RL) framework. A parameterized MPC formulation with centroidal dynamics is trained to approximate the action-value function obtained from high-fidelity closed-loop data. Specifically, the MPC cost-to-go is evaluated along recorded state-action trajectories, and the parameters are updated to minimize the discrepancy between MPC-predicted values and measured returns. This formulation enables efficient gradient-based learning while avoiding the computational burden of repeatedly solving the MPC problem during training. The proposed method is validated in simulation using a commercial humanoid platform. Results demonstrate improved locomotion performance and robustness to model mismatch and external disturbances compared with manually tuned baselines.
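The cost-matching idea of fitting predicted values to measured returns can be sketched as follows. This is a simplified stand-in, not the paper's method: the MPC cost-to-go is replaced by a hypothetical value model that is linear in its parameters (`features @ theta`), and the returns are plain discounted sums of recorded stage costs. The function names, the discount factor, and the step size are assumptions made for illustration.

```python
import numpy as np

def discounted_returns(costs, gamma):
    """Measured return at each step of one recorded trajectory."""
    returns = np.zeros(len(costs))
    running = 0.0
    for t in reversed(range(len(costs))):
        running = costs[t] + gamma * running
        returns[t] = running
    return returns

def fit_cost_parameters(features, returns, steps=500, lr=0.1):
    """Gradient descent on the mean squared gap between predicted and measured returns.

    `features` has one row per recorded state-action pair; the prediction
    features @ theta stands in for the parameterized MPC cost-to-go.
    """
    theta = np.zeros(features.shape[1])
    for _ in range(steps):
        residual = features @ theta - returns
        theta -= lr * features.T @ residual / len(returns)
    return theta
```

Because the predictions are evaluated only on recorded data, no closed-loop rollout is needed inside the fitting loop, which is the property the abstract highlights.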

CV Mar 8
EVLF: Early Vision-Language Fusion for Generative Dataset Distillation

Wenqi Cai, Yawen Zou, Guang Li et al.

Dataset distillation (DD) aims to synthesize compact training sets that enable models to achieve high accuracy with significantly fewer samples. Recent diffusion-based DD methods commonly introduce semantic guidance through late-stage cross-attention, where textual prompts tend to dominate the generative process. Although this strategy enforces label relevance, it diminishes the contribution of visual latents, resulting in over-corrected samples that mirror prompt patterns rather than reflecting intrinsic visual features. To solve this problem, we introduce an Early Vision-Language Fusion (EVLF) method that aligns textual and visual embeddings at the transition between the encoder and the generative backbone. By incorporating a lightweight cross-attention module at this transition, the early representations simultaneously encode local textures and global semantic directions across the denoising process. Importantly, EVLF is plug-and-play and can be easily integrated into any diffusion-based dataset distillation pipeline with an encoder. It works across different denoiser architectures and sampling schedules without any task-specific modifications. Extensive experiments demonstrate that EVLF generates semantically faithful and visually coherent synthetic data, yielding consistent improvements in downstream classification accuracy across varied settings. Source code is available at https://github.com/wenqi-cai297/earlyfusion-for-dd/.

CE May 7
Arbitrage and the Stability of AMM Price Tracking

Peihao Li, Nadia Dahmani, Wenqi Cai

Automated market makers (AMMs) quote prices from pool state rather than from a limit order book. AMM pools often stay close to a reference price because arbitrageurs correct profitable mispricing. A large part of decentralized finance therefore relies on a simple economic premise: once the AMM price drifts away from the reference price, arbitrage incentives push it back. This paper studies when that premise is strong enough to guarantee block-scale stability. We model the gap between the reference price and the AMM price as a stochastic tracking error, treat arbitrage as the corrective input, and place blockchain execution inside the loop through fees, discrete blocks, transaction ordering, delays, and transaction failure. The detailed execution layer is reduced to the total successful correction confirmed in each block. Under a block-level correction condition, we prove geometric ergodicity of the tracking error and obtain explicit one-step bounds that connect tracking quality to liquidity and execution quality. We also show in a constant-product example how fees, fixed execution costs, and local liquidity map into the no-trade band and the optimal corrective trade. Finally, we build empirical proxies for the theorem quantities from realized block data and use them to organize reduced and mechanism-focused simulations whose comparative statics are consistent with the theory. The contribution is to turn a basic economic intuition behind decentralized finance into a quantitative stability statement together with a tractable calibration interface.
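How fees and fixed costs shape the no-trade band and the corrective trade can be illustrated with a small sketch. It uses the textbook constant-product pool (reserves `x` of the asset and `y` of the numeraire, pool price `y / x`) with a proportional fee charged on the input amount; this is an assumed setup and may differ from the paper's example. Writing gamma = 1 - fee, an infinitesimal trade is unprofitable whenever the reference price lies in [gamma * y/x, (y/x) / gamma], and outside that band the profit-maximizing input follows from the first-order condition. All function and parameter names are hypothetical.

```python
import math

def no_trade_band(x, y, fee):
    """Reference prices for which no marginal arbitrage trade is profitable."""
    gamma = 1.0 - fee
    pool_price = y / x
    return gamma * pool_price, pool_price / gamma

def optimal_arbitrage(x, y, ref_price, fee, fixed_cost=0.0):
    """Return (direction, input_amount, profit in numeraire units)."""
    gamma = 1.0 - fee
    lower, upper = no_trade_band(x, y, fee)
    if lower <= ref_price <= upper:
        return "none", 0.0, 0.0
    if ref_price > upper:
        # Pool underprices the asset: pay numeraire in, take asset out.
        amount = (math.sqrt(gamma * ref_price * x * y) - y) / gamma
        asset_out = x * gamma * amount / (y + gamma * amount)
        profit = ref_price * asset_out - amount - fixed_cost
        direction = "buy_x"
    else:
        # Pool overprices the asset: pay asset in, take numeraire out.
        amount = (math.sqrt(gamma * x * y / ref_price) - x) / gamma
        numeraire_out = y * gamma * amount / (x + gamma * amount)
        profit = numeraire_out - ref_price * amount - fixed_cost
        direction = "sell_x"
    if profit <= 0.0:
        return "none", 0.0, 0.0   # the fixed execution cost widens the effective band
    return direction, amount, profit
```

In this sketch, deeper liquidity (larger `x * y`) scales the optimal trade size, a higher fee widens the band, and a fixed cost suppresses corrections whose gross profit is too small.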

CY Mar 25, 2024
Investigation of the effectiveness of applying ChatGPT in Dialogic Teaching Using Electroencephalography

Jiayue Zhang, Yiheng Liu, Wenqi Cai et al.

In recent years, the rapid development of artificial intelligence technology, especially the emergence of large language models (LLMs) such as ChatGPT, has presented significant prospects for application in the field of education. LLMs possess the capability to interpret knowledge, answer questions, and consider context, thus providing support for dialogic teaching to students. Therefore, an examination of the capacity of LLMs to effectively fulfill instructional roles, thereby facilitating student learning akin to human educators within dialogic teaching scenarios, is an exceptionally valuable research topic. This research recruited 34 undergraduate students as participants, who were randomly divided into two groups. The experimental group engaged in dialogic teaching using ChatGPT, while the control group interacted with human teachers. Both groups learned the histogram equalization unit in the information-related course "Digital Image Processing". The research findings show comparable scores between the two groups on the retention test. However, students who engaged in dialogue with ChatGPT exhibited lower performance on the transfer test. Electroencephalography data revealed that students who interacted with ChatGPT exhibited higher levels of cognitive activity, suggesting that ChatGPT could help students establish a knowledge foundation and stimulate cognitive activity. However, its strengths in promoting students' knowledge application and creativity were insignificant. Based upon the research findings, it is evident that ChatGPT cannot fully excel in fulfilling teaching tasks in dialogic teaching in information-related courses. Combining ChatGPT with traditional human teachers might be a more ideal approach. The synergistic use of both can provide students with more comprehensive learning support, thus contributing to enhancing the quality of teaching.

LG Apr 6, 2021
MPC-based Reinforcement Learning for Economic Problems with Application to Battery Storage

Arash Bahari Kordabad, Wenqi Cai, Sebastien Gros

In this paper, we are interested in optimal control problems with purely economic costs, which often yield optimal policies having a (nearly) bang-bang structure. We focus on policy approximations based on Model Predictive Control (MPC) and the use of the deterministic policy gradient method to optimize the MPC closed-loop performance in the presence of unmodelled stochasticity or model error. When the policy has a (nearly) bang-bang structure, we observe that the policy gradient method can struggle to produce meaningful steps in the policy parameters. To tackle this issue, we propose a homotopy strategy based on the interior-point method, providing a relaxation of the policy during the learning. We investigate a specific well-known battery storage problem, and show that the proposed method delivers more homogeneous and faster learning than a classical policy gradient approach.
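Why a bang-bang policy stalls gradient-based learning, and how an interior-point relaxation helps, can be seen in a scalar toy problem. This is an illustrative assumption, not the paper's MPC scheme: the policy solves min over u in [-1, 1] of c * u, whose solution u = -sign(c) has zero sensitivity to the parameter c almost everywhere. Replacing the bounds with the log barrier -tau * (log(1 - u) + log(1 + u)) gives the closed-form minimizer u = -c / (tau + sqrt(tau^2 + c^2)), which is smooth in c and recovers the bang-bang solution as tau goes to zero. The function names and the barrier schedule are hypothetical.

```python
import math

def relaxed_policy(c, tau):
    """Minimizer of c*u - tau*(log(1 - u) + log(1 + u)) over -1 < u < 1, for tau > 0."""
    return -c / (tau + math.sqrt(tau * tau + c * c))

def relaxed_policy_sensitivity(c, tau):
    """Derivative of relaxed_policy with respect to c."""
    s = math.sqrt(tau * tau + c * c)
    return -tau / (s * (tau + s))

# Homotopy: start with a large barrier weight (smooth policy, informative
# sensitivities) and shrink it as learning proceeds.
for tau in (1.0, 0.1, 0.01, 0.001):
    u = relaxed_policy(0.5, tau)
    du_dc = relaxed_policy_sensitivity(0.5, tau)
    print(f"tau={tau:<6} u={u:+.4f} du/dc={du_dc:+.4f}")
```

For small tau the sensitivity away from c = 0 shrinks toward zero, which mirrors the difficulty the abstract describes; a larger tau keeps the policy gradient informative early in learning.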