Zhi Luo

LG
h-index27
5papers
7citations
Novelty54%
AI Score38

5 Papers

LGNov 20, 2024
Provably Efficient Action-Manipulation Attack Against Continuous Reinforcement Learning

Zhi Luo, Xiyuan Yang, Pan Zhou et al.

Manipulating the interaction trajectories between the intelligent agent and the environment can control the agent's training and behavior, exposing the potential vulnerabilities of reinforcement learning (RL). For example, in Cyber-Physical Systems (CPS) controlled by RL, the attacker can manipulate the actions of the adopted RL to other actions during the training phase, which will lead to bad consequences. Existing work has studied action-manipulation attacks in tabular settings, where the states and actions are discrete. As seen in many up-and-coming RL applications, such as autonomous driving, continuous action space is widely accepted, however, its action-manipulation attacks have not been thoroughly investigated yet. In this paper, we consider this crucial problem in both white-box and black-box scenarios. Specifically, utilizing the knowledge derived exclusively from trajectories, we propose a black-box attack algorithm named LCBT, which uses the Monte Carlo tree search method for efficient action searching and manipulation. Additionally, we demonstrate that for an agent whose dynamic regret is sub-linearly related to the total number of steps, LCBT can teach the agent to converge to target policies with only sublinear attack cost, i.e., $O\left(\mathcal{R}(T) + MH^3K^E\log (MT)\right)(0<E<1)$, where $H$ is the number of steps per episode, $K$ is the total number of episodes, $T=KH$ is the total number of steps, $M$ is the number of subspaces divided in the state space, and $\mathcal{R}(T)$ is the bound of the RL algorithm's regret. We conduct our proposed attack methods on three aggressive algorithms: DDPG, PPO, and TD3 in continuous settings, which show a promising attack performance.

CVNov 20, 2025
An Image Is Worth Ten Thousand Words: Verbose-Text Induction Attacks on VLMs

Zhi Luo, Zenghui Yuan, Wenqi Wei et al.

With the remarkable success of Vision-Language Models (VLMs) on multimodal tasks, concerns regarding their deployment efficiency have become increasingly prominent. In particular, the number of tokens consumed during the generation process has emerged as a key evaluation metric.Prior studies have shown that specific inputs can induce VLMs to generate lengthy outputs with low information density, which significantly increases energy consumption, latency, and token costs. However, existing methods simply delay the occurrence of the EOS token to implicitly prolong output, and fail to directly maximize the output token length as an explicit optimization objective, lacking stability and controllability.To address these limitations, this paper proposes a novel verbose-text induction attack (VTIA) to inject imperceptible adversarial perturbations into benign images via a two-stage framework, which identifies the most malicious prompt embeddings for optimizing and maximizing the output token of the perturbed images.Specifically, we first perform adversarial prompt search, employing reinforcement learning strategies to automatically identify adversarial prompts capable of inducing the LLM component within VLMs to produce verbose outputs. We then conduct vision-aligned perturbation optimization to craft adversarial examples on input images, maximizing the similarity between the perturbed image's visual embeddings and those of the adversarial prompt, thereby constructing malicious images that trigger verbose text generation. Comprehensive experiments on four popular VLMs demonstrate that our method achieves significant advantages in terms of effectiveness, efficiency, and generalization capability.

LGSep 7, 2025
PolicyEvolve: Evolving Programmatic Policies by LLMs for multi-player games via Population-Based Training

Mingrui Lv, Hangzhi Liu, Zhi Luo et al.

Multi-agent reinforcement learning (MARL) has achieved significant progress in solving complex multi-player games through self-play. However, training effective adversarial policies requires millions of experience samples and substantial computational resources. Moreover, these policies lack interpretability, hindering their practical deployment. Recently, researchers have successfully leveraged Large Language Models (LLMs) to generate programmatic policies for single-agent tasks, transforming neural network-based policies into interpretable rule-based code with high execution efficiency. Inspired by this, we propose PolicyEvolve, a general framework for generating programmatic policies in multi-player games. PolicyEvolve significantly reduces reliance on manually crafted policy code, achieving high-performance policies with minimal environmental interactions. The framework comprises four modules: Global Pool, Local Pool, Policy Planner, and Trajectory Critic. The Global Pool preserves elite policies accumulated during iterative training. The Local Pool stores temporary policies for the current iteration; only sufficiently high-performing policies from this pool are promoted to the Global Pool. The Policy Planner serves as the core policy generation module. It samples the top three policies from the Global Pool, generates an initial policy for the current iteration based on environmental information, and refines this policy using feedback from the Trajectory Critic. Refined policies are then deposited into the Local Pool. This iterative process continues until the policy achieves a sufficiently high average win rate against the Global Pool, at which point it is integrated into the Global Pool. The Trajectory Critic analyzes interaction data from the current policy, identifies vulnerabilities, and proposes directional improvements to guide the Policy Planner

SPMar 16, 2020
Inverse design of multilayer nanoparticles using artificial neural networks and genetic algorithm

Cankun Qiu, Zhi Luo, Xia Wu et al.

The light scattering of multilayer nanoparticles can be solved by Maxwell equations. However, it is difficult to solve the inverse design of multilayer nanoparticles by using the traditional trial-and-error method. Here, we present a method for forward simulation and inverse design of multilayer nanoparticles. We combine the global search ability of genetic algorithm with the local search ability of neural network. First, the genetic algorithm is used to find a suitable solution, and then the neural network is used to fine-tune it. Due to the non-unique relationship between physical structures and optical responses, we first train a forward neural network, and then it is applied to the inverse design of multilayer nanoparticles. Not only here, this method can easily be extended to predict and find the best design parameters for other optical structures.

IVMar 16, 2020
u-net CNN based fourier ptychography

Yican Chen, Zhi Luo, Xia Wu et al.

Fourier ptychography is a recently explored imaging method for overcoming the diffraction limit of conventional cameras with applications in microscopy and yielding high-resolution images. In order to splice together low-resolution images taken under different illumination angles of coherent light source, an iterative phase retrieval algorithm is adopted. However, the reconstruction procedure is slow and needs a good many of overlap in the Fourier domain for the continuous recorded low-resolution images and is also worse under system aberrations such as noise or random update sequence. In this paper, we propose a new retrieval algorithm that is based on convolutional neural networks. Once well trained, our model can perform high-quality reconstruction rapidly by using the graphics processing unit. The experiments demonstrate that our model achieves better reconstruction results and is more robust under system aberrations.