Ziqian Liu

CL
h-index1
9papers
48citations
Novelty54%
AI Score57

9 Papers

CVApr 6Code
MIRAGE: Benchmarking and Aligning Multi-Instance Image Editing

Ziqian Liu, Stephan Alaniz

Instruction-guided image editing has seen remarkable progress with models like FLUX.2 and Qwen-Image-Edit, yet they still struggle with complex scenarios with multiple similar instances each requiring individual edits. We observe that state-of-the-art models suffer from severe over-editing and spatial misalignment when faced with multiple identical instances and composite instructions. To this end, we introduce a comprehensive benchmark specifically designed to evaluate fine-grained consistency in multi-instance and multi-instruction settings. To address the failures of existing methods observed in our benchmark, we propose Multi-Instance Regional Alignment via Guided Editing (MIRAGE), a training-free framework that enables precise, localized editing. By leveraging a vision-language model to parse complex instructions into regional subsets, MIRAGE employs a multi-branch parallel denoising strategy. This approach injects latent representations of target regions into the global representation space while maintaining background integrity through a reference trajectory. Extensive evaluations on MIRA-Bench and RefEdit-Bench demonstrate that our framework significantly outperforms existing methods in achieving precise instance-level modifications while preserving background consistency. Our benchmark and code are available at https://github.com/ZiqianLiu666/MIRAGE.

SYMay 20
DAE-Embedded Neural Control Verification for Shipboard Microgrids under Transient Shocks

Fei Feng, Lizhi Wang, Ziqian Liu

Neural control offers strong potential for handling highly nonlinear dynamics in shipboard microgrids (SMGs), yet its black-box nature can trigger abrupt control spikes and actuator saturation during initial transient shocks. This letter devises a formal verification method for SMG neural controller to assess its shock responses. Our contributions include: 1) a set-based SMG differential-algebraic equation(DAE) model compatible with set propagation; 2) a DAE-embedded bound propagation approach to compute tight envelopes of all possible neural control output. Extensive case studies demonstrate the effectiveness of the devised method in formally certifying SMG control performance under uncertain disturbances.

CLAug 7, 2025Code
TASE: Token Awareness and Structured Evaluation for Multilingual Language Models

Chenzhuo Zhao, Xinda Wang, Yue Huang et al.

While large language models (LLMs) have demonstrated remarkable performance on high-level semantic tasks, they often struggle with fine-grained, token-level understanding and structural reasoning--capabilities that are essential for applications requiring precision and control. We introduce TASE, a comprehensive benchmark designed to evaluate LLMs' ability to perceive and reason about token-level information across languages. TASE covers 10 tasks under two core categories: token awareness and structural understanding, spanning Chinese, English, and Korean, with a 35,927-instance evaluation set and a scalable synthetic data generation pipeline for training. Tasks include character counting, token alignment, syntactic structure parsing, and length constraint satisfaction. We evaluate over 30 leading commercial and open-source LLMs, including O3, Claude 4, Gemini 2.5 Pro, and DeepSeek-R1, and train a custom Qwen2.5-14B model using the GRPO training method. Results show that human performance significantly outpaces current LLMs, revealing persistent weaknesses in token-level reasoning. TASE sheds light on these limitations and provides a new diagnostic lens for future improvements in low-level language understanding and cross-lingual generalization. Our code and dataset are publicly available at https://github.com/cyzcz/Tase .

ARMar 10
Pooling Engram Conditional Memory in Large Language Models using CXL

Ruiyang Ma, Teng Ma, Zhiyuan Su et al.

Engram conditional memory has emerged as a promising component for LLMs by decoupling static knowledge lookup from dynamic computation. Since Engram exhibits sparse access patterns and supports prefetching, its massive embedding tables are well-suited for offloading to lower-tier memory. In this paper, we propose using Compute Express Link (CXL) memory pool for Engram storage. Compared to RDMA, CXL provides fine-grained and low-latency access required by minimal and discrete retrieval patterns of Engram. We integrate the CXL-based Engram pool into SGLang, achieving near-DRAM end-to-end performance. This provides a scalable and cost-efficient storage solution for future Engram-integrated LLMs without compromising inference performance.

LGAug 7, 2024
PackMamba: Efficient Processing of Variable-Length Sequences in Mamba training

Haoran Xu, Ziqian Liu, Rong Fu et al.

With the evolution of large language models, traditional Transformer models become computationally demanding for lengthy sequences due to the quadratic growth in computation with respect to the sequence length. Mamba, emerging as a groundbreaking architecture in the field of generative AI, demonstrates remarkable proficiency in handling elongated sequences with reduced computational and memory complexity. Nevertheless, the existing training framework of Mamba presents inefficiency with variable-length sequence inputs. Either single-sequence training results in low GPU utilization, or batched processing of variable-length sequences to a maximum length incurs considerable memory and computational overhead. To address this problem, we analyze the performance of bottleneck operators in Mamba under diverse tensor shapes and proposed PackMamba, a high-throughput Mamba that efficiently handles variable-length sequences. Diving deep into state-space models (SSMs), we modify the parallel operators to avoid passing information between individual sequences while maintaining high performance. Experimental results on an NVIDIA A100 GPU demonstrate throughput exceeding the baseline single-sequence processing scheme: 3.06x speedup on the 1.4B model and 2.62x on the 2.8B model.

NIMay 4
IteRate: Autonomous AI Synthesis of In-Kernel eBPF Wi-Fi Rate Control Algorithms

James Lynch, Ziqian Liu, Snehadeep Gayen et al.

Wi-Fi rate adaptation remains a persistent challenge in wireless networking. Deployed algorithms like Minstrel-HT have remained largely stagnant for over a decade, relying on hand-tuned heuristics that fail to generalize to the complexity of modern wireless environments. We present \name, an autonomous research system that closes the loop on rate control development. IteRate uses a multi-agent AI architecture to conduct the full scientific cycle: formulating hypotheses, writing eBPF programs that run inside the Linux kernel, deploying them over-the-air to Wi-Fi devices, collecting fine-grained telemetry for analysis, and iterating based on experimental evidence, all without human intervention. IteRate makes three contributions. (1) a novel kernel module that exposes per-frame hardware telemetry including modulation and coding schemes (MCS) and retry counts to eBPF programs, (2) a structured agentic AI architecture employing specialized agents for algorithm design, experiment execution, and data analysis, coordinated via a hypothesis-driven research protocol with persistent knowledge, and (3) a closed-loop pipeline that automates the cross-compilation, deployment, and evaluation of in-kernel logic onto embedded Wi-Fi targets. On a 58-node testbed running five workloads. relative to the well-known Minstrel algorithm, IteRate achieves 21% faster web-page loads, 7% higher video quality of experience (QoE), and 21% higher peak throughput. Our work demonstrates that AI agents, when equipped with appropriate kernel-level hooks and a disciplined scientific workflow, can effectively automate the research required to design Wi-Fi rate controllers.

CLMay 22, 2025
PMPO: Probabilistic Metric Prompt Optimization for Small and Large Language Models

Chenzhuo Zhao, Ziqian Liu, Xinda Wang et al.

Prompt optimization is a practical and widely applicable alternative to fine tuning for improving large language model performance. Yet many existing methods evaluate candidate prompts by sampling full outputs, often coupled with self critique or human annotated preferences, which limits scalability, especially for smaller models or models that are not instruction tuned. We present PMPO (Probabilistic Metric Prompt Optimization), a unified framework that uses token level cross entropy as a direct, lightweight evaluation signal. PMPO locates low quality prompt segments via a masking based analysis and iteratively rewrites them to propose improved variants. Crucially, during evaluation, PMPO selects among variants by minimizing loss in a single forward pass, eliminating output sampling and human or judge based scoring for selection while still using standard generation only to propose rewrites. This unified, loss based strategy supports both supervised and preference based tasks. Across model sizes and datasets, PMPO outperforms prior prompt optimizers: it achieves the highest average accuracy on BBH, performs strongly on GSM8K and AQUA RAT, and raises AlpacaEval 2.0 win rates by over 19 points. These results demonstrate PMPO's effectiveness, efficiency, and broad applicability.

CVDec 14, 2025
Learning Common and Salient Generative Factors Between Two Image Datasets

Yunlong He, Gwilherm Lesné, Ziqian Liu et al.

Recent advancements in image synthesis have enabled high-quality image generation and manipulation. Most works focus on: 1) conditional manipulation, where an image is modified conditioned on a given attribute, or 2) disentangled representation learning, where each latent direction should represent a distinct semantic attribute. In this paper, we focus on a different and less studied research problem, called Contrastive Analysis (CA). Given two image datasets, we want to separate the common generative factors, shared across the two datasets, from the salient ones, specific to only one dataset. Compared to existing methods, which use attributes as supervised signals for editing (e.g., glasses, gender), the proposed method is weaker, since it only uses the dataset signal. We propose a novel framework for CA, that can be adapted to both GAN and Diffusion models, to learn both common and salient factors. By defining new and well-adapted learning strategies and losses, we ensure a relevant separation between common and salient factors, preserving a high-quality generation. We evaluate our approach on diverse datasets, covering human faces, animal images and medical scans. Our framework demonstrates superior separation ability and image quality synthesis compared to prior methods.

SEJun 3, 2024
VerilogReader: LLM-Aided Hardware Test Generation

Ruiyang Ma, Yuxin Yang, Ziqian Liu et al.

Test generation has been a critical and labor-intensive process in hardware design verification. Recently, the emergence of Large Language Model (LLM) with their advanced understanding and inference capabilities, has introduced a novel approach. In this work, we investigate the integration of LLM into the Coverage Directed Test Generation (CDG) process, where the LLM functions as a Verilog Reader. It accurately grasps the code logic, thereby generating stimuli that can reach unexplored code branches. We compare our framework with random testing, using our self-designed Verilog benchmark suite. Experiments demonstrate that our framework outperforms random testing on designs within the LLM's comprehension scope. Our work also proposes prompt engineering optimizations to augment LLM's understanding scope and accuracy.