Ruiqi Zhu

RO
h-index3
9papers
37citations
Novelty58%
AI Score48

9 Papers

ROJul 6, 2023Code
Learning to Solve Tasks with Exploring Prior Behaviours

Ruiqi Zhu, Siyuan Li, Tianhong Dai et al.

Demonstrations are widely used in Deep Reinforcement Learning (DRL) for facilitating solving tasks with sparse rewards. However, the tasks in real-world scenarios can often have varied initial conditions from the demonstration, which would require additional prior behaviours. For example, consider we are given the demonstration for the task of \emph{picking up an object from an open drawer}, but the drawer is closed in the training. Without acquiring the prior behaviours of opening the drawer, the robot is unlikely to solve the task. To address this, in this paper we propose an Intrinsic Rewards Driven Example-based Control \textbf{(IRDEC)}. Our method can endow agents with the ability to explore and acquire the required prior behaviours and then connect to the task-specific behaviours in the demonstration to solve sparse-reward tasks without requiring additional demonstration of the prior behaviours. The performance of our method outperforms other baselines on three navigation tasks and one robotic manipulation task with sparse rewards. Codes are available at https://github.com/Ricky-Zhu/IRDEC.

AIJul 9, 2024
CorMulT: A Semi-supervised Modality Correlation-aware Multimodal Transformer for Sentiment Analysis

Yangmin Li, Ruiqi Zhu, Wengen Li

Multimodal sentiment analysis is an active research area that combines multiple data modalities, e.g., text, image and audio, to analyze human emotions and benefits a variety of applications. Existing multimodal sentiment analysis methods can be classified as modality interaction-based methods, modality transformation-based methods and modality similarity-based methods. However, most of these methods highly rely on the strong correlations between modalities, and cannot fully uncover and utilize the correlations between modalities to enhance sentiment analysis. Therefore, these methods usually achieve bad performance for identifying the sentiment of multimodal data with weak correlations. To address this issue, we proposed a two-stage semi-supervised model termed Correlation-aware Multimodal Transformer (CorMulT) which consists pre-training stage and prediction stage. At the pre-training stage, a modality correlation contrastive learning module is designed to efficiently learn modality correlation coefficients between different modalities. At the prediction stage, the learned correlation coefficients are fused with modality representations to make the sentiment prediction. According to the experiments on the popular multimodal dataset CMU-MOSEI, CorMulT obviously surpasses state-of-the-art multimodal sentiment analysis methods.

ROApr 11, 2022
Deep Reinforcement Learning Based Semi-Autonomous Control for Robotic Surgery

Ruiqi Zhu, Dandan Zhang, Benny Lo

In recent decades, the tremendous benefits surgical robots have brought to surgeons and patients have been witnessed. With the dexterous operation and the great precision, surgical robots can offer patients less recovery time and less hospital stay. However, the controls for current surgical robots in practical usage are fully carried out by surgeons via teleoperation. During the surgery process, there exists a lot of repetitive but simple manipulation, which can cause unnecessary fatigue to the surgeons. In this paper, we proposed a deep reinforcement learning-based semi-autonomous control framework for robotic surgery. The user study showed that the framework can reduce the completion time by 19.1% and the travel length by 58.7%.

ITMay 13
List Decoding of Reed-Solomon Codes and Folded Reed-Solomon Codes Over Galois Ring

Chen Yuan, Ruiqi Zhu

List decoding of codes can be seen as the generalization of unique decoding of codes While list decoding over finite fields has been extensively studied, extending these results to more general algebraic structures such as Galois rings remains an important challenge. Due to recent progress in zero knowledge systems, there is a growing demand to investigate the proximity gap of codes over Galois rings in Yizhou Yao and coauthors(2025), Alexander Golovne and coauthors(2023), Yuanju Wei and coauthors(2025). The proximity gap is closely related to the decoding capability of codes. It was shown in Eli Ben-Sasson and coauthors(2020) that the proximity gap for RS codes over finite field can be improved to $1-\sqrt{r}$ if one consider list decoding instead of unique decoding. However, we know very little about RS codes over Galois ring which might hinder the development of zero knowledge proof system for ring-based arithmetic circuit. In this work, we first extend the list decoding procedure of Guruswami and Sudan to Reed-Solomon codes over Galois rings, which shows that RS codes with rate $r$ can be list decoded up to radius $1-\sqrt{r}$. Then, we investigate the list decoding of folded Reed-Solomon codes over Galois rings. We show that the list decoding radius of folded Reed-Solomon codes can reach the Singlton bound as its counterpart over finite field. Finally, we improve the list size of our folded Reed-Solomon code to $O(\frac{1}{\varepsilon^2})$ by extending recent work in Shashank Srivastava(2025) to Galois Rings. Moreover, at the combinatorial level, by developing the recent work of Yeyuan Chen and Zihan Zhang(2025), we show that folded Reed-Solomon codes over Galois rings satisfy the relaxed generalized Singleton bound in the average-radius sense with optimal list size $O(1/\varepsilon)$.

ITMay 8
A Syndrome-Space Approach to Proximity Gaps and Correlated Agreement for Random Linear Codes

Chen Yuan, Ruiqi Zhu

Proximity gaps and correlated agreement have become central tools in the analysis of interactive oracle proofs of proximity (IOPPs) and code-based SNARKs. Informally, a proximity-gap statement says that for a structured set of words -- such as a line, an affine space, or a curve -- either all points are close to the code, or most are far from it. Such statements are essential in sampling-based proof systems, where a verifier queries only a few random locations on a structured object but must still obtain a global soundness guarantee. In Reed--Solomon-based proof systems, one would ideally like the proximity parameter to approach the information-theoretic limit $1-R$, since this is the largest possible radius for a rate-$R$ code and directly affects protocol efficiency. While recent work has substantially strengthened the picture for algebraic codes and linked proximity gaps to decoding-related structural properties, it remains unclear whether analogous results for random linear codes can be proved directly, rather than through decoding-theoretic surrogates. In this work, we establish a direct approach to proximity gaps and correlated agreement for random linear codes in the random parity-check-matrix model, without relying on list decoding as the main engine of the proof. Our approach is based on a syndrome-space reformulation together with a witness-based reduction mechanism, and it yields strong results for affine lines, affine spaces, and polynomial curves. It is conceptually different from the existing decoding-driven route for random linear codes, and it also leads to sharper parameters, including the optimal-up-to-$\varepsilon$ large-alphabet radius bound $ρ<1-R-\varepsilon$ for $q=Θ(n)$, as well as near-capacity bounds over constant alphabets with improved alphabet-size requirements.

ROMar 4, 2024
Cross Domain Policy Transfer with Effect Cycle-Consistency

Ruiqi Zhu, Tianhong Dai, Oya Celiktutan

Training a robotic policy from scratch using deep reinforcement learning methods can be prohibitively expensive due to sample inefficiency. To address this challenge, transferring policies trained in the source domain to the target domain becomes an attractive paradigm. Previous research has typically focused on domains with similar state and action spaces but differing in other aspects. In this paper, our primary focus lies in domains with different state and action spaces, which has broader practical implications, i.e. transfer the policy from robot A to robot B. Unlike prior methods that rely on paired data, we propose a novel approach for learning the mapping functions between state and action spaces across domains using unpaired data. We propose effect cycle consistency, which aligns the effects of transitions across two domains through a symmetrical optimization structure for learning these mapping functions. Once the mapping functions are learned, we can seamlessly transfer the policy from the source domain to the target domain. Our approach has been tested on three locomotion tasks and two robotic manipulation tasks. The empirical results demonstrate that our method can reduce alignment errors significantly and achieve better performance compared to the state-of-the-art method.

ROMar 24, 2025
Efficient Continual Adaptation of Pretrained Robotic Policy with Online Meta-Learned Adapters

Ruiqi Zhu, Endong Sun, Guanhe Huang et al.

Continual adaptation is essential for general autonomous agents. For example, a household robot pretrained with a repertoire of skills must still adapt to unseen tasks specific to each household. Motivated by this, building upon parameter-efficient fine-tuning in language models, prior works have explored lightweight adapters to adapt pretrained policies, which can preserve learned features from the pretraining phase and demonstrate good adaptation performances. However, these approaches treat task learning separately, limiting knowledge transfer between tasks. In this paper, we propose Online Meta-Learned adapters (OMLA). Instead of applying adapters directly, OMLA can facilitate knowledge transfer from previously learned tasks to current learning tasks through a novel meta-learning objective. Extensive experiments in both simulated and real-world environments demonstrate that OMLA can lead to better adaptation performances compared to the baseline methods. The project link: https://ricky-zhu.github.io/OMLA/.

LGOct 16, 2025
AMS-QUANT: Adaptive Mantissa Sharing for Floating-point Quantization

Mengtao Lv, Ruiqi Zhu, Xinyu Wang et al.

Large language models (LLMs) have demonstrated remarkable capabilities in various kinds of tasks, while the billion or even trillion parameters bring storage and efficiency bottlenecks for inference. Quantization, particularly floating-point quantization, is known to be capable of speeding up LLM inference by reducing memory footprint and data movement during the inference process. For the first time, we advance the floating-point quantization exploration from integer bitwidths to non-integer bit-widths, namely AMS-Quant, to further approach the quantization sweet spot. AMS-Quant incorporates two novel techniques to put it into effect: (1) it proposes Mantissa-bit Sharing, which groups k quantized weights and lets them share the least significant mantissa bit, allowing us to further approach the minimum quantization bit-width without accuracy loss. (2) It introduces Adaptive Searching, which employs an offline optimization strategy to minimize the accuracy degradation introduced by sharing. Moreover, AMS-Quant is also prototyped as efficient CUDA Linear kernels, which translates memory savings into wall-clock latency reduction by reducing memory access. Extensive experiments on large-scale datasets and models show that AMS-Quant can quantize the model to FP-5.33-e2m3 and FP4.25-e2m2, and significantly speed up the LLM decoding over FP16 inference (2.8x and 3.2x), with negligible accuracy loss.

IVOct 28, 2024
Efficient Bilinear Attention-based Fusion for Medical Visual Question Answering

Zhilin Zhang, Jie Wang, Zhanghao Qin et al.

Medical Visual Question Answering (MedVQA) has attracted growing interest at the intersection of medical image understanding and natural language processing for clinical applications. By interpreting medical images and providing precise answers to relevant clinical inquiries, MedVQA has the potential to support diagnostic decision-making and reduce workload across various fields like radiology. While recent approaches rely heavily on unified large pre-trained Visual-Language Models, research on more efficient fusion mechanisms remains relatively limited in this domain. In this paper, we introduce a fusion model, OMniBAN, that integrates Orthogonality loss, Multi-head attention, and a Bilinear Attention Network to achieve high computational efficiency as well as solid performance. We conduct comprehensive experiments and demonstrate how bilinear attention fusion can approximate the performance of larger fusion models like cross-modal Transformer. Our results show that OMniBAN requires fewer parameters (approximately 2/3 of Transformer-based Co-Attention) and substantially lower FLOPs (approximately 1/4), while achieving comparable overall performance and even slight improvements on closed-ended questions on two key MedVQA benchmarks. This balance between efficiency and accuracy suggests that OMniBAN could be a viable option for real-world medical image question answering, where computational resources are often constrained.