Zhen Meng

RO
h-index13
10papers
166citations
Novelty51%
AI Score40

10 Papers

ROJul 31, 2022
Sampling, Communication, and Prediction Co-Design for Synchronizing the Real-World Device and Digital Model in Metaverse

Zhen Meng, Changyang She, Guodong Zhao et al.

The metaverse has the potential to revolutionize the next generation of the Internet by supporting highly interactive services with the help of Mixed Reality (MR) technologies; still, to provide a satisfactory experience for users, the synchronization between the physical world and its digital models is crucial. This work proposes a sampling, communication and prediction co-design framework to minimize the communication load subject to a constraint on tracking the Mean Squared Error (MSE) between a real-world device and its digital model in the metaverse. To optimize the sampling rate and the prediction horizon, we exploit expert knowledge and develop a constrained Deep Reinforcement Learning (DRL) algorithm, named Knowledge-assisted Constrained Twin-Delayed Deep Deterministic (KC-TD3) policy gradient algorithm. We validate our framework on a prototype composed of a real-world robotic arm and its digital model. Compared with existing approaches: (1) When the tracking error constraint is stringent (MSE=0.002 degrees), our policy degenerates into the policy in the sampling-communication co-design framework. (2) When the tracking error constraint is mild (MSE=0.007 degrees), our policy degenerates into the policy in the prediction-communication co-design framework. (3) Our framework achieves a better trade-off between the average MSE and the average communication load compared with a communication system without sampling and prediction. For example, the average communication load can be reduced up to 87% when the track error constraint is 0.002 degrees. (4) Our policy outperforms the benchmark with the static sampling rate and prediction horizon optimized by exhaustive search, in terms of the tail probability of the tracking error. Furthermore, with the assistance of expert knowledge, the proposed algorithm KC-TD3 achieves better convergence time, stability, and final policy performance.

IRJul 17, 2024
GUME: Graphs and User Modalities Enhancement for Long-Tail Multimodal Recommendation

Guojiao Lin, Zhen Meng, Dongjie Wang et al.

Multimodal recommendation systems (MMRS) have received considerable attention from the research community due to their ability to jointly utilize information from user behavior and product images and text. Previous research has two main issues. First, many long-tail items in recommendation systems have limited interaction data, making it difficult to learn comprehensive and informative representations. However, past MMRS studies have overlooked this issue. Secondly, users' modality preferences are crucial to their behavior. However, previous research has primarily focused on learning item modality representations, while user modality representations have remained relatively simplistic.To address these challenges, we propose a novel Graphs and User Modalities Enhancement (GUME) for long-tail multimodal recommendation. Specifically, we first enhance the user-item graph using multimodal similarity between items. This improves the connectivity of long-tail items and helps them learn high-quality representations through graph propagation. Then, we construct two types of user modalities: explicit interaction features and extended interest features. By using the user modality enhancement strategy to maximize mutual information between these two features, we improve the generalization ability of user modality representations. Additionally, we design an alignment strategy for modality data to remove noise from both internal and external perspectives. Extensive experiments on four publicly available datasets demonstrate the effectiveness of our approach.

CLAug 2, 2024
BioRAG: A RAG-LLM Framework for Biological Question Reasoning

Chengrui Wang, Qingqing Long, Meng Xiao et al.

The question-answering system for Life science research, which is characterized by the rapid pace of discovery, evolving insights, and complex interactions among knowledge entities, presents unique challenges in maintaining a comprehensive knowledge warehouse and accurate information retrieval. To address these issues, we introduce BioRAG, a novel Retrieval-Augmented Generation (RAG) with the Large Language Models (LLMs) framework. Our approach starts with parsing, indexing, and segmenting an extensive collection of 22 million scientific papers as the basic knowledge, followed by training a specialized embedding model tailored to this domain. Additionally, we enhance the vector retrieval process by incorporating a domain-specific knowledge hierarchy, which aids in modeling the intricate interrelationships among each query and context. For queries requiring the most current information, BioRAG deconstructs the question and employs an iterative retrieval process incorporated with the search engine for step-by-step reasoning. Rigorous experiments have demonstrated that our model outperforms fine-tuned LLM, LLM with search engines, and other scientific RAG frameworks across multiple life science question-answering tasks.

CVJul 23, 2024
Timeliness-Fidelity Tradeoff in 3D Scene Representations

Xiangmin Xu, Zhen Meng, Yichi Zhang et al.

Real-time three-dimensional (3D) scene representations serve as one of the building blocks that bolster various innovative applications, e.g., digital manufacturing, Virtual/Augmented/Extended/Mixed Reality (VR/AR/XR/MR), and the metaverse. Despite substantial efforts that have been made to real-time communications and computing, real-time 3D scene representations remain a challenging task. This paper investigates the tradeoff between timeliness and fidelity in real-time 3D scene representations. Specifically, we establish a framework to evaluate the impact of communication delay on the tradeoff, where the real-world scenario is monitored by multiple cameras that communicate with an edge server. To improve fidelity for 3D scene representations, we propose to use a single-step Proximal Policy Optimization (PPO) method that leverages the Age of Information (AoI) to decide if the received image needs to be involved in 3D scene representations and rendering. We test our framework and the proposed approach with different well-known 3D scene representation methods. Simulation results reveal that real-time 3D scene representation can be sensitively affected by communication delay, and our proposed method can achieve optimal 3D scene representation results.

CVSep 21, 2025
Task-Oriented Communications for 3D Scene Representation: Balancing Timeliness and Fidelity

Xiangmin Xu, Zhen Meng, Kan Chen et al.

Real-time Three-dimensional (3D) scene representation is a foundational element that supports a broad spectrum of cutting-edge applications, including digital manufacturing, Virtual, Augmented, and Mixed Reality (VR/AR/MR), and the emerging metaverse. Despite advancements in real-time communication and computing, achieving a balance between timeliness and fidelity in 3D scene representation remains a challenge. This work investigates a wireless network where multiple homogeneous mobile robots, equipped with cameras, capture an environment and transmit images to an edge server over channels for 3D representation. We propose a contextual-bandit Proximal Policy Optimization (PPO) framework incorporating both Age of Information (AoI) and semantic information to optimize image selection for representation, balancing data freshness and representation quality. Two policies -- the $ω$-threshold and $ω$-wait policies -- together with two benchmark methods are evaluated, timeliness embedding and weighted sum, on standard datasets and baseline 3D scene representation models. Experimental results demonstrate improved representation fidelity while maintaining low latency, offering insight into the model's decision-making process. This work advances real-time 3D scene representation by optimizing the trade-off between timeliness and fidelity in dynamic environments.

ROAug 28, 2025
Task-Oriented Edge-Assisted Cross-System Design for Real-Time Human-Robot Interaction in Industrial Metaverse

Kan Chen, Zhen Meng, Xiangmin Xu et al.

Real-time human-device interaction in industrial Metaverse faces challenges such as high computational load, limited bandwidth, and strict latency. This paper proposes a task-oriented edge-assisted cross-system framework using digital twins (DTs) to enable responsive interactions. By predicting operator motions, the system supports: 1) proactive Metaverse rendering for visual feedback, and 2) preemptive control of remote devices. The DTs are decoupled into two virtual functions-visual display and robotic control-optimizing both performance and adaptability. To enhance generalizability, we introduce the Human-In-The-Loop Model-Agnostic Meta-Learning (HITL-MAML) algorithm, which dynamically adjusts prediction horizons. Evaluation on two tasks demonstrates the framework's effectiveness: in a Trajectory-Based Drawing Control task, it reduces weighted RMSE from 0.0712 m to 0.0101 m; in a real-time 3D scene representation task for nuclear decommissioning, it achieves a PSNR of 22.11, SSIM of 0.8729, and LPIPS of 0.1298. These results show the framework's capability to ensure spatial precision and visual fidelity in real-time, high-risk industrial environments.

AIApr 24, 2025
Comprehend, Divide, and Conquer: Feature Subspace Exploration via Multi-Agent Hierarchical Reinforcement Learning

Weiliang Zhang, Xiaohan Huang, Yi Du et al.

Feature selection aims to preprocess the target dataset, find an optimal and most streamlined feature subset, and enhance the downstream machine learning task. Among filter, wrapper, and embedded-based approaches, the reinforcement learning (RL)-based subspace exploration strategy provides a novel objective optimization-directed perspective and promising performance. Nevertheless, even with improved performance, current reinforcement learning approaches face challenges similar to conventional methods when dealing with complex datasets. These challenges stem from the inefficient paradigm of using one agent per feature and the inherent complexities present in the datasets. This observation motivates us to investigate and address the above issue and propose a novel approach, namely HRLFS. Our methodology initially employs a Large Language Model (LLM)-based hybrid state extractor to capture each feature's mathematical and semantic characteristics. Based on this information, features are clustered, facilitating the construction of hierarchical agents for each cluster and sub-cluster. Extensive experiments demonstrate the efficiency, scalability, and robustness of our approach. Compared to contemporary or the one-feature-one-agent RL-based approaches, HRLFS improves the downstream ML performance with iterative feature subspace exploration while accelerating total run time by reducing the number of agents involved.

ROApr 2, 2025
Preference-Driven Active 3D Scene Representation for Robotic Inspection in Nuclear Decommissioning

Zhen Meng, Kan Chen, Xiangmin Xu et al.

Active 3D scene representation is pivotal in modern robotics applications, including remote inspection, manipulation, and telepresence. Traditional methods primarily optimize geometric fidelity or rendering accuracy, but often overlook operator-specific objectives, such as safety-critical coverage or task-driven viewpoints. This limitation leads to suboptimal viewpoint selection, particularly in constrained environments such as nuclear decommissioning. To bridge this gap, we introduce a novel framework that integrates expert operator preferences into the active 3D scene representation pipeline. Specifically, we employ Reinforcement Learning from Human Feedback (RLHF) to guide robotic path planning, reshaping the reward function based on expert input. To capture operator-specific priorities, we conduct interactive choice experiments that evaluate user preferences in 3D scene representation. We validate our framework using a UR3e robotic arm for reactor tile inspection in a nuclear decommissioning scenario. Compared to baseline methods, our approach enhances scene representation while optimizing trajectory efficiency. The RLHF-based policy consistently outperforms random selection, prioritizing task-critical details. By unifying explicit 3D geometric modeling with implicit human-in-the-loop optimization, this work establishes a foundation for adaptive, safety-critical robotic perception systems, paving the way for enhanced automation in nuclear decommissioning, remote maintenance, and other high-risk environments.

AIJun 11, 2024
Enhanced Gene Selection in Single-Cell Genomics: Pre-Filtering Synergy and Reinforced Optimization

Weiliang Zhang, Zhen Meng, Dongjie Wang et al.

Recent advancements in single-cell genomics necessitate precision in gene panel selection to interpret complex biological data effectively. Those methods aim to streamline the analysis of scRNA-seq data by focusing on the most informative genes that contribute significantly to the specific analysis task. Traditional selection methods, which often rely on expert domain knowledge, embedded machine learning models, or heuristic-based iterative optimization, are prone to biases and inefficiencies that may obscure critical genomic signals. Recognizing the limitations of traditional methods, we aim to transcend these constraints with a refined strategy. In this study, we introduce an iterative gene panel selection strategy that is applicable to clustering tasks in single-cell genomics. Our method uniquely integrates results from other gene selection algorithms, providing valuable preliminary boundaries or prior knowledge as initial guides in the search space to enhance the efficiency of our framework. Furthermore, we incorporate the stochastic nature of the exploration process in reinforcement learning (RL) and its capability for continuous optimization through reward-based feedback. This combination mitigates the biases inherent in the initial boundaries and harnesses RL's adaptability to refine and target gene panel selection dynamically. To illustrate the effectiveness of our method, we conducted detailed comparative experiments, case studies, and visualization analysis.

CLSep 3, 2019
A Smart Sliding Chinese Pinyin Input Method Editor on Touchscreen

Zhuosheng Zhang, Zhen Meng, Hai Zhao

This paper presents a smart sliding Chinese pinyin Input Method Editor (IME) for touchscreen devices which allows user finger sliding from one key to another on the touchscreen instead of tapping keys one by one, while the target Chinese character sequence will be predicted during the sliding process to help user input Chinese characters efficiently. Moreover, the layout of the virtual keyboard of our IME adapts to user sliding for more efficient inputting. The layout adaption process is utilized with Recurrent Neural Networks (RNN) and deep reinforcement learning. The pinyin-to-character converter is implemented with a sequence-to-sequence (Seq2Seq) model to predict the target Chinese sequence. A sliding simulator is built to automatically produce sliding samples for model training and virtual keyboard test. The key advantage of our proposed IME is that nearly all its built-in tactics can be optimized automatically with deep learning algorithms only following user behavior. Empirical studies verify the effectiveness of the proposed model and show a better user input efficiency.