Xinhua Wang

IR
h-index: 26
24 papers
298 citations
Novelty: 52%
AI Score: 56

24 Papers

99.9 RO May 14
XR-1: Towards Versatile Vision-Language-Action Models via Learning Unified Vision-Motion Representations

Shichao Fan, Kun Wu, Zhengping Che et al.

Recent progress in large-scale robotic datasets and vision-language models (VLMs) has advanced research on vision-language-action (VLA) models. However, existing VLA models still face two fundamental challenges: (i) producing precise low-level actions from high-dimensional observations, and (ii) bridging domain gaps across heterogeneous data sources, including diverse robot embodiments and human demonstrations. Existing methods often encode latent variables from either visual dynamics or robotic actions to guide policy learning, but they fail to fully exploit the complementary multi-modal knowledge present in large-scale, heterogeneous datasets. In this work, we present X Robotic Model 1 (XR-1), a novel framework for versatile and scalable VLA learning across diverse robots, tasks, and environments. XR-1 introduces the Unified Vision-Motion Codes (UVMC), a discrete latent representation learned via a dual-branch VQ-VAE that jointly encodes visual dynamics and robotic motion. UVMC addresses these challenges by (i) serving as an intermediate representation between the observations and actions, and (ii) aligning multimodal dynamic information from heterogeneous data sources to capture complementary knowledge. To effectively exploit UVMC, we propose a three-stage training paradigm: (i) self-supervised UVMC learning, (ii) UVMC-guided pretraining on large-scale cross-embodiment robotic datasets, and (iii) task-specific post-training. We validate XR-1 through extensive real-world experiments with more than 14,000 rollouts on six different robot embodiments, spanning over 120 diverse manipulation tasks. XR-1 consistently outperforms state-of-the-art baselines such as $π_{0.5}$, $π_0$, RDT, UniVLA, and GR00T-N1.5 while demonstrating strong generalization to novel objects, background variations, distractors, and illumination changes. Our project is at https://xr-1-vla.github.io/.
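
The discrete codes mentioned in the abstract above come from a VQ-VAE, whose core step is replacing a continuous latent vector with its nearest codebook entry. The following is a minimal sketch of that lookup only; the function name `quantize`, the toy codebook, and the 2-D latents are assumptions made for illustration, and the dual-branch encoder and training losses of XR-1 are not modeled here.

```python
def quantize(latent, codebook):
    """Return (index, code) of the codebook entry nearest to `latent`."""
    def sq_dist(a, b):
        # Squared Euclidean distance between two equal-length vectors.
        return sum((x - y) ** 2 for x, y in zip(a, b))

    index = min(range(len(codebook)), key=lambda i: sq_dist(latent, codebook[i]))
    return index, codebook[index]


# Hypothetical 3-entry codebook and three continuous latents.
codebook = [[0.0, 0.0], [1.0, 1.0], [-1.0, 2.0]]
latents = [[0.9, 1.2], [-0.8, 1.7], [0.1, -0.2]]

codes = [quantize(z, codebook)[0] for z in latents]
print(codes)  # [1, 2, 0]
```

Each latent is reduced to a single integer index, which is what makes the representation discrete.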

IR Jun 16, 2022
Reinforcement Learning-enhanced Shared-account Cross-domain Sequential Recommendation

Lei Guo, Jinyu Zhang, Tong Chen et al.

Shared-account Cross-domain Sequential Recommendation (SCSR) is an emerging yet challenging task that simultaneously considers the shared-account and cross-domain characteristics in the sequential recommendation. Existing works on SCSR are mainly based on Recurrent Neural Network (RNN) and Graph Neural Network (GNN) but they ignore the fact that although multiple users share a single account, it is mainly occupied by one user at a time. This observation motivates us to learn a more accurate user-specific account representation by attentively focusing on its recent behaviors. Furthermore, though existing works endow lower weights to irrelevant interactions, they may still dilute the domain information and impede the cross-domain recommendation. To address the above issues, we propose a reinforcement learning-based solution, namely RL-ISN, which consists of a basic cross-domain recommender and a reinforcement learning-based domain filter. Specifically, to model the account representation in the shared-account scenario, the basic recommender first clusters users' mixed behaviors as latent users, and then leverages an attention model over them to conduct user identification. To reduce the impact of irrelevant domain information, we formulate the domain filter as a hierarchical reinforcement learning task, where a high-level task is utilized to decide whether to revise the whole transferred sequence or not, and if it does, a low-level task is further performed to determine whether to remove each interaction within it or not. To evaluate the performance of our solution, we conduct extensive experiments on two real-world datasets, and the experimental results demonstrate the superiority of our RL-ISN method compared with the state-of-the-art recommendation methods.

SY Mar 22, 2011
Design and frequency analysis of continuous finite-time-convergent differentiator

Xinhua Wang, Hai Lin

In this paper, a continuous finite-time-convergent differentiator is presented based on a strong Lyapunov function. The continuous differentiator reduces the chattering phenomenon more effectively than the conventional sliding mode differentiator, and its signal-tracking and derivative-estimation outputs are both smooth. Frequency analysis is applied to compare the continuous differentiator with the sliding mode differentiator. The merits of the continuous finite-time-convergent differentiator include its simplicity, effective noise suppression, and avoidance of the chattering phenomenon.

SY May 6, 2015
Rapid-convergent nonlinear differentiator

Xinhua Wang, Bijan Shirinzadeh

A nonlinear differentiator suited to rapid convergence is presented, based on the singular perturbation technique. The design not only sufficiently reduces the chattering phenomenon in derivative estimation by introducing a continuous power function, but also improves the dynamical performance by adding linear correction terms to the nonlinear ones. Moreover, strong robustness is obtained by integrating the nonlinear terms with the linear filter. The merits of the rapid-convergent differentiator include excellent dynamical performance, effective noise suppression, avoidance of the chattering phenomenon, and independence from a system model. The theoretical results are confirmed by computer simulations and an experiment.

SY Mar 22, 2011
Design and analysis of continuous hybrid differentiator

Xinhua Wang, Hai Lin

In this paper, a continuous hybrid differentiator is presented based on a strong Lyapunov function. The design not only sufficiently reduces the chattering phenomenon in derivative estimation by introducing a perturbation parameter, but also improves the dynamical performance by adding linear correction terms to the nonlinear ones. Moreover, strong robustness is obtained by integrating the sliding mode terms with the linear filter. Frequency analysis is applied to compare the continuous hybrid differentiator with the sliding mode differentiator. The merits of the continuous hybrid differentiator include excellent dynamical performance, effective noise suppression, and avoidance of the chattering phenomenon.

99.9 RO Apr 9
HEX: Humanoid-Aligned Experts for Cross-Embodiment Whole-Body Manipulation

Shuanghao Bai, Meng Li, Xinyuan Lv et al.

Humans achieve complex manipulation through coordinated whole-body control, whereas most Vision-Language-Action (VLA) models treat robot body parts largely independently, making high-DoF humanoid control challenging and often unstable. We present HEX, a state-centric framework for coordinated manipulation on full-sized bipedal humanoid robots. HEX introduces a humanoid-aligned universal state representation for scalable learning across heterogeneous embodiments, and incorporates a Mixture-of-Experts Unified Proprioceptive Predictor to model whole-body coordination and temporal motion dynamics from large-scale multi-embodiment trajectory data. To efficiently capture temporal visual context, HEX uses lightweight history tokens to summarize past observations, avoiding repeated encoding of historical images during inference. It further employs a residual-gated fusion mechanism with a flow-matching action head to adaptively integrate visual-language cues with proprioceptive dynamics for action generation. Experiments on real-world humanoid manipulation tasks show that HEX achieves state-of-the-art performance in task success rate and generalization, particularly in fast-reaction and long-horizon scenarios.

SY May 4, 2017
Aircraft navigation based on differentiation-integration observer

Xinhua Wang, Lilong Cai

In this paper, a generalized differentiation-integration observer is presented based on sensor selection. The proposed observer can simultaneously estimate the multiple integrals and high-order derivatives of a signal. Parameter selection rules are presented for the observer. The theoretical results are confirmed by frequency-domain analysis. The effectiveness of the proposed observer is verified through numerical simulations on a quadrotor aircraft: (i) through the differentiation-integration observer, the attitude angle and the uncertainties in the attitude dynamics are estimated simultaneously from measurements of angular velocity; (ii) a control law is designed based on the observers to drive the aircraft to track a reference trajectory.

IR Feb 7, 2023
Towards Lightweight Cross-domain Sequential Recommendation via External Attention-enhanced Graph Convolution Network

Jinyu Zhang, Huichuan Duan, Lei Guo et al.

Cross-domain Sequential Recommendation (CSR) is an emerging yet challenging task that depicts the evolution of behavior patterns for overlapped users by modeling their interactions from multiple domains. Existing studies on CSR mainly focus on using composite or in-depth structures that achieve significant improvement in accuracy but bring a huge burden to the model training. Moreover, to learn the user-specific sequence representations, existing works usually adopt the global relevance weighting strategy (e.g., self-attention mechanism), which has quadratic computational complexity. In this work, we introduce a lightweight external attention-enhanced GCN-based framework to solve the above challenges, namely LEA-GCN. Specifically, by only keeping the neighborhood aggregation component and using the Single-Layer Aggregating Protocol (SLAP), our lightweight GCN encoder performs more efficiently to capture the collaborative filtering signals of the items from both domains. To further alleviate the framework structure and aggregate the user-specific sequential pattern, we devise a novel dual-channel External Attention (EA) component, which calculates the correlation among all items via a lightweight linear structure. Extensive experiments are conducted on two real-world datasets, demonstrating that LEA-GCN requires a smaller volume and less training time without affecting the accuracy compared with several state-of-the-art methods.
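
The linear-complexity claim in the abstract above can be illustrated with the generic external attention idea: items attend to a small, fixed set of learned memory slots instead of to each other, so the cost grows with N x S rather than N x N. The sketch below is an assumption-laden illustration of that generic mechanism in plain Python; the function name, the memory size, and the normalisation order (softmax over items, then L1 over slots) are assumptions, and the dual-channel EA component of LEA-GCN may differ.

```python
import math


def external_attention(features, mem_k, mem_v):
    """features: N x d, mem_k: S x d, mem_v: S x d -> (output N x d, attention N x S)."""
    n, s = len(features), len(mem_k)
    # Similarity between every item and every memory slot (N x S, linear in N).
    scores = [[sum(f * k for f, k in zip(row, key)) for key in mem_k] for row in features]

    # Softmax over the item axis, one memory slot (column) at a time.
    attn = [[0.0] * s for _ in range(n)]
    for j in range(s):
        col_max = max(scores[i][j] for i in range(n))
        exps = [math.exp(scores[i][j] - col_max) for i in range(n)]
        total = sum(exps)
        for i in range(n):
            attn[i][j] = exps[i] / total

    # L1-normalise each item's weights over the memory slots.
    for i in range(n):
        row_sum = sum(attn[i])
        attn[i] = [a / row_sum for a in attn[i]]

    # Weighted sum of the value memory gives the refined item features.
    d = len(mem_v[0])
    out = [[sum(attn[i][j] * mem_v[j][c] for j in range(s)) for c in range(d)]
           for i in range(n)]
    return out, attn


# Hypothetical toy input: 3 items with 2-D features and 2 memory slots.
features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
mem_k = [[0.5, -0.5], [0.2, 0.8]]
mem_v = [[1.0, 0.0], [0.0, 1.0]]
out, attn = external_attention(features, mem_k, mem_v)
```

Because the memory size S is fixed, doubling the sequence length roughly doubles the work, whereas self-attention would quadruple it.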

RO Feb 18
RoboGene: Boosting VLA Pre-training via Diversity-Driven Agentic Framework for Real-World Task Generation

Yixue Zhang, Kun Wu, Zhi Gao et al.

The pursuit of general-purpose robotic manipulation is hindered by the scarcity of diverse, real-world interaction data. Unlike data collection from web in vision or language, robotic data collection is an active process incurring prohibitive physical costs. Consequently, automated task curation to maximize data value remains a critical yet under-explored challenge. Existing manual methods are unscalable and biased toward common tasks, while off-the-shelf foundation models often hallucinate physically infeasible instructions. To address this, we introduce RoboGene, an agentic framework designed to automate the generation of diverse, physically plausible manipulation tasks across single-arm, dual-arm, and mobile robots. RoboGene integrates three core components: diversity-driven sampling for broad task coverage, self-reflection mechanisms to enforce physical constraints, and human-in-the-loop refinement for continuous improvement. We conduct extensive quantitative analysis and large-scale real-world experiments, collecting datasets of 18k trajectories and introducing novel metrics to assess task quality, feasibility, and diversity. Results demonstrate that RoboGene significantly outperforms state-of-the-art foundation models (e.g., GPT-4o, Gemini 2.5 Pro). Furthermore, real-world experiments show that VLA models pre-trained with RoboGene achieve higher success rates and superior generalization, underscoring the importance of high-quality task generation. Our project is available at https://robogene-boost-vla.github.io.

SY Feb 13, 2011
High-order integral-chain differentiator and application to acceleration feedback

Xinhua Wang

The equivalence between the integral-chain differentiator and the usual high-gain differentiator is established under a suitable coordinate transformation. The integral-chain differentiator can restrain noise more thoroughly than the usual high-gain linear differentiator. In the integral-chain differentiator, disturbances exist only in the last differential equation and are restrained through each layer of integrators. Moreover, a nonlinear integral-chain differentiator is designed as an extension of the linear integral-chain differentiator. Finally, a third-order differentiator is applied to the estimation of acceleration for a second-order uncertain system.
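
To make the structure described above concrete, here is a minimal sketch of a second-order linear integral-chain differentiator simulated with forward Euler. The specific form used (x1' = x2, x2' = -(a1/eps^2)(x1 - v) - (a2/eps) x2), the gains a1 = 1 and a2 = 2, the perturbation parameter eps = 0.01, and the step size are all assumptions chosen for illustration, not values taken from the paper. Note that the measured signal v enters only the last equation, as the abstract describes.

```python
import math


def integral_chain_differentiator(signal, t_end, dt=1e-4, eps=0.01, a1=1.0, a2=2.0):
    """Track `signal` with x1 and estimate its first derivative with x2."""
    x1, x2 = signal(0.0), 0.0
    t = 0.0
    for _ in range(int(round(t_end / dt))):
        v = signal(t)
        dx1 = x2  # pure integrator: no measurement enters here
        dx2 = -(a1 / eps**2) * (x1 - v) - (a2 / eps) * x2  # measurement enters only here
        x1 += dt * dx1
        x2 += dt * dx2
        t += dt
    return t, x1, x2


# Differentiate sin(t); the estimate x2 should be close to cos(t).
t, x1, x2 = integral_chain_differentiator(math.sin, t_end=5.0)
print(f"t={t:.3f}  estimate={x2:.4f}  true={math.cos(t):.4f}")
```

With these gains the characteristic polynomial is (s + 1/eps)^2, so a smaller eps gives faster tracking at the cost of amplifying high-frequency noise and requiring a smaller integration step.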

SY Feb 14, 2011
Universal approximation using differentiators and application to feedback control

Xinhua Wang

In this paper, we consider the problems of approximating uncertainties and feedback control for a class of nonlinear systems whose states are not fully known, and two approximation methods are proposed: universal approximation using an integral-chain differentiator or an extended observer. Compared with approximation by fuzzy systems and radial basis function (RBF) neural networks, the two presented methods can not only universally approximate the uncertainties but also estimate the unknown states. Moreover, the integral-chain differentiator can restrain noise thoroughly. The theoretical results are confirmed by computer simulations for feedback control.

SY Mar 5, 2011
Two-step differentiator for delayed signal

Xinhua Wang, Hai Lin

This paper presents a high-order differentiator for delayed measurement signals. The proposed differentiator can not only correct the delay in the signal but also estimate the undelayed derivatives. The differentiator consists of a two-step algorithm with the delayed time instant. Conditions are given that ensure convergence of the estimation error for the given delay in the signals. The merits of the method include its simple implementation and interesting application. Numerical simulations illustrate the effectiveness of the proposed differentiator.

SY Feb 13, 2011
Frequency characteristics based on describing function method for differentiators

Xinhua Wang

In this paper, the describing function method is used to analyze the characteristics and parameter selection of differentiators. The nonlinear differentiator is an effective complement to the linear differentiator, and the hybrid differentiator, consisting of linear and nonlinear parts, combines the advantages of both. The merits of the hybrid differentiator include its simplicity, rapid convergence at all times, and effective noise suppression. The methods are confirmed by some examples.

CR Oct 11, 2025
Pharmacist: Safety Alignment Data Curation for Large Language Models against Harmful Fine-tuning

Guozhi Liu, Qi Mu, Tiansheng Huang et al.

Harmful fine-tuning issues present significant safety challenges for fine-tuning-as-a-service in large language models. Existing alignment-stage defenses, e.g., Vaccine, RepNoise, Booster, and T-Vaccine, mitigate harmful fine-tuning issues by enhancing the model's robustness during the alignment phase. While these methods have been proposed to mitigate the issue, they often overlook a critical upstream factor: the role of the original safety-alignment data. We observe that their defense performance and computational efficiency remain constrained by the quality and composition of the alignment dataset. To address this limitation, we propose Pharmacist, a safety alignment data curation solution that enhances defense against harmful fine-tuning by selecting a high-quality and safety-critical core subset from the original alignment data. The core idea of Pharmacist is to train an alignment data selector to rank alignment data. Specifically, the selector up-ranks high-quality and safety-critical alignment data and down-ranks low-quality and non-safety-critical data. Empirical results indicate that models trained on datasets selected by Pharmacist outperform those trained on datasets selected by existing selection methods in both defense and inference performance. In addition, Pharmacist can be effectively integrated with mainstream alignment-stage defense methods. For example, when applied to RepNoise and T-Vaccine, using the dataset selected by Pharmacist instead of the full dataset leads to improvements in defense performance by 2.60% and 3.30%, respectively, and enhances inference performance by 3.50% and 1.10%. Notably, it reduces training time by 56.83% and 57.63%, respectively. Our code is available at https://github.com/Lslland/Pharmacist.

28.4 IR Apr 20
FedCRF: A Federated Cross-domain Recommendation Method with Semantic-driven Deep Knowledge Fusion

Lei Guo, Ting Yang, Hui Liu et al.

As user behavior data becomes increasingly scattered across different platforms, achieving cross-domain knowledge fusion while preserving privacy has become a critical issue in recommender systems. Existing PPCDR methods usually rely on overlapping users or items as a bridge, making them inapplicable to non-overlapping scenarios. They also suffer from limitations in the collaborative modeling of global and local semantics. To this end, this paper proposes a Federated Cross-domain Recommendation method with deep knowledge Fusion (FedCRF). Using textual semantics as a cross-domain bridge, FedCRF achieves cross-domain knowledge transfer via federated semantic learning under the non-overlapping scenario. Specifically, FedCRF constructs global semantic clusters on the server side to extract shared semantic information, and designs a FGSAT module on the client side to dynamically adapt to local data distributions and alleviate cross-domain distribution shift. Meanwhile, it builds a semantic graph based on textual features to learn representations that integrate both structural and semantic information, and introduces contrastive learning constraints between global and local semantic representations to enhance semantic consistency and promote deep knowledge fusion. In this framework, only item semantic representations are shared, while user interaction data remains locally stored, effectively mitigating privacy leakage risks. Experimental results on multiple real-world datasets show that FedCRF significantly outperforms existing methods in terms of Recall@20 and NDCG@20, validating its effectiveness and superiority in non-overlapping cross-domain recommendation scenarios.

LG Jul 22, 2021
Tri-Branch Convolutional Neural Networks for Top-$k$ Focused Academic Performance Prediction

Chaoran Cui, Jian Zong, Yuling Ma et al.

Academic performance prediction aims to leverage student-related information to predict their future academic outcomes, which is beneficial to numerous educational applications, such as personalized teaching and academic early warning. In this paper, we address the problem by analyzing students' daily behavior trajectories, which can be comprehensively tracked with campus smartcard records. Different from previous studies, we propose a novel Tri-Branch CNN architecture, which is equipped with row-wise, column-wise, and depth-wise convolution and attention operations, to capture the characteristics of persistence, regularity, and temporal distribution of student behavior in an end-to-end manner, respectively. Also, we cast academic performance prediction as a top-$k$ ranking problem, and introduce a top-$k$ focused loss to ensure the accuracy of identifying academically at-risk students. Extensive experiments were carried out on a large-scale real-world dataset, and we show that our approach substantially outperforms recently proposed methods for academic performance prediction. For the sake of reproducibility, our codes have been released at https://github.com/ZongJ1111/Academic-Performance-Prediction.

RO Dec 18, 2024
RoboMIND: Benchmark on Multi-embodiment Intelligence Normative Data for Robot Manipulation

Kun Wu, Chengkai Hou, Jiaming Liu et al.

In this paper, we introduce RoboMIND (Multi-embodiment Intelligence Normative Data for Robot Manipulation), a dataset containing 107k demonstration trajectories across 479 diverse tasks involving 96 object classes. RoboMIND is collected through human teleoperation and encompasses comprehensive robotic-related information, including multi-view observations, proprioceptive robot state information, and linguistic task descriptions. To ensure data consistency and reliability for imitation learning, RoboMIND is built on a unified data collection platform and a standardized protocol, covering four distinct robotic embodiments: the Franka Emika Panda, the UR5e, the AgileX dual-arm robot, and a humanoid robot with dual dexterous hands. Our dataset also includes 5k real-world failure demonstrations, each accompanied by detailed causes, enabling failure reflection and correction during policy learning. Additionally, we created a digital twin environment in the Isaac Sim simulator, replicating the real-world tasks and assets, which facilitates the low-cost collection of additional training data and enables efficient evaluation. To demonstrate the quality and diversity of our dataset, we conducted extensive experiments using various imitation learning methods for single-task settings and state-of-the-art Vision-Language-Action (VLA) models for multi-task scenarios. By leveraging RoboMIND, the VLA models achieved high manipulation success rates and demonstrated strong generalization capabilities. To the best of our knowledge, RoboMIND is the largest multi-embodiment teleoperation dataset collected on a unified platform, providing large-scale and high-quality robotic training data. Our project is at https://x-humanoid-robomind.github.io/.

CV Oct 29, 2025
Larger Hausdorff Dimension in Scanning Pattern Facilitates Mamba-Based Methods in Low-Light Image Enhancement

Xinhua Wang, Caibo Feng, Xiangjun Fu et al.

We propose an innovative enhancement to the Mamba framework by increasing the Hausdorff dimension of its scanning pattern through a novel Hilbert Selective Scan mechanism. This mechanism explores the feature space more effectively, capturing intricate fine-scale details and improving overall coverage. As a result, it mitigates information inconsistencies while refining spatial locality to better capture subtle local interactions without sacrificing the model's ability to handle long-range dependencies. Extensive experiments on publicly available benchmarks demonstrate that our approach significantly improves both the quantitative metrics and qualitative visual fidelity of existing Mamba-based low-light image enhancement methods, all while reducing computational resource consumption and shortening inference time. We believe that this refined strategy not only advances the state-of-the-art in low-light image enhancement but also holds promise for broader applications in fields that leverage Mamba-based techniques.
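
The scanning pattern discussed above determines the order in which a 2-D feature map is flattened into a sequence. As a rough illustration, the sketch below generates a Hilbert-curve visiting order for a square grid using the standard index-to-coordinate conversion; the function names and the 4 x 4 grid are assumptions, and how the paper's Hilbert Selective Scan integrates this order into Mamba is not shown.

```python
def hilbert_d2xy(n, d):
    """Map index d along a Hilbert curve to (x, y) on an n x n grid (n a power of 2)."""
    x = y = 0
    t = d
    s = 1
    while s < n:
        rx = 1 & (t // 2)
        ry = 1 & (t ^ rx)
        if ry == 0:
            if rx == 1:
                # Flip the quadrant before rotating it.
                x = s - 1 - x
                y = s - 1 - y
            x, y = y, x
        x += s * rx
        y += s * ry
        t //= 4
        s *= 2
    return x, y


def hilbert_scan_order(n):
    """Return the list of (x, y) cells in Hilbert-curve order."""
    return [hilbert_d2xy(n, d) for d in range(n * n)]


order = hilbert_scan_order(4)
# Every consecutive pair of cells is a direct neighbour on the grid.
steps = [abs(ax - bx) + abs(ay - by) for (ax, ay), (bx, by) in zip(order, order[1:])]
print(order[:4], max(steps))
```

Unlike a row-by-row raster scan, which jumps across the whole image width at the end of each row, this order keeps consecutive sequence elements spatially adjacent, which is the locality property the abstract refers to.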

IR Apr 1, 2020
Task-adaptive Asymmetric Deep Cross-modal Hashing

Fengling Li, Tong Wang, Lei Zhu et al.

Supervised cross-modal hashing aims to embed the semantic correlations of heterogeneous modality data into binary hash codes with discriminative semantic labels. Because of its advantages in retrieval and storage efficiency, it is widely used for efficient cross-modal retrieval. However, existing studies handle the different cross-modal retrieval tasks equally and simply learn the same pair of hash functions for them in a symmetric way. Under such circumstances, the uniqueness of each cross-modal retrieval task is ignored, which may lead to sub-optimal performance. Motivated by this, we present a Task-adaptive Asymmetric Deep Cross-modal Hashing (TA-ADCMH) method in this paper. It can learn task-adaptive hash functions for two sub-retrieval tasks via simultaneous modality representation and asymmetric hash learning. Unlike previous cross-modal hashing approaches, our learning framework jointly optimizes semantic preserving, which transforms deep features of multimedia data into binary hash codes, and semantic regression, which directly regresses the query modality representation to explicit labels. With our model, the binary codes can effectively preserve semantic correlations across different modalities while adaptively capturing the query semantics. The superiority of TA-ADCMH is demonstrated on two standard datasets from several aspects.

IR Oct 23, 2018
Topic representation: finding more representative words in topic models

Jinjin Chi, Jihong Ouyang, Changchun Li et al.

The top word list, i.e., the top-M words with the highest marginal probability in a given topic, is the standard topic representation in topic models. Most recent automatic topic labeling algorithms and popular topic quality metrics are based on it. However, we find empirically that the words in this type of top word list are not always representative. The objective of this paper is to find more representative top word lists for topics. To achieve this, we rerank the words in a given topic by further considering the marginal probability of each word over every other topic. The reranked list of top-M words serves as a novel topic representation for topic models. We investigate three reranking methodologies, using (1) standard deviation weight, (2) standard deviation weight with topic size, and (3) chi-square ($\chi^2$) statistic selection. Experimental results on real-world collections indicate that our representations can extract more representative words for topics, agreeing with human judgements.
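
The reranking idea above can be sketched as follows: a word that is equally probable in every topic says little about any one of them, so its in-topic probability is down-weighted by how little it varies across topics. The abstract does not give the exact weighting formula, so the product of probability and cross-topic standard deviation used here is an assumption, as are the function name and the toy vocabulary.

```python
from statistics import pstdev


def rerank_top_words(phi, vocab, topic, m):
    """Return the top-m words of `topic`, scored by probability x cross-topic std."""
    scores = []
    for w, word in enumerate(vocab):
        column = [row[w] for row in phi]  # P(word | k) for every topic k
        scores.append((phi[topic][w] * pstdev(column), word))
    scores.sort(reverse=True)
    return [word for _, word in scores[:m]]


# Hypothetical topic-word matrix: rows are topics, columns follow `vocab`.
vocab = ["the", "model", "gene", "cell"]
phi = [
    [0.40, 0.35, 0.15, 0.10],  # topic 0
    [0.40, 0.05, 0.30, 0.25],  # topic 1
]

print(rerank_top_words(phi, vocab, topic=1, m=2))  # ['gene', 'cell']
```

Ranking topic 1 by raw probability would put "the" first; after reranking it drops out because its probability is identical in both topics.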

SY Jul 22, 2015
Modeling and control of an agile tail-sitter aircraft

Xinhua Wang, Zengqiang Chen, Zhuzhi Yuan

This paper presents a model of an agile tail-sitter aircraft, which can operate as a helicopter and is also capable of transitioning to fixed-wing flight. The aerodynamics of the co-axial counter-rotating propellers with quad rotors are analysed under the condition that the co-axial propellers are operated at equal rotor torque (power). A finite-time convergent observer based on a Lyapunov function is presented to estimate the unknown nonlinear terms in the co-axial counter-rotating propellers, as well as the uncertainties and external disturbances during mode transition. Furthermore, a simple controller based on the finite-time convergent observer and the quaternion method is designed to implement mode transition.

SY Jun 6, 2015
Frequency-domain analysis of nonlinear and linear integrators

Xinhua Wang

In this paper, frequency-domain analysis based on the frequency sweep method is presented for a nonlinear double integrator and a new linear integrator. Both types of integrators can simultaneously estimate the single and double integrals of a signal. Compared with the linear double integrator, the nonlinear integrator has better estimation performance and stronger robustness. Importantly, the integrator parameters can be tuned based on the frequency-domain analysis.

SY Jun 2, 2015
Mathematical modeling and control of a tilt-rotor aircraft

Xinhua Wang, Lilong Cai

This paper presents a novel model of a large-size tilt-rotor aircraft, which can operate as a helicopter and is also capable of transitioning to fixed-wing flight. The aerodynamics of the dynamic large-size tilt-rotors are analyzed during mode transition using the blade element method. For the large-size aircraft with turboshaft engines, the blade pitch angles of the rotors are regulated according to the desired level of thrust, and the following relations are formulated explicitly: rotor thrust versus blade pitch angle, and drag torque versus blade pitch angle. A finite-time convergent observer based on a Lyapunov function is developed to reconstruct the unknown variables and uncertainties during mode transitions. The merits of this design include the modeling of the dynamic large-size tilt-rotor, the ease of estimating uncertainties during tilting, and its wide applicability. Moreover, a switched logic controller based on the finite-time convergent observer is proposed to drive the aircraft through the mode transition at constant flying height.

SY May 14, 2015
Nonlinear continuous integral-derivative observer

Xinhua Wang, Bijan Shirinzadeh

In this paper, a high-order nonlinear continuous integral-derivative observer is presented based on finite-time stability and the singular perturbation technique. The proposed integral-derivative observer can not only obtain the multiple integrals of a signal but also estimate its derivatives. Conditions ensuring finite-time stability of the presented observer are given, and its stability and robustness in the time domain are analysed. The merits of the presented integral-derivative observer include its simultaneous estimation of integrals and derivatives, finite-time stability, ease of parameter selection, sufficient rejection of stochastic noise, and almost no drift phenomenon. The theoretical results are confirmed by computational analysis and simulations.