RO Mar 19
Empathetic Motion Generation for Humanoid Educational Robots via Reasoning-Guided Vision-Language-Motion Diffusion Architecture
Fuze Sun, Lingyu Li, Lekan Dai et al.
This article proposes a reasoning-guided vision-language-motion diffusion framework (RG-VLMD) for generating instruction-aware co-speech gestures for humanoid robots in educational scenarios. The system integrates multi-modal affective estimation, pedagogical reasoning, and teaching-act-conditioned motion synthesis to enable adaptive and semantically consistent robot behavior. A gated mixture-of-experts model predicts Valence/Arousal from input text, visual, and acoustic features, which are then mapped to discrete teaching-act categories through an affect-driven policy. These signals condition a diffusion-based motion generator using clip-level intent and frame-level instructional schedules via additive latent restriction with auxiliary action-group supervision. Compared to a baseline diffusion model, our proposed method produces more structured and distinctive motion patterns, as verified by motion statistics and pairwise distance analysis. Generated motion sequences remain physically plausible and can be retargeted to a NAO robot for real-time execution. The results reveal that reasoning-guided instructional conditioning improves gesture controllability and pedagogical expressiveness in educational human-robot interaction.
AI Sep 22, 2025
Towards General Computer Control with Hierarchical Agents and Multi-Level Action Spaces
Zihan Dong, Xinyu Fan, Zixiang Tang et al.
Controlling desktop applications via software remains a fundamental yet under-served problem. Existing multi-modal large language models (MLLMs) ingest screenshots and task instructions to generate keystrokes and mouse events, but they suffer from prohibitive inference latency, poor sample efficiency on long-horizon sparse-reward tasks, and infeasible on-device deployment. We introduce a lightweight hierarchical reinforcement learning framework, ComputerAgent, that formulates OS control as a two-level option process (manager and subpolicy), employs a triple-modal state encoder (screenshot, task ID, numeric state) to handle visual and contextual diversity, integrates meta-actions with an early-stop mechanism to reduce wasted interactions, and uses a compact vision backbone plus small policy networks for on-device inference (15M parameters). On a suite of 135 real-world desktop tasks, ComputerAgent attains 92.1% success on simple tasks (<8 steps) and 58.8% on hard tasks (>=8 steps), matching or exceeding 200B-parameter MLLM baselines on simple scenarios while reducing model size by over four orders of magnitude and halving inference time. These results demonstrate that hierarchical RL offers a practical, scalable alternative to monolithic MLLM-based automation for computer control.
LG Jun 27, 2025
Physics-informed network paradigm with data generation and background noise removal for diverse distributed acoustic sensing applications
Yangyang Wan, Haotian Wang, Xuhui Yu et al.
Distributed acoustic sensing (DAS) has attracted considerable attention across various fields, and artificial intelligence (AI) technology plays an important role in DAS applications for event recognition and denoising. Existing AI models require real-world data (RWD), whether labeled or not, for training, which conflicts with the limited availability of event data in real-world scenarios. Here, a physics-informed DAS neural network paradigm is proposed that does not need real-world event data for training. By physically modeling target events and the constraints of the real world and the DAS system, physical functions are derived to train a generative network that generates DAS event data. A DAS debackground net is then trained on the generated DAS event data to eliminate background noise in DAS data. The effectiveness of the proposed paradigm is verified in an event identification application based on a public dataset of DAS spatiotemporal data and in a belt conveyor fault monitoring application based on DAS time-frequency data, where it achieves comparable or better performance than data-driven networks trained with RWD. Owing to the introduction of physical information and the capability of background noise removal, the paradigm generalizes within the same application across different sites. A fault diagnosis accuracy of 91.8% is achieved in the belt conveyor field with networks transferred from a simulation test site, without any fault event data from the test site or the field used for training. The proposed paradigm is a promising solution to the significant obstacles of data acquisition and intense noise in practical DAS applications, and it opens up more potential fields for DAS.
CV Apr 17, 2025
Unsupervised Cross-Domain 3D Human Pose Estimation via Pseudo-Label-Guided Global Transforms
Jingjing Liu, Zhiyong Wang, Xinyu Fan et al.
Existing 3D human pose estimation methods often perform poorly when applied to cross-scenario inference because of domain shifts in characteristics such as camera viewpoint, position, posture, and body size. Among these factors, camera viewpoints and locations have been shown to contribute significantly to the domain gap by influencing the global positions of human poses. To address this, we propose a novel framework that explicitly conducts global transformations between pose positions in the camera coordinate systems of source and target domains. We start with a Pseudo-Label Generation Module that is applied to the 2D poses of the target dataset to generate pseudo-3D poses. Then, a Global Transformation Module leverages a human-centered coordinate system as a novel bridging mechanism to align the positional orientations of poses across disparate domains, ensuring consistent spatial referencing. To further enhance generalization, a Pose Augmentor is incorporated to address variations in human posture and body size. This process is iterative, allowing refined pseudo-labels to progressively improve guidance for domain adaptation. Our method is evaluated on various cross-dataset benchmarks, including Human3.6M, MPI-INF-3DHP, and 3DPW. The proposed method outperforms state-of-the-art approaches and even outperforms the target-trained model.
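As a rough illustration of why a human-centered coordinate system can bridge camera domains, the sketch below translates a 3D pose so that a root joint sits at the origin. This is a deliberately simplified assumption, not the authors' Global Transformation Module: it handles translation only and ignores orientation, and the function name `to_root_centered` and the choice of joint 0 as root are hypothetical.

```python
from typing import List, Tuple

Joint = Tuple[float, float, float]


def to_root_centered(pose: List[Joint], root_index: int = 0) -> List[Joint]:
    """Translate a 3D pose so that the chosen root joint sits at the origin.

    Hypothetical helper: translation only, no rotation alignment.
    """
    rx, ry, rz = pose[root_index]
    return [(x - rx, y - ry, z - rz) for (x, y, z) in pose]


# The same body pose observed at two different positions in camera space.
pose_near = [(0.0, 0.0, 2.0), (0.0, 0.5, 2.0), (0.25, 0.5, 2.0)]
pose_far = [(1.0, -1.0, 6.0), (1.0, -0.5, 6.0), (1.25, -0.5, 6.0)]

centered_near = to_root_centered(pose_near)
centered_far = to_root_centered(pose_far)

# After root-centering, the camera-dependent global offset is gone.
print(centered_near == centered_far)
```

In this toy case the two poses differ only by a global offset, so they coincide once expressed relative to the root joint.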
CR Jan 7, 2020
Provenance-based Classification Policy based on Encrypted Search
Xinyu Fan, Faen Zhang, Jiahong Wu et al.
As an important type of cloud data, digital provenance is attracting increasing attention as a means of improving system performance. Currently, provenance has been employed to provide cues regarding access control and to estimate data quality. However, provenance itself might also be sensitive information. Therefore, provenance might be encrypted and stored in the cloud. In this paper, we provide a mechanism to classify cloud documents by searching for specific keywords in their encrypted provenance, and we prove that our scheme achieves semantic security. In terms of application, given that files are classified and stored separately in the cloud to facilitate their regulation and security protection, classification policies can use provenance as conditions to determine the category of a document. A simple sample policy is: documents that have been reviewed twice can be classified as "public accessible" and can then be accessed by the public.
CR Jan 7, 2020
A fine-grained policy model for Provenance-based Access Control and Policy Algebras
Xinyu Fan, Faen Zhang, Jianfei Song et al.
A fine-grained provenance-based access control policy model is proposed in this paper to improve the expressiveness of existing models. This method employs provenance as conditions to determine whether a piece of data can be accessed, because historical operations performed on data can reveal clues about its sensitivity and vulnerability. In particular, our work provides a four-valued decision set that makes it possible to report the status of matching a restriction. The framework consists of target policies, access control policies, and policy algebras. With the complete definition and the construction of an algebra system, a practical fine-grained access control policy model is developed.
CR Dec 1, 2019
On the Security of A Remote Cloud Storage Integrity Checking Protocol
Faen Zhang, Xinyu Fan, Pengcheng Zhou et al.
Data security and privacy are an important but challenging problem in cloud computing. One of the security concerns of cloud users is how to efficiently verify the integrity of their data stored on the cloud server. Third Party Auditing (TPA) is a new technique proposed in recent years to achieve this goal. In a recent paper (IEEE Transactions on Computers 62(2): 362-375 (2013)), Wang et al. proposed a highly efficient and scalable TPA protocol and also a Zero Knowledge Public Auditing protocol that can prevent offline guessing attacks. However, in this paper, we point out several security weaknesses in Wang et al.'s protocols: first, we show that an attacker can arbitrarily modify the cloud data without being detected by the auditor in the integrity checking process, and the attacker can achieve this goal even without knowing the content of the cloud data or any verification metadata maintained by the cloud server; second, we show that the Zero Knowledge Public Auditing protocol cannot achieve its design goal, that is, to prevent offline guessing attacks.
CR Dec 1, 2019
Zero knowledge proofs for cloud storage integrity checking
Faen Zhang, Xinyu Fan, Pengcheng Zhou et al.
With the wide application of cloud storage, cloud security has become a crucial concern. Related works have addressed security issues such as data confidentiality and integrity, which ensure that the remotely stored data are well maintained by the cloud. However, zero-knowledge proof algorithms for stored-data integrity checks have not been formally defined or investigated. We believe that it is important that the cloud server is unable to reveal any useful information about the stored data. In this paper, we introduce a novel definition of data privacy for integrity checks, which describes the very high security of a zero-knowledge proof. We found that none of the existing remote integrity proofs captures this feature. We provide a comprehensive study of data privacy and an integrity check algorithm that captures data integrity, confidentiality, privacy, and soundness.
CR Dec 1, 2019
Purpose-based access policy on provenance and data algebra
Faen Zhang, Xinyu Fan, Wenfeng Zhou et al.
Ensuring that data can be accessed only for allowed purposes is a crucial access control mechanism. To achieve this, we propose purpose-based access policies in this paper. Unlike provenance-based policies, which determine whether a piece of data can be accessed, purpose-based access policies determine the purposes for which data can be accessed. In particular, purposes can be classified into different sensitivity levels. For the first time, we tailor policy algebras to include internal and external policy operators for hierarchical purposes, in order to merge purpose sets generated by individual policies. We also create external policy algebras to merge policies from multiple parties. Experiments of different types show that our model is feasible and practical.
CR Dec 1, 2019
PACLP: a fine-grained partition-based access control policy language for provenance
Xinyu Fan, Faen Zhang, Jianfei Song et al.
Even though the idea of partitioning provenance graphs for access control was previously proposed, employing segments of the provenance DAG for fine-grained access control to provenance data has not been thoroughly explored. Hence, we take segments of a provenance graph, based on the extended OPM and defined using a variant of regular expressions, and utilize them in our fine-grained access control language. The language can not only return partial graphs to answer access requests but also introduce segments as restrictions in order to screen targeted data.
LG Nov 13, 2019
Dynamic Connected Neural Decision Classifier and Regressor with Dynamic Softing Pruning
Xinyu Fan
To deal with various datasets of different complexity, this paper presents a self-adaptive learning model that combines the proposed Dynamic Connected Neural Decision Networks (DNDN) with a new pruning method, Dynamic Soft Pruning (DSP). DNDN is a combination of random forests and deep neural networks that enjoys both the strong classification capability of tree-like structures and the representation learning capability of network structures. Based on Deep Neural Decision Forests (DNDF), this paper adopts an end-to-end training approach by representing the classification distribution with multiple randomly initialized softmax layers, which further allows an ensemble of multiple random forests to be attached to layers of the neural network at different depths. We also propose a soft pruning method, DSP, to adaptively reduce the redundant connections of the network and avoid over-fitting on simple datasets. The model demonstrates no performance loss compared with unpruned models and even higher robustness across different data and feature distributions. Extensive experiments on different datasets demonstrate the superiority of the proposed model over other popular algorithms in solving classification tasks.
LG Nov 13, 2019
Regression via Arbitrary Quantile Modeling
Faen Zhang, Xinyu Fan, Hui Xu et al.
In the regression problem, L1 and L2 are the most commonly used loss functions, which produce mean predictions with different biases. However, the predictions are neither robust nor adequate, since they only capture a few conditional distributions instead of the whole distribution, especially for small datasets. To address this problem, we propose arbitrary quantile modeling to regulate the prediction, which achieves better performance than traditional loss functions. More specifically, a new distribution regression method, Deep Distribution Regression (DDR), is proposed to estimate arbitrary quantiles of the response variable. Our DDR method consists of two models: a Q model, which predicts the corresponding value for an arbitrary quantile, and an F model, which predicts the corresponding quantile for an arbitrary value. Furthermore, the duality between the Q and F models enables us to design a novel loss function for joint training and to perform a dual inference mechanism. Our experiments demonstrate that our DDR-joint and DDR-disjoint methods outperform previous methods such as AdaBoost, random forest, LightGBM, and neural networks in terms of both mean and quantile prediction.
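For readers unfamiliar with quantile estimation, the following minimal sketch shows the standard pinball (quantile) loss, which is the usual building block for predicting a quantile at level tau. It is an assumption that this is relevant background here: the abstract does not state the exact loss used by DDR, and the names `pinball_loss` and `best_constant` are hypothetical.

```python
from typing import Sequence


def pinball_loss(y_true: Sequence[float], y_pred: Sequence[float], tau: float) -> float:
    """Mean pinball loss at quantile level tau, with 0 < tau < 1."""
    if not 0.0 < tau < 1.0:
        raise ValueError("tau must lie strictly between 0 and 1")
    total = 0.0
    for y, q in zip(y_true, y_pred):
        diff = y - q
        # Under-prediction is weighted by tau, over-prediction by (1 - tau).
        total += tau * diff if diff >= 0 else (tau - 1.0) * diff
    return total / len(y_true)


def best_constant(ys: Sequence[float], tau: float) -> float:
    """Pick the sample value that minimizes the pinball loss as a constant prediction."""
    return min(ys, key=lambda c: pinball_loss(ys, [c] * len(ys), tau))


data = [1.0, 2.0, 3.0, 4.0, 100.0]
median_estimate = best_constant(data, tau=0.5)
print(median_estimate)
```

At tau = 0.5 the pinball loss equals half the absolute (L1) error, so the best constant prediction is the sample median, which is unaffected by the outlier 100.0.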
CV May 5, 2019
Accurate Face Detection for High Performance
Faen Zhang, Xinyu Fan, Guo Ai et al.
Face detection has witnessed significant progress due to the advances of deep convolutional neural networks (CNNs). Its central issue in recent years is how to improve the detection performance of tiny faces. To this end, many recent works propose some specific strategies, redesign the architecture and introduce new loss functions for tiny object detection. In this report, we start from the popular one-stage RetinaNet approach and apply some recent tricks to obtain a high performance face detector. Specifically, we apply the Intersection over Union (IoU) loss function for regression, employ the two-step classification and regression for detection, revisit the data augmentation based on data-anchor-sampling for training, utilize the max-out operation for classification and use the multi-scale testing strategy for inference. As a consequence, the proposed face detection method achieves state-of-the-art performance on the most popular and challenging face detection benchmark WIDER FACE dataset.
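To make the IoU-based regression objective concrete, here is a minimal sketch for axis-aligned boxes given as (x1, y1, x2, y2). The `1 - IoU` form is one common variant and is an assumption here; the abstract does not say which form of IoU loss the report uses, and the function names are hypothetical.

```python
from typing import Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) with x1 <= x2 and y1 <= y2


def iou(a: Box, b: Box) -> float:
    """Intersection over Union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    # Two degenerate (zero-area) boxes have no meaningful overlap.
    return inter / union if union > 0 else 0.0


def iou_loss(pred: Box, target: Box) -> float:
    """Assumed variant: loss = 1 - IoU, so perfect overlap gives zero loss."""
    return 1.0 - iou(pred, target)


overlap_loss = iou_loss((0.0, 0.0, 2.0, 2.0), (1.0, 1.0, 3.0, 3.0))
print(overlap_loss)
```

Unlike coordinate-wise L1 or L2 regression, this loss treats the four box coordinates jointly and is invariant to the scale of the boxes.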