SYOct 31, 2015
Controllability of networked MIMO systemsLin Wang, Guanrong Chen, Xiaofan Wang et al.
In this paper, we consider the state controllability of networked systems, where the network topology is directed and weighted and the nodes are higher-dimensional linear time-invariant (LTI) dynamical systems. We investigate how the network topology, the node-system dynamics, the external control inputs, and the inner interactions affect the controllability of a networked system, and show that for a general networked multi-input/multi-output (MIMO) system: 1) the controllability of the overall network is an integrated result of the aforementioned relevant factors, which cannot be decoupled into the controllability of individual node-systems and the properties solely determined by the network topology, quite different from the familiar notion of consensus or formation controllability; 2) if the network topology is uncontrollable by external inputs, then the networked system with identical nodes will be uncontrollable, even if it is structurally controllable; 3) with a controllable network topology, controllability and observability of the nodes together are necessary for the controllability of the networked systems under some mild conditions, but nevertheless they are not sufficient. For a networked system with single-input/single-output (SISO) LTI nodes, we present precise necessary and sufficient conditions for the controllability of a general network topology.
GTMar 3, 2016
Feedback Control of Real-Time Display AdvertisingWeinan Zhang, Yifei Rong, Jun Wang et al.
Real-Time Bidding (RTB) is revolutionising display advertising by facilitating per-impression auctions to buy ad impressions as they are being generated. Being able to use impression-level data, such as user cookies, encourages user behaviour targeting, and hence has significantly improved the effectiveness of ad campaigns. However, a fundamental drawback of RTB is its instability because the bid decision is made per impression and there are enormous fluctuations in campaigns' key performance indicators (KPIs). As such, advertisers face great difficulty in controlling their campaign performance against the associated costs. In this paper, we propose a feedback control mechanism for RTB which helps advertisers dynamically adjust the bids to effectively control the KPIs, e.g., the auction winning ratio and the effective cost per click. We further formulate an optimisation framework to show that the proposed feedback control mechanism also has the ability of optimising campaign performance. By settling the effective cost per click at an optimal reference value, the number of campaign's ad clicks can be maximised with the budget constraint. Our empirical study based on real-world data verifies the effectiveness and robustness of our RTB control system in various situations. The proposed feedback control mechanism has also been deployed on a commercial RTB platform and the online test has shown its success in generating controllable advertising performance.
SPAug 29, 2024Code
Mirror contrastive loss based sliding window transformer for subject-independent motor imagery based EEG signal recognitionJing Luo, Qi Mao, Weiwei Shi et al.
While deep learning models have been extensively utilized in motor imagery based EEG signal recognition, they often operate as black boxes. Motivated by neurological findings indicating that the mental imagery of left or right-hand movement induces event-related desynchronization (ERD) in the contralateral sensorimotor area of the brain, we propose a Mirror Contrastive Loss based Sliding Window Transformer (MCL-SWT) to enhance subject-independent motor imagery-based EEG signal recognition. Specifically, our proposed mirror contrastive loss enhances sensitivity to the spatial location of ERD by contrasting the original EEG signals with their mirror counterparts-mirror EEG signals generated by interchanging the channels of the left and right hemispheres of the EEG signals. Moreover, we introduce a temporal sliding window transformer that computes self-attention scores from high temporal resolution features, thereby improving model performance with manageable computational complexity. We evaluate the performance of MCL-SWT on subject-independent motor imagery EEG signal recognition tasks, and our experimental results demonstrate that MCL-SWT achieved accuracies of 66.48% and 75.62%, surpassing the state-of-the-art (SOTA) model by 2.82% and 2.17%, respectively. Furthermore, ablation experiments confirm the effectiveness of the proposed mirror contrastive loss. A code demo of MCL-SWT is available at https://github.com/roniusLuo/MCL_SWT.
MENov 7, 2015
Distributed Detection via Bayesian Updates and ConsensusQipeng Liu, Jiuhua Zhao, Xiaofan Wang
In this paper, we discuss a class of distributed detection algorithms which can be viewed as implementations of Bayes' law in distributed settings. Some of the algorithms are proposed in the literature most recently, and others are first developed in this paper. The common feature of these algorithms is that they all combine (i) certain kinds of consensus protocols with (ii) Bayesian updates. They are different mainly in the aspect of the type of consensus protocol and the order of the two operations. After discussing their similarities and differences, we compare these distributed algorithms by numerical examples. We focus on the rate at which these algorithms detect the underlying true state of an object. We find that (a) The algorithms with consensus via geometric average is more efficient than that via arithmetic average; (b) The order of consensus aggregation and Bayesian update does not apparently influence the performance of the algorithms; (c) The existence of communication delay dramatically slows down the rate of convergence; (d) More communication between agents with different signal structures improves the rate of convergence.
LGMay 8Code
Efficient Verification of Neural Control Barrier Functions with Smooth Nonlinear ActivationsJun Zhang, Haibo Zhang, Chun Liu et al.
Formal verification of neural control barrier functions (NCBFs) remains challenging, especially for neural networks with nonlinear activations like \(\tanh\). Existing CROWN-based methods rely on conservative linear relaxations for Jacobian bounds, limiting scalability. We propose LightCROWN, which computes tighter Jacobian bounds by exploiting the analytical properties of activation functions. Experiments on nonlinear control systems including the inverted pendulum, Dubins car, and planar quadrotor demonstrate that LightCROWN improves verification success rates up to 100\%, while enhancing speed and scalability. Our approach provides a generalizable improvement for CROWN-based frameworks, enabling more efficient verification of complex NCBFs. The code can be found at github.com/Autonomous-Systems-and-Control-Lab/verify-neural-CBF.
LGNov 19, 2022
Autoregressive GNN-ODE GRU Model for Network DynamicsBo Liang, Lin Wang, Xiaofan Wang
Revealing the continuous dynamics on the networks is essential for understanding, predicting, and even controlling complex systems, but it is hard to learn and model the continuous network dynamics because of complex and unknown governing equations, high dimensions of complex systems, and unsatisfactory observations. Moreover, in real cases, observed time-series data are usually non-uniform and sparse, which also causes serious challenges. In this paper, we propose an Autoregressive GNN-ODE GRU Model (AGOG) to learn and capture the continuous network dynamics and realize predictions of node states at an arbitrary time in a data-driven manner. The GNN module is used to model complicated and nonlinear network dynamics. The hidden state of node states is specified by the ODE system, and the augmented ODE system is utilized to map the GNN into the continuous time domain. The hidden state is updated through GRUCell by observations. As prior knowledge, the true observations at the same timestamp are combined with the hidden states for the next prediction. We use the autoregressive model to make a one-step ahead prediction based on observation history. The prediction is achieved by solving an initial-value problem for ODE. To verify the performance of our model, we visualize the learned dynamics and test them in three tasks: interpolation reconstruction, extrapolation prediction, and regular sequences prediction. The results demonstrate that our model can capture the continuous dynamic process of complex systems accurately and make precise predictions of node states with minimal error. Our model can consistently outperform other baselines or achieve comparable performance.
IVJan 16
Bridging Modalities: Joint Synthesis and Registration Framework for Aligning Diffusion MRI with T1-Weighted ImagesXiaofan Wang, Junyi Wang, Yuqian Chen et al.
Multimodal image registration between diffusion MRI (dMRI) and T1-weighted (T1w) MRI images is a critical step for aligning diffusion-weighted imaging (DWI) data with structural anatomical space. Traditional registration methods often struggle to ensure accuracy due to the large intensity differences between diffusion data and high-resolution anatomical structures. This paper proposes an unsupervised registration framework based on a generative registration network, which transforms the original multimodal registration problem between b0 and T1w images into a unimodal registration task between a generated image and the real T1w image. This effectively reduces the complexity of cross-modal registration. The framework first employs an image synthesis model to generate images with T1w-like contrast, and then learns a deformation field from the generated image to the fixed T1w image. The registration network jointly optimizes local structural similarity and cross-modal statistical dependency to improve deformation estimation accuracy. Experiments conducted on two independent datasets demonstrate that the proposed method outperforms several state-of-the-art approaches in multimodal registration tasks.
CVMay 10, 2024Code
Dual-Task Vision Transformer for Rapid and Accurate Intracerebral Hemorrhage CT Image ClassificationJialiang Fan, Xinhui Fan, Chengyan Song et al.
Intracerebral hemorrhage (ICH) is a severe and sudden medical condition caused by the rupture of blood vessels in the brain, leading to permanent damage to brain tissue and often resulting in functional disabilities or death in patients. Diagnosis and analysis of ICH typically rely on brain CT imaging. Given the urgency of ICH conditions, early treatment is crucial, necessitating rapid analysis of CT images to formulate tailored treatment plans. However, the complexity of ICH CT images and the frequent scarcity of specialist radiologists pose significant challenges. Therefore, we collect a dataset from the real world for ICH and normal classification and three types of ICH image classification based on the hemorrhage location, i.e., Deep, Subcortical, and Lobar. In addition, we propose a neural network structure, dual-task vision transformer (DTViT), for the automated classification and diagnosis of ICH images. The DTViT deploys the encoder from the Vision Transformer (ViT), employing attention mechanisms for feature extraction from CT images. The proposed DTViT framework also incorporates two multilayer perception (MLP)-based decoders to simultaneously identify the presence of ICH and classify the three types of hemorrhage locations. Experimental results demonstrate that DTViT performs well on the real-world test dataset. The code and newly collected dataset for this work are available at: https://github.com/jfan1997/DTViT.
RONov 29, 2024
CogACT: A Foundational Vision-Language-Action Model for Synergizing Cognition and Action in Robotic ManipulationQixiu Li, Yaobo Liang, Zeyu Wang et al.
The advancement of large Vision-Language-Action (VLA) models has significantly improved robotic manipulation in terms of language-guided task execution and generalization to unseen scenarios. While existing VLAs adapted from pretrained large Vision-Language-Models (VLM) have demonstrated promising generalizability, their task performance is still unsatisfactory as indicated by the low tasks success rates in different environments. In this paper, we present a new advanced VLA architecture derived from VLM. Unlike previous works that directly repurpose VLM for action prediction by simple action quantization, we propose a omponentized VLA architecture that has a specialized action module conditioned on VLM output. We systematically study the design of the action module and demonstrates the strong performance enhancement with diffusion action transformers for action sequence modeling, as well as their favorable scaling behaviors. We also conduct comprehensive experiments and ablation studies to evaluate the efficacy of our models with varied designs. The evaluation on 5 robot embodiments in simulation and real work shows that our model not only significantly surpasses existing VLAs in task performance and but also exhibits remarkable adaptation to new robots and generalization to unseen objects and backgrounds. It exceeds the average success rates of OpenVLA which has similar model size (7B) with ours by over 35% in simulated evaluation and 55% in real robot experiments. It also outperforms the large RT-2-X model (55B) by 18% absolute success rates in simulation. Code and models can be found on our project page (https://cogact.github.io/).
CVJan 14
BrainSegNet: A Novel Framework for Whole-Brain MRI Parcellation Enhanced by Large ModelsYucheng Li, Xiaofan Wang, Junyi Wang et al.
Whole-brain parcellation from MRI is a critical yet challenging task due to the complexity of subdividing the brain into numerous small, irregular shaped regions. Traditionally, template-registration methods were used, but recent advances have shifted to deep learning for faster workflows. While large models like the Segment Anything Model (SAM) offer transferable feature representations, they are not tailored for the high precision required in brain parcellation. To address this, we propose BrainSegNet, a novel framework that adapts SAM for accurate whole-brain parcellation into 95 regions. We enhance SAM by integrating U-Net skip connections and specialized modules into its encoder and decoder, enabling fine-grained anatomical precision. Key components include a hybrid encoder combining U-Net skip connections with SAM's transformer blocks, a multi-scale attention decoder with pyramid pooling for varying-sized structures, and a boundary refinement module to sharpen edges. Experimental results on the Human Connectome Project (HCP) dataset demonstrate that BrainSegNet outperforms several state-of-the-art methods, achieving higher accuracy and robustness in complex, multi-label parcellation.
ROMar 1, 2024
Structured Deep Neural Network-Based Backstepping Trajectory Tracking Control for Lagrangian SystemsJiajun Qian, Liang Xu, Xiaoqiang Ren et al.
Deep neural networks (DNN) are increasingly being used to learn controllers due to their excellent approximation capabilities. However, their black-box nature poses significant challenges to closed-loop stability guarantees and performance analysis. In this paper, we introduce a structured DNN-based controller for the trajectory tracking control of Lagrangian systems using backing techniques. By properly designing neural network structures, the proposed controller can ensure closed-loop stability for any compatible neural network parameters. In addition, improved control performance can be achieved by further optimizing neural network parameters. Besides, we provide explicit upper bounds on tracking errors in terms of controller parameters, which allows us to achieve the desired tracking performance by properly selecting the controller parameters. Furthermore, when system models are unknown, we propose an improved Lagrangian neural network (LNN) structure to learn the system dynamics and design the controller. We show that in the presence of model approximation errors and external disturbances, the closed-loop stability and tracking control performance can still be guaranteed. The effectiveness of the proposed approach is demonstrated through simulations.
RONov 14, 2024
DiffRoad: Realistic and Diverse Road Scenario Generation for Autonomous Vehicle TestingJunjie Zhou, Lin Wang, Qiang Meng et al.
Generating realistic and diverse road scenarios is essential for autonomous vehicle testing and validation. Nevertheless, owing to the complexity and variability of real-world road environments, creating authentic and varied scenarios for intelligent driving testing is challenging. In this paper, we propose DiffRoad, a novel diffusion model designed to produce controllable and high-fidelity 3D road scenarios. DiffRoad leverages the generative capabilities of diffusion models to synthesize road layouts from white noise through an inverse denoising process, preserving real-world spatial features. To enhance the quality of generated scenarios, we design the Road-UNet architecture, optimizing the balance between backbone and skip connections for high-realism scenario generation. Furthermore, we introduce a road scenario evaluation module that screens adequate and reasonable scenarios for intelligent driving testing using two critical metrics: road continuity and road reasonableness. Experimental results on multiple real-world datasets demonstrate DiffRoad's ability to generate realistic and smooth road structures while maintaining the original distribution. Additionally, the generated scenarios can be fully automated into the OpenDRIVE format, facilitating generalized autonomous vehicle simulation testing. DiffRoad provides a rich and diverse scenario library for large-scale autonomous vehicle testing and offers valuable insights for future infrastructure designs that are better suited for autonomous vehicles.
IVMay 7, 2025
Label-efficient Single Photon Images Classification via Active LearningZili Zhang, Ziting Wen, Yiheng Qiang et al.
Single-photon LiDAR achieves high-precision 3D imaging in extreme environments through quantum-level photon detection technology. Current research primarily focuses on reconstructing 3D scenes from sparse photon events, whereas the semantic interpretation of single-photon images remains underexplored, due to high annotation costs and inefficient labeling strategies. This paper presents the first active learning framework for single-photon image classification. The core contribution is an imaging condition-aware sampling strategy that integrates synthetic augmentation to model variability across imaging conditions. By identifying samples where the model is both uncertain and sensitive to these conditions, the proposed method selectively annotates only the most informative examples. Experiments on both synthetic and real-world datasets show that our approach outperforms all baselines and achieves high classification accuracy with significantly fewer labeled samples. Specifically, our approach achieves 97% accuracy on synthetic single-photon data using only 1.5% labeled samples. On real-world data, we maintain 90.63% accuracy with just 8% labeled samples, which is 4.51% higher than the best-performing baseline. This illustrates that active learning enables the same level of classification performance on single-photon images as on classical images, opening doors to large-scale integration of single-photon data in real-world applications.
SISep 22, 2021
Investigating and Modeling the Dynamics of Long TiesDing Lyu, Yuan Yuan, Lin Wang et al.
Long ties, the social ties that bridge different communities, are widely believed to play crucial roles in spreading novel information in social networks. However, some existing network theories and prediction models indicate that long ties might dissolve quickly or eventually become redundant, thus putting into question the long-term value of long ties. Our empirical analysis of real-world dynamic networks shows that contrary to such reasoning, long ties are more likely to persist than other social ties, and that many of them constantly function as social bridges without being embedded in local networks. Using a novel cost-benefit analysis model combined with machine learning, we show that long ties are highly beneficial, which instinctively motivates people to expend extra effort to maintain them. This partly explains why long ties are more persistent than what has been suggested by many existing theories and models. Overall, our study suggests the need for social interventions that can promote the formation of long ties, such as mixing people with diverse backgrounds.
ROJul 6, 2021
Fast-Learning Grasping and Pre-Grasping via Clutter Quantization and Q-map MaskingDafa Ren, Xiaoqiang Ren, Xiaofan Wang et al.
Grasping objects in cluttered scenarios is a challenging task in robotics. Performing pre-grasp actions such as pushing and shifting to scatter objects is a way to reduce clutter. Based on deep reinforcement learning, we propose a Fast-Learning Grasping (FLG) framework, that can integrate pre-grasping actions along with grasping to pick up objects from cluttered scenarios with reduced real-world training time. We associate rewards for performing moving actions with the change of environmental clutter and utilize a hybrid triggering method, leading to data-efficient learning and synergy. Then we use the output of an extended fully convolutional network as the value function of each pixel point of the workspace and establish an accurate estimation of the grasp probability for each action. We also introduce a mask function as prior knowledge to enable the agents to focus on the accurate pose adjustment to improve the effectiveness of collecting training data and, hence, to learn efficiently. We carry out pre-training of the FLG over simulated environment, and then the learnt model is transferred to the real world with minimal fine-tuning for further learning during actions. Experimental results demonstrate a 94% grasp success rate and the ability to generalize to novel objects. Compared to state-of-the-art approaches in the literature, the proposed FLG framework can achieve similar or higher grasp success rate with lesser amount of training in the real world. Supplementary video is available at https://youtu.be/e04uDLsxfDg.
COMar 1, 2017
Approximate Computational Approaches for Bayesian Sensor Placement in High DimensionsXiao Lin, Asif Chowdhury, Xiaofan Wang et al.
Since the cost of installing and maintaining sensors is usually high, sensor locations are always strategically selected. For those aiming at inferring certain quantities of interest (QoI), it is desirable to explore the dependency between sensor measurements and QoI. One of the most popular metric for the dependency is mutual information which naturally measures how much information about one variable can be obtained given the other. However, computing mutual information is always challenging, and the result is unreliable in high dimension. In this paper, we propose an approach to find an approximate lower bound of mutual information and compute it in a lower dimension. Then, sensors are placed where highest mutual information (lower bound) is achieved and QoI is inferred via Bayes rule given sensor measurements. In addition, Bayesian optimization is introduced to provide a continuous mutual information surface over the domain and thus reduce the number of evaluations. A chemical release accident is simulated where multiple sensors are placed to locate the source of the release. The result shows that the proposed approach is both effective and efficient in inferring QoI.