CV | Mar 12, 2024 | Code
ViT-CoMer: Vision Transformer with Convolutional Multi-scale Feature Interaction for Dense Predictions
Chunlong Xia, Xinliang Wang, Feng Lv et al.
Although Vision Transformer (ViT) has achieved significant success in computer vision, it does not perform well in dense prediction tasks due to the lack of inner-patch information interaction and the limited diversity of feature scale. Most existing studies are devoted to designing vision-specific transformers to solve the above problems, which introduce additional pre-training costs. Therefore, we present a plain, pre-training-free, and feature-enhanced ViT backbone with Convolutional Multi-scale feature interaction, named ViT-CoMer, which facilitates bidirectional interaction between CNN and transformer. Compared to the state-of-the-art, ViT-CoMer has the following advantages: (1) We inject spatial pyramid multi-receptive field convolutional features into the ViT architecture, which effectively alleviates the problems of limited local information interaction and single-feature representation in ViT. (2) We propose a simple and efficient CNN-Transformer bidirectional fusion interaction module that performs multi-scale fusion across hierarchical features, which is beneficial for handling dense prediction tasks. (3) We evaluate the performance of ViT-CoMer across various dense prediction tasks, different frameworks, and multiple advanced pre-training schemes. Notably, our ViT-CoMer-L achieves 64.3% AP on COCO val2017 without extra training data, and 62.1% mIoU on ADE20K val, both of which are comparable to state-of-the-art methods. We hope ViT-CoMer can serve as a new backbone for dense prediction tasks to facilitate future research. The code will be released at https://github.com/Traffic-X/ViT-CoMer.
LG | Jul 19, 2024 | Code
Gaussian Process Model with Tensorial Inputs and Its Application to the Design of 3D Printed Antennas
Xi Chen, Yashika Sharma, Hao Helen Zhang et al.
In simulation-based engineering design with time-consuming simulators, Gaussian process (GP) models are widely used as fast emulators to speed up the design optimization process. In its most commonly used form, the input of a GP is a simple list of design parameters. With the rapid development of additive manufacturing (also known as 3D printing), design inputs with 2D/3D spatial information have become prevalent in some applications, for example, neighboring relations between pixels/voxels and material distributions in heterogeneous materials. Such spatial information, vital to 3D printed designs, is hard to incorporate into existing GP models with common kernels such as squared exponential or Matérn. In this work, we propose to embed a generalized distance measure into a GP kernel, offering a novel and convenient technique to incorporate spatial information from freeform 3D printed designs into the GP framework. The proposed method allows complex design problems for 3D printed objects to take advantage of a plethora of tools available from GP surrogate-based simulation optimization, such as designed experiments and GP-based optimizations including Bayesian optimization. We investigate the properties of the proposed method and illustrate its performance by several numerical examples of 3D printed antennas. The dataset is publicly available at: https://github.com/xichennn/GP_dataset.
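The idea of embedding a distance measure into a GP kernel can be illustrated with a minimal sketch. The abstract does not specify the generalized distance, so the `blur` smoothing and `spatial_distance` below are hypothetical stand-ins chosen only to show how neighboring relations between pixels can enter a squared-exponential kernel; they are not the authors' measure. Because this particular distance is a Euclidean distance between linear features, the resulting kernel stays positive semi-definite, which an arbitrary distance would not guarantee.

```python
import math

def blur(grid):
    """Average each cell with its in-bounds 4-neighbours (hypothetical smoothing)."""
    rows, cols = len(grid), len(grid[0])
    out = []
    for r in range(rows):
        row = []
        for c in range(cols):
            vals = [grid[r][c]]
            for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                rr, cc = r + dr, c + dc
                if 0 <= rr < rows and 0 <= cc < cols:
                    vals.append(grid[rr][cc])
            row.append(sum(vals) / len(vals))
        out.append(row)
    return out

def spatial_distance(a, b):
    """Euclidean distance between smoothed grids (assumed distance, not the paper's)."""
    fa, fb = blur(a), blur(b)
    return math.sqrt(sum((x - y) ** 2
                         for ra, rb in zip(fa, fb)
                         for x, y in zip(ra, rb)))

def kernel(a, b, variance=1.0, length_scale=1.0):
    """Squared-exponential kernel evaluated on the spatial distance."""
    d = spatial_distance(a, b)
    return variance * math.exp(-d * d / (2.0 * length_scale ** 2))

# Three 1x5 pixel designs: material moved to an adjacent pixel vs. a distant one.
base = [[1, 0, 0, 0, 0]]
near = [[0, 1, 0, 0, 0]]
far = [[0, 0, 0, 0, 1]]
```

A plain Euclidean distance on the raw pixels treats `near` and `far` as equally different from `base`, whereas the smoothed distance makes `near` the closer design, so `kernel(base, near)` exceeds `kernel(base, far)`.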
CV | Jul 22, 2024
Explore the LiDAR-Camera Dynamic Adjustment Fusion for 3D Object Detection
Yiran Yang, Xu Gao, Tong Wang et al.
Camera and LiDAR serve as informative sensors for accurate and robust autonomous driving systems. However, these sensors are heterogeneous in nature, resulting in distributional modality gaps that present significant challenges for fusion. To address this, a robust fusion technique is crucial, particularly for enhancing 3D object detection. In this paper, we introduce a dynamic adjustment technology aimed at aligning modal distributions and learning effective modality representations to enhance the fusion process. Specifically, we propose a triphase domain aligning module. This module adjusts the feature distributions from both the camera and LiDAR, bringing them closer to the ground truth domain and minimizing differences. Additionally, we explore improved representation acquisition methods for dynamic fusion, which include modal interaction and specialty enhancement. Finally, we introduce an adaptive learning technique that merges semantic and geometric information for dynamic instance optimization. Extensive experiments on the nuScenes dataset show performance competitive with state-of-the-art approaches. Our code will be released in the future.
CV | Feb 22, 2024 | Code
WeakSAM: Segment Anything Meets Weakly-supervised Instance-level Recognition
Lianghui Zhu, Junwei Zhou, Yan Liu et al.
Weakly supervised visual recognition using inexact supervision is a critical yet challenging learning problem. It significantly reduces human labeling costs and traditionally relies on multi-instance learning and pseudo-labeling. This paper introduces WeakSAM and solves the weakly-supervised object detection (WSOD) and segmentation by utilizing the pre-learned world knowledge contained in a vision foundation model, i.e., the Segment Anything Model (SAM). WeakSAM addresses two critical limitations in traditional WSOD retraining, i.e., pseudo ground truth (PGT) incompleteness and noisy PGT instances, through adaptive PGT generation and Region of Interest (RoI) drop regularization. It also addresses SAM's problems of requiring prompts and category unawareness for automatic object detection and segmentation. Our results indicate that WeakSAM surpasses previous state-of-the-art methods on WSOD and WSIS benchmarks by large margins, i.e., average improvements of 7.4% and 8.5%, respectively. The code is available at https://github.com/hustvl/WeakSAM.
59.4 | NI | May 18
Enhancing Network Resilience via Graph-Based Anomaly Detection in Sovereign Functions
Xin Hao, Wei Ni, Chenhan Zhang et al.
Sovereign network functions, e.g., routing protocols, are becoming increasingly complex and susceptible to failures arising from protocol configuration anomalies and anomalous configurations. This paper interprets the protocol configuration anomaly detection problem as the detection of structural inconsistencies of connected nodes and edges in a bipartite graph that captures both physical network entities and logical protocol states. A graph structural inconsistency detector (GSID) model is proposed to solve the problem efficiently. To handle the heterogeneous nature of protocol configuration parameters, GSID employs an adaptive configuration encoder (ACE) that dynamically selects encoding strategies per parameter to preserve fine-grained numerical discrepancies. To expose the subtle inconsistencies of connected nodes and edges in the bipartite graph, GSID uses an inconsistency dynamic attention (IDA) mechanism that scores edges by drawing asymmetric attention from both ends: rule compliance from one end and route connectivity from the other. It is demonstrated experimentally that GSID outperforms state-of-the-art baselines by threefold in F1 score and by 23.2% in accuracy. Ablation studies validate the effectiveness of both the ACE and IDA modules. Tests on unseen network scales and real-world network topologies show the superior adaptability of GSID compared to the baselines.
61.0 | NI | May 19
Sample-Efficient Misconfiguration Classification for Network Resilience in Wireless Communications
Xin Hao, Chenhan Zhang, Massimo Piccardi et al.
As modern wireless communication networks grow increasingly complex, network outages driven by the inconsistency between dynamic topologies and protocol configurations have become a critical concern. To solve this issue, we mathematically formulate a protocol misconfiguration classification problem as a graph-based learning task and solve it with our proposed EtaGATv2 algorithm, an edge-type-aware graph attention network with dynamic attention. EtaGATv2 addresses two critical challenges: i) it captures non-uniform symptom propagation for protocol misconfiguration classification tasks, where certain network paths and nodes become critical for diagnosis, and ii) it extracts protocol-specific features from heterogeneous routing protocols with distinct message-passing behaviors by utilizing edge-type-aware transformations. Experiments across diverse and real-world topologies demonstrate that EtaGATv2 reaches state-of-the-art performance with 50% of the training samples, making it particularly suitable for networks with dynamic topologies and limited negative-labeled data.
AI | Aug 15, 2024
BCR-DRL: Behavior- and Context-aware Reward for Deep Reinforcement Learning in Human-AI Coordination
Xin Hao, Bahareh Nakisa, Mohmmad Naim Rastgoo et al.
Deep reinforcement learning (DRL) offers a powerful framework for training AI agents to coordinate with human partners. However, DRL faces two critical challenges in human-AI coordination (HAIC): sparse rewards and unpredictable human behaviors. These challenges significantly limit the ability of DRL to identify effective coordination policies because they impair its capability to optimize exploration and exploitation. To address these limitations, we propose an innovative behavior- and context-aware reward (BCR) for DRL, which optimizes exploration and exploitation by leveraging human behaviors and contextual information in HAIC. Our BCR consists of two components: (i) a novel dual intrinsic rewarding scheme to enhance exploration, which combines an AI self-motivated intrinsic reward and a human-motivated intrinsic reward, both designed to increase the capture of sparse rewards through a logarithmic-based strategy; and (ii) a new context-aware weighting mechanism for the designed rewards to improve exploitation, which helps the AI agent prioritize actions that better coordinate with the human partner by utilizing contextual information that can reflect the evolution of learning. Extensive simulations in the Overcooked environment demonstrate that our approach can increase the cumulative sparse rewards by approximately 20% and improve the sample efficiency by around 38% compared to state-of-the-art baselines.
CV | May 10, 2023 | Code
V2X-Seq: A Large-Scale Sequential Dataset for Vehicle-Infrastructure Cooperative Perception and Forecasting
Haibao Yu, Wenxian Yang, Hongzhi Ruan et al.
Utilizing infrastructure and vehicle-side information to track and forecast the behaviors of surrounding traffic participants can significantly improve decision-making and safety in autonomous driving. However, the lack of real-world sequential datasets limits research in this area. To address this issue, we introduce V2X-Seq, the first large-scale sequential V2X dataset, which includes data frames, trajectories, vector maps, and traffic lights captured from natural scenery. V2X-Seq comprises two parts: the sequential perception dataset, which includes more than 15,000 frames captured from 95 scenarios, and the trajectory forecasting dataset, which contains about 80,000 infrastructure-view scenarios, 80,000 vehicle-view scenarios, and 50,000 cooperative-view scenarios captured from 28 intersections' areas, covering 672 hours of data. Based on V2X-Seq, we introduce three new tasks for vehicle-infrastructure cooperative (VIC) autonomous driving: VIC3D Tracking, Online-VIC Forecasting, and Offline-VIC Forecasting. We also provide benchmarks for the introduced tasks. Find data, code, and more up-to-date information at https://github.com/AIR-THU/DAIR-V2X-Seq.
LG | Oct 3, 2021 | Code
Simple Recurrent Neural Networks is all we need for clinical events predictions using EHR data
Laila Rasmy, Jie Zhu, Zhiheng Li et al.
Recently, there has been great interest in investigating the application of deep learning models for the prediction of clinical events using electronic health records (EHR) data. In EHR data, a patient's history is often represented as a sequence of visits, and each visit contains multiple events. As a result, deep learning models developed for sequence modeling, like recurrent neural networks (RNNs), are a common architecture for EHR-based clinical event prediction models. While a large variety of RNN models have been proposed in the literature, it is unclear whether complex architectural innovations offer superior predictive performance. In order to move this field forward, a rigorous evaluation of various methods is needed. In this study, we conducted a thorough benchmark of RNN architectures in modeling EHR data. We used two prediction tasks: the risk of developing heart failure and the risk of early readmission for inpatient hospitalization. We found that simple gated RNN models, including GRUs and LSTMs, often offer competitive results when properly tuned with Bayesian optimization, which is in line with findings in the natural language processing (NLP) domain. For reproducibility, our codebase is shared at https://github.com/ZhiGroup/pytorch_ehr.
LG | Dec 13, 2023
Secure Deep Reinforcement Learning for Dynamic Resource Allocation in Wireless MEC Networks
Xin Hao, Phee Lep Yeoh, Changyang She et al.
This paper proposes a blockchain-secured deep reinforcement learning (BC-DRL) optimization framework for data management and resource allocation in decentralized wireless mobile edge computing (MEC) networks. In our framework, we design a low-latency reputation-based proof-of-stake (RPoS) consensus protocol to select highly reliable blockchain-enabled BSs to securely store MEC user requests and prevent data tampering attacks. We formulate the MEC resource allocation optimization as a constrained Markov decision process that balances minimum processing latency and denial-of-service (DoS) probability. We use the MEC aggregated features as the DRL input to significantly reduce the high-dimensionality input of the remaining service processing time for individual MEC requests. Our designed constrained DRL effectively attains the optimal resource allocations that are adapted to the dynamic DoS requirements. We provide extensive simulation results and analysis to validate that our BC-DRL framework achieves higher security, reliability, and resource utilization efficiency than benchmark blockchain consensus protocols and MEC resource allocation algorithms.
IT | Dec 13, 2023
Graph Neural Network-Based Bandwidth Allocation for Secure Wireless Communications
Xin Hao, Phee Lep Yeoh, Yuhong Liu et al.
This paper designs a graph neural network (GNN) to improve bandwidth allocations for multiple legitimate wireless users transmitting to a base station in the presence of an eavesdropper. To improve privacy and prevent eavesdropping attacks, we propose a user scheduling algorithm to schedule users satisfying an instantaneous minimum secrecy rate constraint. Based on this, we optimize the bandwidth allocations with three algorithms, namely iterative search (IvS), GNN-based supervised learning (GNN-SL), and GNN-based unsupervised learning (GNN-USL). We present a computational complexity analysis which shows that GNN-SL and GNN-USL can be more efficient than IvS, which is limited by the bandwidth block size. Numerical simulation results highlight that our proposed GNN-based resource allocations can achieve a sum secrecy rate comparable to that of IvS with significantly lower computational complexity. Furthermore, we observe that the GNN approach is more robust to uncertainties in the eavesdropper's channel state information, especially compared with the best channel allocation scheme.
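The scheduling step based on a minimum secrecy rate can be sketched as follows. The abstract gives no formula, so this assumes the textbook per-unit-bandwidth secrecy rate for a Gaussian wiretap channel, max(0, log2(1 + SNR_user) - log2(1 + SNR_eve)); the function names, the dictionary-based interface, and the SNR values are illustrative assumptions rather than the paper's formulation.

```python
import math

def secrecy_rate(snr_legit, snr_eve):
    """Assumed secrecy rate in bit/s/Hz: legitimate rate minus eavesdropper rate, floored at 0."""
    return max(0.0, math.log2(1.0 + snr_legit) - math.log2(1.0 + snr_eve))

def schedule_users(snr_legit_by_user, snr_eve_by_user, min_rate):
    """Keep only users whose instantaneous secrecy rate meets the threshold."""
    return [
        user
        for user in snr_legit_by_user
        if secrecy_rate(snr_legit_by_user[user], snr_eve_by_user[user]) >= min_rate
    ]

# Hypothetical linear-scale SNRs for three users and the eavesdropper's link to each.
legit = {"a": 15.0, "b": 3.0, "c": 1.0}
eve = {"a": 1.0, "b": 1.0, "c": 3.0}
scheduled = schedule_users(legit, eve, min_rate=1.0)
```

Under these made-up values, user `c` is dropped because the eavesdropper's channel is stronger than its own, so its secrecy rate is zero; only the scheduled users would then take part in the bandwidth allocation.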
NI | Dec 23, 2023
Hybrid-Task Meta-Learning: A GNN Approach for Scalable and Transferable Bandwidth Allocation
Xin Hao, Changyang She, Phee Lep Yeoh et al.
In this paper, we develop a deep learning-based bandwidth allocation policy that is: 1) scalable with the number of users and 2) transferable to different communication scenarios, such as non-stationary wireless channels, different quality-of-service (QoS) requirements, and dynamically available resources. To support scalability, the bandwidth allocation policy is represented by a graph neural network (GNN), with which the number of training parameters does not change with the number of users. To enable the generalization of the GNN, we develop a hybrid-task meta-learning (HML) algorithm that trains the initial parameters of the GNN with different communication scenarios during meta-training. Next, during meta-testing, a few samples are used to fine-tune the GNN with unseen communication scenarios. Simulation results demonstrate that our HML approach can improve the initial performance by 8.79% and the sample efficiency by 73% compared with existing benchmarks. After fine-tuning, our near-optimal GNN-based policy can achieve close to the same reward with much lower inference complexity compared to the optimal policy obtained using iterative optimization. Numerical results validate that our HML can reduce the computation time by a factor of approximately 200 to 2000 compared with the optimal iterative algorithm.
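The scalability property, that the number of trainable parameters does not change with the number of users, can be illustrated with a toy shared-weight layer. This is a minimal sketch under stated assumptions and not the paper's GNN: each user has a single scalar feature, the aggregation is a mean over all users, and the three parameters `w_self`, `w_agg`, and `bias` are hypothetical.

```python
import math

def allocate(features, w_self, w_agg, bias):
    """Return bandwidth fractions from one shared-weight layer followed by a softmax.

    The same three parameters are reused for every user, so the parameter
    count is independent of len(features).
    """
    mean_feat = sum(features) / len(features)      # aggregated message from all users
    scores = [w_self * x + w_agg * mean_feat + bias for x in features]
    top = max(scores)                              # subtract max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

params = dict(w_self=0.8, w_agg=-0.3, bias=0.1)    # illustrative values only
small = allocate([0.2, 1.5, 0.7], **params)
large = allocate([0.2, 1.5, 0.7, 2.1, 0.4, 0.9], **params)
```

The same `params` serve both the three-user and the six-user call, and reordering the users only reorders the resulting fractions, which is the kind of structure that lets a single trained policy be applied to networks of different sizes.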