HC · Aug 24, 2023
Project Aria: A New Tool for Egocentric Multi-Modal AI Research · Jakob Engel, Kiran Somasundaram, Michael Goesele et al. · mit
Egocentric, multi-modal data as available on future augmented reality (AR) devices presents unique challenges and opportunities for machine perception. These future devices will need to be all-day wearable in a socially acceptable form factor to support always-available, context-aware, and personalized AI applications. Our team at Meta Reality Labs Research built the Aria device, an egocentric, multi-modal data recording and streaming device, with the goal of fostering and accelerating research in this area. In this paper, we describe the Aria device hardware, including its sensor configuration, and the corresponding software tools that enable recording and processing of such data.
SY · Apr 6, 2018
Toward Stronger Robustness of Network Controllability: A Snapback Network Model · Yang Lou, Lin Wang, Guanrong Chen
A new complex network model, called q-snapback network, is introduced. Basic topological characteristics of the network, such as degree distribution, average path length, clustering coefficient and Pearson correlation coefficient, are evaluated. The typical 4-motifs of the network are simulated. The robustness of both state and structural controllability of the network against targeted and random node- and edge-removal attacks is presented, with comparisons to the multiplex congruence network and the generic scale-free network. It is shown that the q-snapback network has the strongest controllability robustness, owing to its advantageous inherent structure with many chain- and loop-motifs.
SY · Mar 20, 2022
A Learning Convolutional Neural Network Approach for Network Robustness Prediction · Yang Lou, Ruizi Wu, Junli Li et al.
Robustness against malicious attacks is critical for various societal and industrial networks. In particular, connectivity robustness and controllability robustness reflect how well a networked system can maintain its connectedness and controllability against destructive attacks, which can be quantified by a sequence of values that record the remaining connectivity and controllability of the network after a sequence of node- or edge-removal attacks. Traditionally, robustness is determined by attack simulations, which are computationally very time-consuming or even practically infeasible. In this paper, an improved method for network robustness prediction is developed based on learning feature representation using a convolutional neural network (LFR-CNN). In this scheme, higher-dimensional network data are compressed to lower-dimensional representations and then passed to a CNN to perform robustness prediction. Extensive experimental studies on both synthetic and real-world networks, both directed and undirected, demonstrate that 1) the proposed LFR-CNN performs better than two other state-of-the-art prediction methods, with significantly lower prediction errors; 2) LFR-CNN is insensitive to variation in network size, which significantly extends its applicability; 3) although LFR-CNN needs more time to perform feature learning, it can achieve accurate prediction faster than attack simulations; 4) LFR-CNN not only accurately predicts network robustness but also provides a good indicator for connectivity robustness, better than the classical spectral measures.
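The attack simulation that such predictors are meant to replace can be sketched in a few lines. The following is a minimal illustration, not the authors' code: it assumes an undirected graph, a targeted attack that always removes the node with the highest remaining degree, and normalization of the largest connected component by the original network size. The function names are hypothetical.

```python
from collections import deque


def largest_component_size(adj, alive):
    """Size of the largest connected component among the nodes in `alive`."""
    seen, best = set(), 0
    for start in alive:
        if start in seen:
            continue
        seen.add(start)
        queue, size = deque([start]), 0
        while queue:  # breadth-first search restricted to surviving nodes
            u = queue.popleft()
            size += 1
            for v in adj[u]:
                if v in alive and v not in seen:
                    seen.add(v)
                    queue.append(v)
        best = max(best, size)
    return best


def connectivity_curve(n, edges):
    """Remaining connectivity after each of n - 1 targeted node removals."""
    adj = {u: set() for u in range(n)}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    alive = set(range(n))

    def live_degree(u):
        return sum(1 for v in adj[u] if v in alive)

    curve = []
    while len(alive) > 1:
        # Assumed attack strategy: remove the highest-degree surviving node
        # (ties broken by the smallest node index).
        target = max(sorted(alive), key=live_degree)
        alive.remove(target)
        # Assumed normalization: divide by the original network size n.
        curve.append(largest_component_size(adj, alive) / n)
    return curve


path_edges = [(0, 1), (1, 2), (2, 3), (3, 4)]
print(connectivity_curve(5, path_edges))
```

Each call recomputes connected components after every removal, which is what makes simulation expensive on large networks and motivates a learned predictor of the whole curve.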
CV · Jul 30, 2023
Uncertainty-Encoded Multi-Modal Fusion for Robust Object Detection in Autonomous Driving · Yang Lou, Qun Song, Qian Xu et al.
Multi-modal fusion has shown promising initial results for object detection in autonomous driving perception. However, many existing fusion schemes do not consider the quality of each fusion input and may suffer from adverse conditions on one or more sensors. While predictive uncertainty has been applied to characterize single-modal object detection performance at run time, incorporating uncertainties into multi-modal fusion still lacks effective solutions, primarily because of the uncertainty's cross-modal incomparability and distinct sensitivities to various adverse conditions. To fill this gap, this paper proposes Uncertainty-Encoded Mixture-of-Experts (UMoE), which explicitly incorporates single-modal uncertainties into LiDAR-camera fusion. UMoE uses an individual expert network to process each sensor's detection result together with its encoded uncertainty. The expert networks' outputs are then analyzed by a gating network to determine the fusion weights. The proposed UMoE module can be integrated into any proposal fusion pipeline. Evaluation shows that UMoE achieves a maximum of 10.67%, 3.17%, and 5.40% performance gain compared with the state-of-the-art proposal-level multi-modal object detectors under extreme weather, adversarial, and blinding attack scenarios.
SY · Aug 25, 2022
CNN-based Prediction of Network Robustness With Missing Edges · Chengpei Wu, Yang Lou, Ruizi Wu et al.
Connectivity and controllability of a complex network are two important properties that enable a networked system to function. Robustness of connectivity and controllability ensures that the system functions properly and stably under various malicious attacks. Evaluating network robustness using attack simulations is time-consuming, while the convolutional neural network (CNN)-based prediction approach provides a cost-efficient method to approximate the network robustness. In this paper, we investigate the performance of CNN-based approaches for connectivity and controllability robustness prediction when partial network information is missing, namely when the adjacency matrix is incomplete. Extensive experimental studies are carried out. A threshold is identified: if more than 7.29% of the information is lost in total, the performance of CNN-based prediction degrades significantly in all cases in the experiments. Two representations of missing edges are compared: 1) a missing edge is marked 'no edge' in the input for prediction, and 2) a missing edge is denoted using a special marker of 'unknown'. Experimental results reveal that the first representation is misleading to the CNN-based predictors.
CV · Dec 12, 2025
SATMapTR: Satellite Image Enhanced Online HD Map Construction · Bingyuan Huang, Guanyi Zhao, Qian Xu et al.
High-definition (HD) maps are evolving from pre-annotated to real-time construction to better support autonomous driving in diverse scenarios. However, this process is hindered by low-quality input data caused by onboard sensors' limited capability and frequent occlusions, leading to incomplete, noisy, or missing data, and thus reduced mapping accuracy and robustness. Recent efforts have introduced satellite images as auxiliary input, offering a stable, wide-area view to complement the limited ego perspective. However, satellite images in Bird's Eye View are often degraded by shadows and occlusions from vegetation and buildings. Prior methods using basic feature extraction and fusion remain ineffective. To address these challenges, we propose SATMapTR, a novel online map construction model that effectively fuses satellite images through two key components: (1) a gated feature refinement module that adaptively filters satellite image features by integrating high-level semantics with low-level structural cues to extract map-relevant representations with a high signal-to-noise ratio; and (2) a geometry-aware fusion module that consistently fuses satellite and BEV features at a grid-to-grid level, minimizing interference from irrelevant regions and low-quality inputs. Experimental results on the nuScenes dataset show that SATMapTR achieves the highest mean average precision (mAP) of 73.8, outperforming state-of-the-art satellite-enhanced models by up to 14.2 mAP. It also shows lower mAP degradation under adverse weather and sensor failures, and achieves nearly 3 times higher mAP at extended perception ranges.
RO · Mar 29, 2025
VLM-C4L: Continual Core Dataset Learning with Corner Case Optimization via Vision-Language Models for Autonomous Driving · Haibo Hu, Jiacheng Zuo, Yang Lou et al.
With the widespread adoption and deployment of autonomous driving, handling complex environments has become an unavoidable challenge. Due to the scarcity and diversity of extreme scenario datasets, current autonomous driving models struggle to effectively manage corner cases. This limitation poses a significant safety risk. According to the National Highway Traffic Safety Administration (NHTSA), autonomous vehicle systems have been involved in hundreds of reported crashes annually in the United States, including crashes in corner cases such as sun glare and fog, a few of which were fatal. Furthermore, to consistently maintain a robust and reliable autonomous driving system, models must not only perform well on routine scenarios but also adapt to newly emerging scenarios, especially those corner cases that deviate from the norm. This requires a learning mechanism that incrementally integrates new knowledge without degrading previously acquired capabilities. However, to the best of our knowledge, no existing continual learning methods have been proposed to ensure consistent and scalable corner case learning in autonomous driving. To address these limitations, we propose VLM-C4L, a continual learning framework that introduces Vision-Language Models (VLMs) to dynamically optimize and enhance corner case datasets. VLM-C4L combines VLM-guided high-quality data extraction with a core data replay strategy, enabling the model to incrementally learn from diverse corner cases while preserving performance on previously learned routine scenarios, thus ensuring long-term stability and adaptability in real-world autonomous driving. We evaluate VLM-C4L on large-scale real-world autonomous driving datasets, including Waymo and the corner case dataset CODA.
CR · Jun 17, 2024
A First Physical-World Trajectory Prediction Attack via LiDAR-induced Deceptions in Autonomous Driving · Yang Lou, Yi Zhu, Qun Song et al.
Trajectory prediction forecasts nearby agents' moves based on their historical trajectories. Accurate trajectory prediction is crucial for autonomous vehicles. Existing attacks compromise the prediction model of a victim AV by directly manipulating the historical trajectory of an attacker AV, which has limited real-world applicability. This paper, for the first time, explores an indirect attack approach that induces prediction errors via attacks against the perception module of a victim AV. Although it has been shown that physically realizable attacks against LiDAR-based perception are possible by placing a few objects at strategic locations, it is still an open challenge to find an object location from the vast search space in order to launch effective attacks against prediction under varying victim AV velocities. Through analysis, we observe that a prediction model is prone to an attack focusing on a single point in the scene. Consequently, we propose a novel two-stage attack framework to realize the single-point attack. The first stage of prediction-side attack efficiently identifies, guided by the distribution of detection results under object-based attacks against perception, the state perturbations for the prediction model that are effective and velocity-insensitive. In the second stage of location matching, we match the feasible object locations with the found state perturbations. Our evaluation using a public autonomous driving dataset shows that our attack causes a collision rate of up to 63% and various hazardous responses of the victim AV. The effectiveness of our attack is also demonstrated on a real testbed car. To the best of our knowledge, this study is the first security analysis spanning from LiDAR-based perception to prediction in autonomous driving, leading to a realistic attack on prediction. To counteract the proposed attack, potential defenses are discussed.
LG · May 13, 2023
SPP-CNN: An Efficient Framework for Network Robustness Prediction · Chengpei Wu, Yang Lou, Lin Wang et al.
This paper addresses the robustness of a network in sustaining its connectivity and controllability against malicious attacks. This kind of network robustness is typically measured by time-consuming attack simulation, which returns a sequence of values that record the remaining connectivity and controllability after a sequence of node- or edge-removal attacks. To improve on this, the paper develops an efficient framework for network robustness prediction, the spatial pyramid pooling convolutional neural network (SPP-CNN). The new framework installs a spatial pyramid pooling layer between the convolutional and fully-connected layers, overcoming the common input-size mismatch issue in CNN-based prediction approaches and extending its generalizability. Extensive experiments are carried out by comparing SPP-CNN with three state-of-the-art robustness predictors, namely one CNN-based and two graph neural network-based frameworks. Synthetic and real-world networks, both directed and undirected, are investigated. Experimental results demonstrate that the proposed SPP-CNN achieves better prediction performance and better generalizability to unknown datasets, with significantly lower time consumption, than its counterparts.
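Why a spatial pyramid pooling layer removes the size mismatch can be seen in a small sketch. The code below is an illustration under stated assumptions rather than the SPP-CNN implementation: it max-pools a single-channel 2D feature map (a list of lists) over 1x1, 2x2 and 4x4 grids, and the function name `spatial_pyramid_pool` and the pyramid levels are hypothetical choices.

```python
def ceil_div(a, b):
    """Integer ceiling of a / b for positive b."""
    return -(-a // b)


def spatial_pyramid_pool(feature_map, levels=(1, 2, 4)):
    """Max-pool a 2D feature map into a vector of fixed length sum(k * k)."""
    h, w = len(feature_map), len(feature_map[0])
    pooled = []
    for k in levels:
        for i in range(k):
            # Row range of bin i; floor/ceil bounds keep every bin non-empty.
            r0, r1 = (i * h) // k, ceil_div((i + 1) * h, k)
            for j in range(k):
                c0, c1 = (j * w) // k, ceil_div((j + 1) * w, k)
                pooled.append(max(feature_map[r][c]
                                  for r in range(r0, r1)
                                  for c in range(c0, c1)))
    return pooled


small = [[r * 5 + c for c in range(5)] for r in range(3)]    # 3 x 5 map
large = [[r * 9 + c for c in range(9)] for r in range(12)]   # 12 x 9 map
print(len(spatial_pyramid_pool(small)), len(spatial_pyramid_pool(large)))
```

Both inputs yield a vector of the same length, so a fully-connected layer placed after the pooling step sees a fixed input dimension regardless of the size of the network being evaluated.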
CV · Aug 6, 2021
Evaluating Adversarial Attacks on Driving Safety in Vision-Based Autonomous Vehicles · Jindi Zhang, Yang Lou, Jianping Wang et al.
In recent years, many deep learning models have been adopted in autonomous driving. At the same time, these models introduce new vulnerabilities that may compromise the safety of autonomous vehicles. Specifically, recent studies have demonstrated that adversarial attacks can cause a significant decline in detection precision of deep learning-based 3D object detection models. Although driving safety is the ultimate concern for autonomous driving, there is no comprehensive study on the linkage between the performance of deep learning models and the driving safety of autonomous vehicles under adversarial attacks. In this paper, we investigate the impact of two primary types of adversarial attacks, perturbation attacks and patch attacks, on the driving safety of vision-based autonomous vehicles rather than the detection precision of deep learning models. In particular, we consider two state-of-the-art models in vision-based 3D object detection, Stereo R-CNN and DSGN. To evaluate driving safety, we propose an end-to-end evaluation framework with a set of driving safety performance metrics. By analyzing the results of our extensive evaluation experiments, we find that (1) the attack's impact on the driving safety of autonomous vehicles and the attack's impact on the precision of 3D object detectors are decoupled, and (2) the DSGN model demonstrates stronger robustness to adversarial attacks than the Stereo R-CNN model. In addition, we further investigate the causes behind the two findings with an ablation study. The findings of this paper provide a new perspective to evaluate adversarial attacks and guide the selection of deep learning models in autonomous driving.
NE · Jan 3, 2021
Computing Cliques and Cavities in Networks · Dinghua Shi, Zhifeng Chen, Xiang Sun et al.
Complex networks contain complete subgraphs such as nodes, edges, triangles, etc., referred to as simplices and cliques of different orders. Notably, cavities consisting of higher-order cliques play an important role in brain functions. Since searching for maximum cliques is an NP-complete problem, we use k-core decomposition to determine the computability of a given network. For a computable network, we design a search method with an implementable algorithm for finding cliques of different orders, obtaining also the Euler characteristic number. Then, we compute the Betti numbers by using the ranks of boundary matrices of adjacent cliques. Furthermore, we design an optimized algorithm for finding cavities of different orders. Finally, we apply the algorithm to the neuronal network of C. elegans with data from one typical dataset, and find all of its cliques and some cavities of different orders, providing a basis for further mathematical analysis and computation of its structure and function.
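The pipeline described above (count cliques of each order, form the Euler characteristic, then obtain Betti numbers from the ranks of boundary matrices) can be sketched for small networks. This is a brute-force illustration, not the optimized algorithm of the paper. It assumes ranks are taken over GF(2), which the abstract does not specify, and uses the relation beta_k = m_k - r_k - r_{k+1}, where m_k is the number of k-th order cliques and r_k the rank of the k-th boundary matrix. All function names are hypothetical.

```python
from itertools import combinations


def find_cliques(n, edges, max_order):
    """cliques[k] lists the (k + 1)-node cliques as ascending node tuples."""
    adjacent = {frozenset(e) for e in edges}
    cliques = [[(v,) for v in range(n)]]
    for _ in range(max_order):
        grown = []
        for c in cliques[-1]:
            # Extend each clique only with larger node indices to avoid duplicates.
            for v in range(c[-1] + 1, n):
                if all(frozenset((u, v)) in adjacent for u in c):
                    grown.append(c + (v,))
        cliques.append(grown)
    return cliques


def rank_gf2(rows):
    """Rank over GF(2) of a matrix whose rows are integer bitmasks."""
    rows, rank = list(rows), 0
    while rows:
        pivot = rows.pop()
        if pivot == 0:
            continue
        rank += 1
        low = pivot & -pivot  # lowest set bit is the pivot column
        rows = [r ^ pivot if r & low else r for r in rows]
    return rank


def euler_characteristic(cliques):
    """Alternating sum of clique counts m_0 - m_1 + m_2 - ..."""
    return sum((-1) ** k * len(c) for k, c in enumerate(cliques))


def betti_numbers(n, edges, max_order):
    """Betti numbers beta_0 .. beta_max_order of the clique complex (GF(2))."""
    cliques = find_cliques(n, edges, max_order + 1)
    ranks = [0]  # the 0-th boundary matrix has rank 0
    for k in range(1, max_order + 2):
        index = {c: i for i, c in enumerate(cliques[k - 1])}
        rows = []
        for c in cliques[k]:
            mask = 0
            for face in combinations(c, k):  # the k-node faces of a (k+1)-node clique
                mask |= 1 << index[face]
            rows.append(mask)
        ranks.append(rank_gf2(rows))
    return [len(cliques[k]) - ranks[k] - ranks[k + 1] for k in range(max_order + 1)]


square = [(0, 1), (1, 2), (2, 3), (3, 0)]
print(euler_characteristic(find_cliques(4, square, 2)), betti_numbers(4, square, 1))
```

On the 4-cycle used here there are no triangles, so the single loop shows up as one first-order cavity. The exhaustive clique enumeration is only feasible for small or sparse networks, which is consistent with the abstract's point that computability must be checked first.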
SY · Aug 26, 2019
Predicting Network Controllability Robustness: A Convolutional Neural Network Approach · Yang Lou, Yaodong He, Lin Wang et al.
Network controllability measures how well a networked system can be controlled to a target state, and its robustness reflects how well the system can maintain the controllability against malicious attacks by means of node-removals or edge-removals. The measure of network controllability is quantified by the number of external control inputs needed to recover or to retain the controllability after the occurrence of an unexpected attack. The measure of the network controllability robustness, on the other hand, is quantified by a sequence of values that record the remaining controllability of the network after a sequence of attacks. Traditionally, the controllability robustness is determined by attack simulations, which are computationally time-consuming. In this paper, a method to predict the controllability robustness based on machine learning using a convolutional neural network is proposed, motivated by the observations that 1) there is no clear correlation between the topological features and the controllability robustness of a general network, 2) the adjacency matrix of a network can be regarded as a gray-scale image, and 3) the convolutional neural network technique has proved successful in image processing without human intervention. Under the new framework, a fairly large amount of training data generated by simulations is used to train a convolutional neural network for predicting the controllability robustness from the input network adjacency matrices, without performing conventional attack simulations. Extensive experimental studies were carried out, which demonstrate that the proposed framework for predicting controllability robustness of different network configurations is accurate and reliable with very low overheads.
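The quantity being predicted can be made concrete with a short simulation sketch. It relies on one common formulation of structural controllability, assumed here rather than taken from the abstract: the number of driver nodes of a directed network equals max(N - |M|, 1), where |M| is the size of a maximum matching. The matching is found with a simple augmenting-path search, and the function names and the removal order argument are hypothetical.

```python
def max_matching_size(edges, alive):
    """Maximum matching of a directed graph restricted to surviving nodes."""
    successors = {u: [] for u in alive}
    for u, v in edges:
        if u in alive and v in alive:
            successors[u].append(v)
    matched_from = {}  # head node -> tail node of the matched edge

    def augment(u, visited):
        # Try to match tail u, re-routing earlier matches where necessary.
        for v in successors[u]:
            if v in visited:
                continue
            visited.add(v)
            if v not in matched_from or augment(matched_from[v], visited):
                matched_from[v] = u
                return True
        return False

    return sum(1 for u in alive if augment(u, set()))


def driver_node_count(edges, alive):
    """Assumed formula: N_D = max(N - |maximum matching|, 1)."""
    return max(len(alive) - max_matching_size(edges, alive), 1)


def controllability_curve(n, edges, removal_order):
    """Density of driver nodes after each node removal in the given order."""
    alive = set(range(n))
    curve = []
    for node in removal_order:
        alive.discard(node)
        if not alive:
            break
        curve.append(driver_node_count(edges, alive) / len(alive))
    return curve


chain = [(0, 1), (1, 2), (2, 3)]
print(driver_node_count(chain, {0, 1, 2, 3}), controllability_curve(4, chain, [1]))
```

A directed chain needs a single driver node, while removing an interior node splits it and raises the driver-node density. Recomputing the matching after every removal is the cost that the CNN-based predictor is meant to avoid. The recursive search is adequate for small examples only; large networks would call for an iterative matching routine.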
CV · May 14, 2019
Reconstruction-Aware Imaging System Ranking by use of a Sparsity-Driven Numerical Observer Enabled by Variational Bayesian Inference · Yujia Chen, Yang Lou, Kun Wang et al.
It is widely accepted that optimization of imaging system performance should be guided by task-based measures of image quality (IQ). It has been advocated that imaging hardware or data-acquisition designs should be optimized by use of an ideal observer (IO) that exploits full statistical knowledge of the measurement noise and class of objects to be imaged, without consideration of the reconstruction method. In practice, accurate and tractable models of the complete object statistics are often difficult to determine. Moreover, in imaging systems that employ compressive sensing concepts, imaging hardware and sparse image reconstruction are innately coupled technologies. In this work, a sparsity-driven observer (SDO) that can be employed to optimize hardware by use of a stochastic object model describing object sparsity is described and investigated. The SDO and sparse reconstruction method can therefore be "matched" in the sense that they both utilize the same statistical information regarding the class of objects to be imaged. To efficiently compute the SDO test statistic, computational tools developed recently for variational Bayesian inference with sparse linear models are adopted. The use of the SDO to rank data-acquisition designs in a stylized example as motivated by magnetic resonance imaging (MRI) is demonstrated. This study reveals that the SDO can produce rankings that are consistent with visual assessments of the reconstructed images but different from those produced by use of the traditionally employed Hotelling observer (HO).
NE · Mar 16, 2019
On-line Search History-assisted Restart Strategy for Covariance Matrix Adaptation Evolution Strategy · Yang Lou, Shiu Yin Yuen, Guanrong Chen et al.
A restart strategy helps the covariance matrix adaptation evolution strategy (CMA-ES) increase the probability of finding the global optimum, whereas a single-run CMA-ES is easily trapped in local optima. In this paper, the continuous non-revisiting genetic algorithm (cNrGA) is used to help CMA-ES achieve multiple restarts from different sub-regions of the search space. The CMA-ES with on-line search history-assisted restart strategy (HR-CMA-ES) is proposed. The entire on-line search history of cNrGA is stored in a binary space partitioning (BSP) tree, which is effective for performing local search. A frequently sampled sub-region is reflected by a deep position in the BSP tree. When leaf nodes are located deeper than a threshold, the corresponding sub-region is considered a region of interest (ROI). In HR-CMA-ES, cNrGA is responsible for global exploration and for suggesting ROIs in or around which CMA-ES performs exploitation. CMA-ES restarts independently in each suggested ROI. The non-revisiting mechanism of cNrGA avoids suggesting the same ROI a second time. Experimental results on the CEC 2013 and 2017 benchmark suites show that HR-CMA-ES performs better than both CMA-ES and cNrGA. A positive synergy is observed from the memetic cooperation of the two algorithms.
SI · May 20, 2016
Local communities obstruct global consensus: Naming game on multi-local-world networks · Yang Lou, Guanrong Chen, Zhengping Fan et al.
Community structure is essential for social communications: individuals belonging to the same community interact and communicate with each other much more actively than those in different communities within human society. The naming game, on the other hand, is a social communication model that simulates the process of learning the name of an object within a community of humans, where the individuals can generally reach global consensus asymptotically through iterative pair-wise conversations. The underlying network indicates the relationships among the individuals. In this paper, three typical topologies, namely random-graph, small-world and scale-free networks, embedded with the multi-local-world community structure, are employed to study the naming game. Simulations show that 1) convergence to global consensus becomes slower as the community structure becomes more prominent, and eventually might fail; 2) if the inter-community connections are sufficiently dense, neither the number nor the size of the communities affects the convergence process; and 3) for different topologies with the same average node-degree, local clustering of individuals obstructs or prevents global consensus. The results reveal the role of local communities in a global naming game in social network studies.
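The pair-wise conversation process can be illustrated with the standard minimal naming game, which is assumed here as the base model; the abstract does not spell out the interaction rules, and the multi-local-world network construction is not reproduced. In this sketch a random speaker utters a word to a random neighbor; on success both agents keep only that word, and on failure the hearer memorizes it. The function name and its parameters are hypothetical.

```python
import random


def naming_game(neighbors, max_steps=100_000, seed=0):
    """Run the minimal naming game; return (steps to consensus or None, memories)."""
    rng = random.Random(seed)
    agents = list(neighbors)
    memory = {a: set() for a in agents}
    next_word = 0
    for step in range(1, max_steps + 1):
        speaker = rng.choice(agents)
        hearer = rng.choice(neighbors[speaker])
        if not memory[speaker]:
            # A speaker with an empty memory invents a new word.
            memory[speaker].add(next_word)
            next_word += 1
        word = rng.choice(sorted(memory[speaker]))
        if word in memory[hearer]:
            # Success: both agents discard every other word.
            memory[speaker] = {word}
            memory[hearer] = {word}
        else:
            # Failure: the hearer learns the word.
            memory[hearer].add(word)
        reference = memory[agents[0]]
        if len(reference) == 1 and all(memory[a] == reference for a in agents):
            return step, memory
    return None, memory


# Two dense groups of three agents joined by a single bridge edge (2 - 3).
two_communities = {
    0: [1, 2], 1: [0, 2], 2: [0, 1, 3],
    3: [2, 4, 5], 4: [3, 5], 5: [3, 4],
}
steps, final_memory = naming_game(two_communities)
print(steps, final_memory)
```

Varying the number of bridge edges between the two groups in `two_communities` is a simple way to explore how sparse inter-community links slow down consensus in this toy setting.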
CL · Dec 28, 2015
Communicating with sentences: A multi-word naming game model · Yang Lou, Guanrong Chen, Jianwei Hu
The naming game simulates the process of naming an object by a single word, in which a population of communicating agents can reach global consensus asymptotically through iterative pair-wise conversations. We propose an extension of the single-word model to a multi-word naming game (MWNG), simulating the case of describing a complex object by a sentence (multiple words). Words are defined in categories and then organized into sentences by combining words from different categories. We refer to a formatted combination of several words as a pattern. In such an MWNG, a pair-wise conversation requires the hearer to reach consensus with the speaker on every single word in the sentence as well as on the sentence pattern, so as to guarantee the correct meaning of the utterance; otherwise, they fail to reach consensus in the interaction. We validate the model on three typical topologies as the underlying communication network, and employ both conventional and manually designed patterns in performing the MWNG.