AR | May 21 | Code
FASE: FPGA-Assisted Syscall Emulation for Rapid End-to-End Processor Performance Validation
Chengzhen Meng, Xiuzhuang Chen, Bingcai Sui et al.
The rapid advancement of AI workloads and domain-specific architectures has led to increasingly diverse processor microarchitectures, whose design exploration requires fast and accurate performance validation. However, traditional workflows defer the validation process until RTL design and SoC integration are complete, significantly prolonging development and iteration cycles. In this work, we present the FASE framework (FPGA-Assisted Syscall Emulation), the first work to adapt syscall emulation to FPGA platforms, enabling complex multi-thread benchmarks to run directly on the processor design without integrating an SoC or target OS for early-stage performance validation. FASE introduces three key innovations to address three critical challenges in adapting syscall emulation to FPGAs: (1) only a minimal CPU interface is exposed, with other hardware components untouched, addressing the lack of a unified hardware interface in FPGA systems; (2) a Host-Target Protocol (HTP) is proposed to minimize cross-device data traffic, mitigating the low-bandwidth and high-latency communication between FPGA and host; and (3) a host-side runtime is proposed to remotely handle Linux-style system calls, addressing the challenge of cross-device syscall delegation. Experiments were conducted on a Xilinx FPGA with the open-source RISC-V SMP processor Rocket. With single-thread CoreMark, FASE introduces less than 1% performance error and achieves over 2000x higher efficiency compared to Proxy Kernel due to FPGA acceleration. With complex OpenMP benchmarks, FASE demonstrates over 96% performance validation accuracy for most single-thread workloads and over 91.5% for most multi-thread workloads compared to full SoC validation, significantly reducing development complexity and time-to-feedback. All components of the FASE framework are released as open source.
RO | Jun 2
Bridging Predictive Uncertainty and Safe Action: Sample-Conditioned Differentiable Planning for Autonomous Driving
Chengzhen Meng, Pei Liu, Zhiyu Huang et al.
Complex, dynamic, and interactive driving environments pose significant challenges for autonomous driving, primarily due to the pervasive uncertainty of surrounding traffic. A fundamental bottleneck in current systems is the disconnect between highly expressive uncertainty modeling and interpretable, safe motion planning. In this paper, we propose a novel sample-conditioned differentiable planning framework that bridges this gap by explicitly incorporating diffusion-generated future trajectories into the optimization process. Rather than compressing predictions into a single deterministic future or relying on black-box end-to-end architectures, our approach leverages a conditional diffusion model to generate a diverse set of plausible future scenarios. Crucially, these samples are directly fed into a differentiable planner, which explicitly mitigates predictive uncertainty via an empirical Conditional Value-at-Risk (CVaR) tail-risk constraint. This allows the planner to optimize a physically interpretable trajectory that is robust to rare yet safety-critical interactions. Furthermore, we introduce a directed graph representation for scene context that yields substantial improvements in both predictive effectiveness and computational efficiency. Validated through extensive open-loop and closed-loop evaluations on the Waymo Open Motion and Argoverse 2 datasets, our framework significantly outperforms state-of-the-art baselines in safety, efficiency, and ride comfort.
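As a rough illustration of the empirical CVaR quantity mentioned in this abstract, the sketch below computes the mean of the worst (1 - alpha) fraction of sampled costs. It shows only the standard sample-based definition, not the authors' planner: the function name `empirical_cvar`, the `alpha` parameter, and the idea of one scalar cost per sampled future are assumptions made for the example.

```python
import math


def empirical_cvar(costs, alpha=0.9):
    """Sample-based CVaR: mean of the worst (1 - alpha) fraction of costs.

    `costs` holds one scalar cost per sampled future scenario (hypothetical
    setup; the abstract does not specify the cost function).
    """
    if not costs:
        raise ValueError("need at least one sampled cost")
    if not 0.0 <= alpha < 1.0:
        raise ValueError("alpha must lie in [0, 1)")
    ordered = sorted(costs, reverse=True)  # largest (worst) costs first
    # Size of the tail; rounding guards against floating-point noise
    # such as (1 - 0.8) * 10 evaluating to 1.9999999999999996.
    k = max(1, math.ceil(round((1.0 - alpha) * len(ordered), 9)))
    tail = ordered[:k]
    return sum(tail) / len(tail)


# Ten sampled scenario costs; the two worst are 10 and 9.
sample_costs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0]
tail_risk = empirical_cvar(sample_costs, alpha=0.8)
```

A planner could, under these assumptions, require `tail_risk` to stay below a threshold so that rare high-cost scenarios constrain the chosen trajectory rather than being averaged away.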
CV | Nov 26, 2023
CalibFormer: A Transformer-based Automatic LiDAR-Camera Calibration Network
Yuxuan Xiao, Yao Li, Chengzhen Meng et al.
The fusion of LiDARs and cameras has been increasingly adopted in autonomous driving for perception tasks. The performance of such fusion-based algorithms largely depends on the accuracy of sensor calibration, which is challenging due to the difficulty of identifying common features across different data modalities. Previously, many calibration methods involved specific targets and/or manual intervention, which has proven to be cumbersome and costly. Learning-based online calibration methods have been proposed, but their performance is barely satisfactory in most cases. These methods usually suffer from issues such as sparse feature maps, unreliable cross-modality association, inaccurate calibration parameter regression, etc. In this paper, to address these issues, we propose CalibFormer, an end-to-end network for automatic LiDAR-camera calibration. We aggregate multiple layers of camera and LiDAR image features to achieve high-resolution representations. A multi-head correlation module is utilized to identify correlations between features more accurately. Lastly, we employ transformer architectures to estimate accurate calibration parameters from the correlation information. Our method achieved a mean translation error of $0.8751 \mathrm{cm}$ and a mean rotation error of $0.0562 ^{\circ}$ on the KITTI dataset, surpassing existing state-of-the-art methods and demonstrating strong robustness, accuracy, and generalization capabilities.
NI | Mar 31
Needle in a Haystack: Tracking UAVs from Massive Noise in Real-World 5G-A Base Station Data
Chengzhen Meng, Chenming He, Yidong Jiang et al.
The potential usage of UAVs in daily life has made monitoring them essential. However, existing systems for monitoring UAVs typically rely on cameras, LiDARs, or radars, whose limited sensing range or high deployment cost hinder large-scale adoption. In response, we develop BSense, the first system that tracks UAVs by leveraging point clouds from commercial 5G-A base stations. The key challenge lies in the dominant number of noise points that closely resemble true UAV points, resulting in a noise-to-UAV ratio over 100:1. Therefore, identifying UAVs from the raw point clouds is like finding a needle in a haystack. To overcome this, we propose a layered framework that filters noise at the point, object, and trajectory levels. At the raw point level, we observe that noise points from different spatial regions exhibit distinguishable and consistent signal fingerprints, which we can model to identify and remove them. At the object level, we design spatial and velocity consistency checks to identify false objects, and further compute confidence scores by aggregating these checks over multiple frames for more reliable discrimination. At the final trajectory level, we propose a Transformer-based network that captures multi-frame motion patterns to filter the few remaining false trajectories. We evaluated BSense on a commercial 5G-A base station deployed in an urban environment. The UAV was instructed to fly along 25 distinct trajectories across 54 cases over 7 days, yielding 155 minutes of data with more than 14,000 frames. On this dataset, our system reduces the number of false detections from an average of 168.05 per frame to 0.04, achieving an average F1 score of 95.56% and a mean localization error of 4.9 m at ranges up to 1,000 m.
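The object-level step described in this abstract (per-frame consistency checks aggregated into a confidence score) might look roughly like the sketch below. This is a hypothetical illustration, not BSense's actual rule: the function names, the tolerance `tol`, the frame interval `dt`, and the choice of a simple pass fraction as the confidence score are all assumptions.

```python
import math


def velocity_check(prev_xyz, curr_xyz, reported_speed, dt, tol=2.0):
    """Return True if the displacement-implied speed matches the reported one.

    All parameters are hypothetical: positions in metres, `reported_speed`
    in m/s, `dt` the time between frames in seconds, `tol` in m/s.
    """
    implied_speed = math.dist(prev_xyz, curr_xyz) / dt
    return abs(implied_speed - reported_speed) <= tol


def confidence(check_results):
    """Aggregate per-frame check outcomes into a score in [0, 1].

    Uses the fraction of passed checks; the abstract does not state the
    actual aggregation rule.
    """
    if not check_results:
        return 0.0
    return sum(check_results) / len(check_results)


# Three of four frames pass the check, giving a confidence of 0.75.
score = confidence([True, True, False, True])
```

An object whose score stays low across many frames would, under these assumptions, be treated as a false object before the trajectory-level filter runs.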