DCJan 30
AscendCraft: Automatic Ascend NPU Kernel Generation via DSL-Guided TranscompilationZhongzhen Wen, Shudi Shao, Zhong Li et al.
The performance of deep learning models critically depends on efficient kernel implementations, yet developing high-performance kernels for specialized accelerators remains time-consuming and expertise-intensive. While recent work demonstrates that large language models (LLMs) can generate correct and performant GPU kernels, kernel generation for neural processing units (NPUs) remains largely underexplored due to domain-specific programming models, limited public examples, and sparse documentation. Consequently, directly generating AscendC kernels with LLMs yields extremely low correctness, highlighting a substantial gap between GPU and NPU kernel generation. We present AscendCraft, a DSL-guided approach for automatic AscendC kernel generation. AscendCraft introduces a lightweight DSL that abstracts non-essential complexity while explicitly modeling Ascend-specific execution semantics. Kernels are first generated in the DSL using category-specific expert examples and then transcompiled into AscendC through structured, constraint-driven LLM lowering passes. Evaluated on MultiKernelBench across seven operator categories, AscendCraft achieves 98.1% compilation success and 90.4% functional correctness. Moreover, 46.2% of generated kernels match or exceed PyTorch eager execution performance, demonstrating that DSL-guided transcompilation can enable LLMs to generate both correct and competitive NPU kernels. Beyond benchmarks, AscendCraft further demonstrates its generality by successfully generating two correct kernels for newly proposed mHC architecture, achieving performance that substantially surpasses PyTorch eager execution.
CVMay 25, 2021Code
Deep High-Resolution Representation Learning for Cross-Resolution Person Re-identificationGuoqing Zhang, Yu Ge, Zhicheng Dong et al.
Person re-identification (re-ID) tackles the problem of matching person images with the same identity from different cameras. In practical applications, due to the differences in camera performance and distance between cameras and persons of interest, captured person images usually have various resolutions. We name this problem as Cross-Resolution Person Re-identification which brings a great challenge for matching correctly. In this paper, we propose a Deep High-Resolution Pseudo-Siamese Framework (PS-HRNet) to solve the above problem. Specifically, in order to restore the resolution of low-resolution images and make reasonable use of different channel information of feature maps, we introduce and innovate VDSR module with channel attention (CA) mechanism, named as VDSR-CA. Then we reform the HRNet by designing a novel representation head to extract discriminating features, named as HRNet-ReID. In addition, a pseudo-siamese framework is constructed to reduce the difference of feature distributions between low-resolution images and high-resolution images. The experimental results on five cross-resolution person datasets verify the effectiveness of our proposed approach. Compared with the state-of-the-art methods, our proposed PS-HRNet improves 3.4\%, 6.2\%, 2.5\%,1.1\% and 4.2\% at Rank-1 on MLR-Market-1501, MLR-CUHK03, MLR-VIPeR, MLR-DukeMTMC-reID, and CAVIAR datasets, respectively. Our code is available at \url{https://github.com/zhguoqing}.
SESep 17, 2025
Who is Introducing the Failure? Automatically Attributing Failures of Multi-Agent Systems via Spectrum AnalysisYu Ge, Linna Xie, Zhong Li et al.
Large Language Model Powered Multi-Agent Systems (MASs) are increasingly employed to automate complex real-world problems, such as programming and scientific discovery. Despite their promising, MASs are not without their flaws. However, failure attribution in MASs - pinpointing the specific agent actions responsible for failures - remains underexplored and labor-intensive, posing significant challenges for debugging and system improvement. To bridge this gap, we propose FAMAS, the first spectrum-based failure attribution approach for MASs, which operates through systematic trajectory replay and abstraction, followed by spectrum analysis.The core idea of FAMAS is to estimate, from variations across repeated MAS executions, the likelihood that each agent action is responsible for the failure. In particular, we propose a novel suspiciousness formula tailored to MASs, which integrates two key factor groups, namely the agent behavior group and the action behavior group, to account for the agent activation patterns and the action activation patterns within the execution trajectories of MASs. Through expensive evaluations against 12 baselines on the Who and When benchmark, FAMAS demonstrates superior performance by outperforming all the methods in comparison.
AIMay 5, 2023
Set-Type Belief Propagation with Applications to Poisson Multi-Bernoulli SLAMHyowon Kim, Angel F. García-Fernández, Yu Ge et al.
Belief propagation (BP) is a useful probabilistic inference algorithm for efficiently computing approximate marginal probability densities of random variables. However, in its standard form, BP is only applicable to the vector-type random variables with a fixed and known number of vector elements, while certain applications rely on RFSs with an unknown number of vector elements. In this paper, we develop BP rules for factor graphs defined on sequences of RFSs where each RFS has an unknown number of elements, with the intention of deriving novel inference methods for RFSs. Furthermore, we show that vector-type BP is a special case of set-type BP, where each RFS follows the Bernoulli process. To demonstrate the validity of developed set-type BP, we apply it to the PMB filter for SLAM, which naturally leads to new set-type BP-mapping, SLAM, multi-target tracking, and simultaneous localization and tracking filters. Finally, we explore the relationships between the vector-type BP and the proposed set-type BP PMB-SLAM implementations and show a performance gain of the proposed set-type BP PMB-SLAM filter in comparison with the vector-type BP-SLAM filter.
SPDec 5, 2021
Iterated Posterior Linearization PMB Filter for 5G SLAMYu Ge, Yibo Wu, Fan Jiang et al.
5G millimeter wave (mmWave) signals have inherent geometric connections to the propagation channel and the propagation environment. Thus, they can be used to jointly localize the receiver and map the propagation environment, which is termed as simultaneous localization and mapping (SLAM). One of the most important tasks in the 5G SLAM is to deal with the nonlinearity of the measurement model. To solve this problem, existing 5G SLAM approaches rely on sigma-point or extended Kalman filters, linearizing the measurement function with respect to the prior probability density function (PDF). In this paper, we study the linearization of the measurement function with respect to the posterior PDF, and implement the iterated posterior linearization filter into the Poisson multi-Bernoulli SLAM filter. Simulation results demonstrate the accuracy and precision improvements of the resulting SLAM filter.