Zili Shao

CR
h-index14
5papers
4citations
Novelty53%
AI Score45

5 Papers

93.4NEMay 8
Kernel Foundry: A Diagnosis-driven Evolutionary Kernel Optimizer with Multi-Experts

Zixuan Huang, Da Chen, Kecheng Huang et al.

Generating high-performance GPU kernels remains challenging due to the need for both correctness and hardware-aware optimization. While large language models (LLMs) show promise in code generation, they often fail to produce kernels that are both correct and efficient. We propose Kernel Foundry, a diagnosis-driven evolutionary framework for automatic GPU kernel optimization. Our method combines expert-guided, retrieval-augmented initialization with a multi-island evolutionary search, where candidate kernels are iteratively refined using structured diagnostic feedback. A centralized experience library accumulates reusable optimization knowledge to guide subsequent evolution, while explicit mechanisms prevent cheating behaviors that bypass kernel-level computation. Experiments on KernelBench show that our method consistently improves both correctness and performance over strong baselines, achieving up to 100% correctness on Level~2.

11.5CVMay 20
SAM-Sode: Towards Faithful Explanations for Tiny Bacteria Detection

Wanying Tan, Shuo Yan, Dazhi Huang et al.

Interpretability in object detection provides crucial confidence support for clinical auxiliary diagnosis. However, in tiny bacteria detection, traditional explanation methods often suffer from blurred foreground boundaries and diffuse feature attribution due to the extreme sparsity of target morphological features and severe interference from complex backgrounds. Such limitations hinder the provision of logically coherent morphological evidence. To bridge this gap, we propose a novel eXplainable AI (XAI) framework, SAM-Sode. The framework innovatively transforms initial feature attribution maps into geometry-aware prompts, leveraging the prior knowledge of the foundation model (SAM3) to achieve spatial refinement and morphological reconstruction of the explanatory mappings. Furthermore, we introduce a dual-constraint mechanism based on physical significance and geometric alignment to perform instance-level denoising, generating coherent explanations that better align with human expert intuition. Experimental results on our self-constructed bacteria dataset with complex circuit backgrounds (containing 2,524 images) and other public datasets demonstrate that the proposed method effectively suppresses background redundancy and significantly enhances the decision-making transparency of tiny object detection.

DCDec 1, 2025
Tangram: Accelerating Serverless LLM Loading through GPU Memory Reuse and Affinity

Wenbin Zhu, Zhaoyan Shen, Zili Shao et al.

Serverless Large Language Models (LLMs) have emerged as a cost-effective solution for deploying AI services by enabling a 'pay-as-you-go' pricing model through GPU resource sharing. However, cold-start latency, especially the model loading phase, has become a critical performance bottleneck, as it scales linearly with model size and severely limits the practical deployment of large-scale LLM services. This paper presents Tangram, a novel system that accelerates Serverless LLM loading through efficient GPU memory reuse. By leveraging the unused GPU memory to retain model parameters, Tangram significantly reduces model transfer time and cold-start latency. Its design includes three key components: unified GPU memory pool for tensor-level parameter sharing across models, on-demand KV cache allocation for dynamic memory management, and GPU-affinity-aware scheduling for maximizing resource utilization. These techniques collectively address the critical challenges of inefficient memory usage and the cold-start problem in Serverless LLM platforms. We have implemented a fully functional prototype, and experiments show that Tangram achieves up to 6.2 times faster loading and reduces Time-To-First-Token (TTFT) during cold-start by 23--55% over state-of-the-art methods.

LGJul 9, 2025
Attention-Aware GNN-based Input Defense against Multi-Turn LLM Jailbreak

Zixuan Huang, Kecheng Huang, Lihao Yin et al.

Large Language Models (LLMs) have gained significant traction in various applications, yet their capabilities present risks for both constructive and malicious exploitation. Despite extensive training and fine-tuning efforts aimed at enhancing safety, LLMs remain susceptible to jailbreak attacks. Recently, the emergence of multi-turn attacks has intensified this vulnerability. Unlike single-turn attacks, multi-turn attacks incrementally escalate dialogue complexity, rendering them more challenging to detect and mitigate. In this study, we introduce G-Guard, an innovative attention-aware Graph Neural Network (GNN)-based input classifier specifically designed to defend against multi-turn jailbreak attacks targeting LLMs. G-Guard constructs an entity graph for multi-turn queries, which captures the interrelationships between queries and harmful keywords that present in multi-turn queries. Furthermore, we propose an attention-aware augmentation mechanism that retrieves the most relevant single-turn query based on the ongoing multi-turn conversation. The retrieved query is incorporated as a labeled node within the graph, thereby enhancing the GNN's capacity to classify the current query as harmful or benign. Evaluation results show that G-Guard consistently outperforms all baselines across diverse datasets and evaluation metrics, demonstrating its efficacy as a robust defense mechanism against multi-turn jailbreak attacks.

CRNov 25, 2021
A Survey of Blockchain Data Management Systems

Qian Wei, Bingzhe Li, Wanli Chang et al.

Blockchain has been widely deployed in various sectors, such as finance, education, and public services. Since blockchain runs as an immutable distributed ledger, it has decentralized mechanisms with persistency, anonymity, and auditability, where transactions are jointly performed through cryptocurrency-based consensus algorithms by worldwide distributed nodes. There have been many survey papers reviewing the blockchain technologies from different perspectives, e.g., digital currencies, consensus algorithms, and smart contracts. However, none of them have focused on the blockchain data management systems. To fill in this gap, we have conducted a comprehensive survey on the data management systems, based on three typical types of blockchain, i.e., standard blockchain, hybrid blockchain, and DAG (Directed Acyclic Graph)-based blockchain. We categorize their data management mechanisms into three layers: blockchain architecture, blockchain data structure, and blockchain storage engine, where block architecture indicates how to record transactions on a distributed ledger, blockchain data structure refers to the internal structure of each block, and blockchain storage engine specifies the storage form of data on the blockchain system. For each layer, the works advancing the state-of-the-art are discussed together with technical challenges. Furthermore, we lay out the future research directions for the blockchain data management systems.