64.2 AI May 27
Learning When to Optimize: Verified Optimization Skills from Expert GPU-Kernel Lineages
Shuoming Zhang, Qiuchu Yu, Yangyu Zhang et al.
LLM-based agents are increasingly used to generate GPU kernels, but they often know what optimizations to try without knowing when those optimizations are sound. We introduce KLineage, which learns this missing "when" knowledge from expert kernels: instead of relying on forward rollouts, KLineage walks expert implementations backward through validation-gated simplifications and reverses each accepted step into a reusable optimization skill. Each skill records not only the optimization intent, but also where it applies in code, what conditions made it valid, what effect it had, and what failures its assumptions avoid. A downstream LLM materializes these skills on new code surfaces under the same compile/correctness/profile gate. On five expert workloads across two NVIDIA architectures, these lineage-derived skills serve as an effective optimization curriculum, exceeding recent memory-based LLM-kernel baselines in both final kernel quality and optimization efficiency under the same fixed budget. We additionally use a separate 22-instance held-out check as a sanity test against source-case memorization.
LG Feb 21, 2024 Code
ProSparse: Introducing and Enhancing Intrinsic Activation Sparsity within Large Language Models
Chenyang Song, Xu Han, Zhengyan Zhang et al.
Activation sparsity refers to the existence of a considerable number of weakly contributing elements among activation outputs. A prevalent property of models that use the ReLU activation function, activation sparsity has proven to be a promising paradigm for boosting model inference efficiency. Nevertheless, most large language models (LLMs) adopt activation functions without intrinsic activation sparsity (e.g., GELU and Swish). Some recent efforts have explored introducing ReLU or its variants as a substitute activation function to help LLMs achieve activation sparsity and inference acceleration, but few can simultaneously obtain high sparsity and comparable model performance. This paper introduces a simple and effective sparsification method named "ProSparse" to push LLMs toward higher activation sparsity while maintaining comparable performance. Specifically, after substituting the activation function of LLMs with ReLU, ProSparse adopts progressive sparsity regularization with a factor that increases smoothly along multi-stage sine curves. This can enhance activation sparsity and mitigate performance degradation by avoiding radical shifts in activation distributions. With ProSparse, we obtain high sparsity of 89.32% for LLaMA2-7B, 88.80% for LLaMA2-13B, and 87.89% for end-size MiniCPM-1B, respectively, achieving comparable performance to their original Swish-activated versions. These are the most sparsely activated models among open-source LLaMA versions and competitive end-size models, considerably surpassing ReluLLaMA-7B (66.98%) and ReluLLaMA-13B (71.56%). Our inference acceleration experiments further demonstrate the significant practical acceleration potential of LLMs with higher activation sparsity, obtaining up to 4.52× inference speedup.
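The idea of a regularization factor that rises smoothly within each stage can be sketched as below. This is only an illustration under stated assumptions: the function name `sine_stage_factor`, the stage boundaries, and the peak values are hypothetical, and the exact curve used by ProSparse may differ. Within each stage the factor follows half a sine wave from the previous stage's peak to the current one, so it has zero slope at both ends of the stage and never jumps.

```python
import math


def sine_stage_factor(step, boundaries, peaks):
    """Regularization factor at `step` for a multi-stage sine schedule.

    boundaries: increasing step indices at which each stage ends (assumed).
    peaks: factor reached at the end of each stage (assumed, non-decreasing).
    """
    prev_t, prev_lam = 0, 0.0
    for end_t, lam in zip(boundaries, peaks):
        if step <= end_t:
            frac = (step - prev_t) / (end_t - prev_t)
            # Half sine wave from -pi/2 to +pi/2, rescaled to the range [0, 1].
            ramp = 0.5 * (math.sin(-math.pi / 2 + math.pi * frac) + 1.0)
            return prev_lam + (lam - prev_lam) * ramp
        prev_t, prev_lam = end_t, lam
    # After the last stage, hold the final peak value.
    return peaks[-1]


if __name__ == "__main__":
    for s in (0, 50, 100, 150, 200):
        print(s, round(sine_stage_factor(s, [100, 200], [0.1, 0.5]), 4))
```

In a training loop, the returned factor would multiply a sparsity penalty (for example an L1 term on the activations) that is added to the language-modeling loss.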
65.2 CR May 6
A Novel Byte-Level Flow-to-Image Encoding Method for Network Intrusion Detection Systems
Ziyu Mu, Zihui Yan, Xiyu Shi et al.
Network-based Intrusion Detection Systems (IDS) are predominantly trained on tabular flow records, whose one-dimensional representations prevent convolutional architectures from exploiting inter-feature spatial correlations. This paper presents a novel byte-level flow-to-image encoding method that converts each network-flow record into a fixed-size RGB image. Continuous features are serialised using IEEE-754 single-precision format and packed sequentially into pixels along an inverted-L shaped trajectory, while discrete features are mapped to byte values and placed contiguously at the centre of the middle image row. The encoding is deterministic and reversible, preserving a fixed spatial layout across all samples. Four IDS models are evaluated on the NSL-KDD and UNSW-NB15 datasets with both flow-based and image-based configurations. The image-based representation yields consistent accuracy gains of up to 15.6% and 12.8% for binary and multi-class classification on UNSW-NB15, and up to 3.5% and 3.2% on NSL-KDD, highlighting the potential of byte-level visual encoding to strengthen AI-driven intrusion detection in local computer networks.
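The byte-level serialisation step can be illustrated with a small sketch. It covers only the reversible float-to-byte packing: each continuous feature becomes four IEEE-754 single-precision bytes, and consecutive bytes are grouped three at a time into RGB pixels. The function names, the big-endian byte order, and the zero padding are assumptions made for this example; the inverted-L trajectory and the placement of discrete features are not reproduced here.

```python
import struct


def floats_to_pixels(values):
    """Serialise floats as IEEE-754 float32 bytes and group them into RGB triples."""
    raw = b"".join(struct.pack(">f", v) for v in values)  # big-endian is an assumption
    raw += b"\x00" * (-len(raw) % 3)  # zero-pad to a whole number of pixels
    return [tuple(raw[i:i + 3]) for i in range(0, len(raw), 3)]


def pixels_to_floats(pixels, count):
    """Invert floats_to_pixels, recovering the first `count` float32 values."""
    raw = bytes(channel for px in pixels for channel in px)
    return [struct.unpack(">f", raw[4 * i:4 * i + 4])[0] for i in range(count)]


if __name__ == "__main__":
    features = [1.0, 0.5, -2.0]
    pixels = floats_to_pixels(features)
    print(pixels)
    print(pixels_to_floats(pixels, len(features)))
```

Because the packing is a fixed byte layout, decoding recovers every value exactly at float32 precision, which is what makes an encoding of this kind deterministic and reversible.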
19.8 CR Mar 30
GMA-SAWGAN-GP: A Novel Data Generative Framework to Enhance IDS Detection Performance
Ziyu Mu, Xiyu Shi, Safak Dogan
Intrusion Detection Systems (IDS) are often calibrated to known attacks and generalize poorly to unknown threats. This paper proposes GMA-SAWGAN-GP, a novel generative augmentation framework built on a Self-Attention-enhanced Wasserstein GAN with Gradient Penalty (WGAN-GP). The generator employs Gumbel-Softmax regularization to model discrete fields, while a Multilayer Perceptron (MLP)-based AutoEncoder acts as a manifold regularizer. A lightweight gating network adaptively balances adversarial and reconstruction losses via entropy regularization, improving stability and mitigating mode collapse. The self-attention mechanism enables the generator to capture both short- and long-range dependencies among features within each record while preserving categorical semantics through Gumbel-Softmax heads. Extensive experiments on NSL-KDD, UNSW-NB15, and CICIDS2017 using five representative IDS models demonstrate that GMA-SAWGAN-GP significantly improves detection performance on known attacks and enhances generalization to unknown attacks. Leave-One-Attack-type-Out (LOAO) evaluations using Area Under the Receiver Operating Characteristic (AUROC) and True Positive Rate at a 5 percent False Positive Rate confirm that IDS models trained on augmented datasets achieve higher robustness under unseen attack scenarios. Ablation studies validate the contribution of each component to performance gains. Compared with baseline models, the proposed framework improves binary classification accuracy by an average of 5.3 percent and multi-class classification accuracy by 2.2 percent, while AUROC and True Positive Rate at a 5 percent False Positive Rate for unknown attacks increase by 3.9 percent and 4.8 percent, respectively, across the three datasets. Overall, GMA-SAWGAN-GP provides an effective approach to generative augmentation for mixed-type network traffic, improving IDS accuracy and resilience.
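For readers unfamiliar with the "True Positive Rate at a 5 percent False Positive Rate" metric, the following is a minimal sketch of one common way to compute it. It assumes that higher scores mean "more likely an attack" and that labels are 1 for attack and 0 for benign; the function name `tpr_at_fpr` is hypothetical and this is not taken from the paper's evaluation code.

```python
def tpr_at_fpr(scores, labels, max_fpr=0.05):
    """Fraction of attacks detected at a threshold whose FPR does not exceed max_fpr."""
    negatives = sorted((s for s, y in zip(scores, labels) if y == 0), reverse=True)
    positives = [s for s, y in zip(scores, labels) if y == 1]
    if not negatives or not positives:
        raise ValueError("need at least one benign and one attack sample")
    # Number of benign samples that may be flagged without exceeding max_fpr.
    allowed = int(max_fpr * len(negatives))
    # Flag samples scoring strictly above this benign score; at most
    # `allowed` benign samples can lie above it.
    threshold = negatives[allowed]
    return sum(s > threshold for s in positives) / len(positives)


if __name__ == "__main__":
    benign = list(range(20))        # 20 benign scores: 0..19
    attacks = [25, 19, 5, 30]       # 4 attack scores
    scores = benign + attacks
    labels = [0] * len(benign) + [1] * len(attacks)
    print(tpr_at_fpr(scores, labels))
```

In the toy data, one of the 20 benign samples may be flagged (5 percent), so the threshold sits at the second-highest benign score and three of the four attacks are detected.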
27.3 CR Mar 19
A Novel Solution for Zero-Day Attack Detection in IDS using Self-Attention and Jensen-Shannon Divergence in WGAN-GP
Ziyu Mu, Xiyu Shi, Safak Dogan
The increasing sophistication of cyber threats, especially zero-day attacks, poses a significant challenge to cybersecurity. Zero-day attacks exploit unknown vulnerabilities, making them difficult to detect and defend against. Existing approaches patch flaws and deploy an Intrusion Detection System (IDS). Using advanced Wasserstein GANs with Gradient Penalty (WGAN-GP), this paper proposes synthesizing network traffic that mimics zero-day patterns, enriching data diversity and improving IDS generalization. SA-WGAN-GP is first introduced, which adds a Self-Attention (SA) mechanism to capture long-range cross-feature dependencies by reshaping the feature vector into tokens after dense projections. A JS-WGAN-GP is then proposed, which adds a Jensen-Shannon (JS) divergence-based auxiliary discriminator that is trained with Binary Cross-Entropy (BCE), frozen during updates, and used to regularize the generator for smoother gradients and higher sample quality. Third, SA-JS-WGAN-GP is created by combining the SA mechanism with JS divergence, thereby enhancing the data generation ability of WGAN-GP. Because data augmentation is not equivalent to true zero-day attack discovery, we emulate zero-day attacks via the leave-one-attack-type-out method on the NSL-KDD dataset when training all GANs and IDS models to assess the effectiveness of the proposed solution. The evaluation results show that integrating SA and JS divergence into WGAN-GP yields superior IDS performance and more effective zero-day risk detection.
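As background for the Jensen-Shannon divergence named above, here is a minimal sketch of the quantity itself for two discrete distributions. It illustrates the definition only and is not the paper's auxiliary discriminator; the function names are hypothetical, and natural logarithms are assumed, so the maximum value is ln 2.

```python
import math


def kl_divergence(p, q):
    """Kullback-Leibler divergence KL(p || q); terms with p_i == 0 contribute zero."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def js_divergence(p, q):
    """Jensen-Shannon divergence: average KL of p and q to their midpoint mixture."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


if __name__ == "__main__":
    real = [0.5, 0.3, 0.2]
    fake = [0.2, 0.3, 0.5]
    print(js_divergence(real, fake))
    print(js_divergence(real, real))
```

Unlike the KL divergence, this measure is symmetric and stays bounded even when the two distributions do not overlap.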
CR Apr 4, 2020
Scalar Product Lattice Computation for Efficient Privacy-preserving Systems
Yogachandran Rahulamathavan, Safak Dogan, Xiyu Shi et al.
Privacy-preserving applications allow users to perform everyday online actions without leaking sensitive information. The privacy-preserving scalar product is one of the critical algorithms in many private applications. State-of-the-art privacy-preserving scalar product schemes use either computationally intensive homomorphic (public-key) encryption techniques such as Paillier encryption to achieve strong security (i.e., 128-bit) or random masking techniques to achieve high efficiency at low security. In this paper, lattice structures are exploited to develop an efficient privacy-preserving system. The proposed scheme is not only more computationally efficient than the state of the art but also provides a high degree of security against quantum attacks. Rigorous security and privacy analyses of the proposed scheme are provided, along with a concrete set of parameters to achieve 128-bit and 256-bit security. Performance analysis shows that the scheme is at least five orders of magnitude faster than the Paillier schemes and at least twice as fast as the existing randomisation technique at 128-bit security.