Yujun Zhang

LG
h-index: 14
13 papers
345 citations
Novelty: 57%
AI Score: 55

13 Papers

NI · Apr 19, 2023
NetGPT: Generative Pretrained Transformer for Network Traffic

Xuying Meng, Chungang Lin, Yequan Wang et al.

All data on the Internet are transferred by network traffic, thus accurately modeling network traffic can help improve network service quality and protect data privacy. Pretrained models for network traffic can utilize large-scale raw data to learn the essential characteristics of network traffic, and generate distinguishable results for input traffic without considering specific downstream tasks. Effective pretrained models can significantly optimize the training efficiency and effectiveness of downstream tasks, such as application classification, attack detection and traffic generation. Despite the great success of pretraining in natural language processing, there is no such work in the networking field. Considering the diverse demands and characteristics of network traffic and network tasks, it is non-trivial to build a pretrained model for network traffic and we face various challenges, especially the heterogeneous headers and payloads in the multi-pattern network traffic and the different dependencies for contexts of diverse downstream network tasks. To tackle these challenges, in this paper, we make the first attempt to provide a generative pretrained model, NetGPT, for both traffic understanding and generation tasks. We propose multi-pattern network traffic modeling to construct unified text inputs and support both traffic understanding and generation tasks. We further optimize the adaptation effect of the pretrained model to diversified tasks by shuffling header fields, segmenting packets in flows, and incorporating diverse task labels with prompts. With diverse traffic datasets from encrypted software, DNS, private industrial protocols and cryptocurrency mining, extensive experiments demonstrate the effectiveness of NetGPT in a range of traffic understanding and generation tasks, where it outperforms state-of-the-art baselines by a wide margin.

LG · Aug 15, 2025 · Code
Generalize across Homophily and Heterophily: Hybrid Spectral Graph Pre-Training and Prompt Tuning

Haitong Luo, Suhang Wang, Weiyao Zhang et al.

Graph "pre-training and prompt-tuning" aligns downstream tasks with pre-trained objectives to enable efficient knowledge transfer under limited supervision. However, existing methods rely on homophily-based low-frequency knowledge, failing to handle diverse spectral distributions in real-world graphs with varying homophily. Our theoretical analysis reveals a spectral specificity principle: optimal knowledge transfer requires alignment between pre-trained spectral filters and the intrinsic spectrum of downstream graphs. Under limited supervision, large spectral gaps between pre-training and downstream tasks impede effective adaptation. To bridge this gap, we propose the HS-GPPT model, a novel framework that ensures spectral alignment throughout both pre-training and prompt-tuning. We utilize a hybrid spectral filter backbone and local-global contrastive learning to acquire abundant spectral knowledge. Then we design prompt graphs to align the spectral distribution with pretexts, facilitating spectral knowledge transfer across homophily and heterophily. Extensive experiments validate the effectiveness under both transductive and inductive learning settings. Our code is available at https://anonymous.4open.science/r/HS-GPPT-62D2/.

CV · Aug 19, 2021 · Code
VIL-100: A New Dataset and A Baseline Model for Video Instance Lane Detection

Yujun Zhang, Lei Zhu, Wei Feng et al.

Lane detection plays a key role in autonomous driving. While car cameras always capture streaming video on the road, current lane detection works mainly focus on individual images (frames) and ignore the dynamics along the video. In this work, we collect a new video instance lane detection (VIL-100) dataset, which contains 100 videos with 10,000 frames in total, acquired from different real traffic scenarios. All the frames in each video are manually annotated with high-quality instance-level lane annotations, and a set of frame-level and video-level metrics are included for quantitative performance evaluation. Moreover, we propose a new baseline model, named multi-level memory aggregation network (MMA-Net), for video instance lane detection. In our approach, the representation of the current frame is enhanced by attentively aggregating both local and global memory features from other frames. Experiments on the newly collected dataset show that the proposed MMA-Net outperforms state-of-the-art lane detection methods and video object segmentation methods. We release our dataset and code at https://github.com/yujun0-0/MMA-Net.

CV · Mar 11, 2018 · Code
BTS-DSN: Deeply Supervised Neural Network with Short Connections for Retinal Vessel Segmentation

Song Guo, Kai Wang, Hong Kang et al.

Background and Objective: The condition of the vessels of the human eye is an important factor in the diagnosis of ophthalmological diseases. Vessel segmentation in fundus images is a challenging task due to complex vessel structure, the presence of similar structures such as microaneurysms and hemorrhages, micro-vessels that are only one to several pixels wide, and requirements for finer results. Methods: In this paper, we present a multi-scale deeply supervised network with short connections (BTS-DSN) for vessel segmentation. We use short connections to transfer semantic information between side-output layers. Bottom-top short connections pass low-level semantic information to high levels to refine results in high-level side-outputs, and top-bottom short connections pass structural information to low levels to reduce noise in low-level side-outputs. In addition, we employ cross-training to show that our model is suitable for real-world fundus images. Results: The proposed BTS-DSN has been verified on the DRIVE, STARE and CHASE_DB1 datasets, and showed competitive performance compared with other state-of-the-art methods. Specifically, with patch-level input, the network achieved 0.7891/0.8212 sensitivity, 0.9804/0.9843 specificity, 0.9806/0.9859 AUC, and 0.8249/0.8421 F1-score on DRIVE and STARE, respectively. Moreover, our model performs better than other methods in cross-training experiments. Conclusions: BTS-DSN achieves competitive performance in the vessel segmentation task on three public datasets. It is suitable for vessel segmentation. The source code of our method is available at https://github.com/guomugong/BTS-DSN.
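
For readers unfamiliar with the metrics reported in this abstract, the following minimal sketch shows how sensitivity, specificity, and F1-score are conventionally derived from pixel-level confusion counts. The function name `segmentation_metrics` and the example counts are illustrative assumptions, not taken from the BTS-DSN code.

```python
def segmentation_metrics(tp, fp, tn, fn):
    """Standard binary-segmentation metrics from pixel-level confusion counts.

    tp/fp/tn/fn are hypothetical counts of vessel (positive) and
    background (negative) pixels; they are not results from the paper.
    """
    # Sensitivity (recall): fraction of true vessel pixels that were found.
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    # Specificity: fraction of background pixels correctly rejected.
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    # Precision: fraction of predicted vessel pixels that are correct.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    # F1: harmonic mean of precision and sensitivity.
    denom = precision + sensitivity
    f1 = 2 * precision * sensitivity / denom if denom else 0.0
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "f1": f1,
    }


if __name__ == "__main__":
    print(segmentation_metrics(tp=80, fp=20, tn=880, fn=20))
```

Because vessel pixels are a small minority in fundus images, specificity tends to sit close to 1 while sensitivity and F1 are the more discriminating numbers.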

LG · Jun 30, 2025
MamNet: A Novel Hybrid Model for Time-Series Forecasting and Frequency Pattern Analysis in Network Traffic

Yujun Zhang, Runlong Li, Xiaoxiang Liang et al.

The abnormal fluctuations in network traffic may indicate potential security threats or system failures. Therefore, efficient network traffic prediction and anomaly detection methods are crucial for network security and traffic management. This paper proposes a novel network traffic prediction and anomaly detection model, MamNet, which integrates time-domain modeling and frequency-domain feature extraction. The model first captures the long-term dependencies of network traffic through the Mamba module (time-domain modeling), and then identifies periodic fluctuations in the traffic using Fourier Transform (frequency-domain feature extraction). In the feature fusion layer, multi-scale information is integrated to enhance the model's ability to detect network traffic anomalies. Experiments conducted on the UNSW-NB15 and CAIDA datasets demonstrate that MamNet outperforms several recent mainstream models in terms of accuracy, recall, and F1-Score. Specifically, it achieves an improvement of approximately 2% to 4% in detection performance for complex traffic patterns and long-term trend detection. The results indicate that MamNet effectively captures anomalies in network traffic across different time scales and is suitable for anomaly detection tasks in network security and traffic management. Future work could further optimize the model structure by incorporating external network event information, thereby improving the model's adaptability and stability in complex network environments.

CL · Oct 15, 2024
Enhance Graph Alignment for Large Language Models

Haitong Luo, Xuying Meng, Suhang Wang et al.

Graph-structured data is prevalent in the real world. Recently, due to the powerful emergent capabilities, Large Language Models (LLMs) have shown promising performance in modeling graphs. The key to effectively applying LLMs on graphs is converting graph data into a format LLMs can comprehend. Graph-to-token approaches are popular in enabling LLMs to process graph information. They transform graphs into sequences of tokens and align them with text tokens through instruction tuning, where self-supervised instruction tuning helps LLMs acquire general knowledge about graphs, and supervised fine-tuning specializes LLMs for the downstream tasks on graphs. Despite their initial success, we find that existing methods have a misalignment between self-supervised tasks and supervised downstream tasks, resulting in negative transfer from self-supervised fine-tuning to downstream tasks. To address these issues, we propose Graph Alignment Large Language Models (GALLM) to benefit from aligned task templates. In the self-supervised tuning stage, we introduce a novel text matching task using templates aligned with downstream tasks. In the task-specific tuning stage, we propose two category prompt methods that learn supervision information from additional explanation with further aligned templates. Experimental evaluations on four datasets demonstrate substantial improvements in supervised learning, multi-dataset generalizability, and particularly in zero-shot capability, highlighting the model's potential as a graph foundation model.

NI · Aug 4, 2025
Convolutions are Competitive with Transformers for Encrypted Traffic Classification with Pre-training

Chungang Lin, Weiyao Zhang, Tianyu Zuo et al.

Encrypted traffic classification is vital for modern network management and security. To reduce reliance on handcrafted features and labeled data, recent methods focus on learning generic representations through pre-training on large-scale unlabeled data. However, current pre-trained models face two limitations originating from the adopted Transformer architecture: (1) Limited model efficiency due to the self-attention mechanism with quadratic complexity; (2) Unstable traffic scalability to longer byte sequences, as the explicit positional encodings fail to generalize to input lengths not seen during pre-training. In this paper, we investigate whether convolutions, with linear complexity and implicit positional encoding, are competitive with Transformers in encrypted traffic classification with pre-training. We first conduct a systematic comparison, and observe that convolutions achieve higher efficiency and scalability, with lower classification performance. To address this trade-off, we propose NetConv, a novel pre-trained convolution model for encrypted traffic classification. NetConv employs stacked traffic convolution layers, which enhance the ability to capture localized byte-sequence patterns through window-wise byte scoring and sequence-wise byte gating. We design a continuous byte masking pre-training task to help NetConv learn protocol-specific patterns. Experimental results on four tasks demonstrate that NetConv improves average classification performance by 6.88% and model throughput by 7.41X over existing pre-trained models.

NE · Dec 14, 2025
OPAL: Operator-Programmed Algorithms for Landscape-Aware Black-Box Optimization

Junbo Jacob Lian, Mingyang Yu, Kaichen Ouyang et al.

Black-box optimization often relies on evolutionary and swarm algorithms whose performance is highly problem dependent. We view an optimizer as a short program over a small vocabulary of search operators and learn this operator program separately for each problem instance. We instantiate this idea in Operator-Programmed Algorithms (OPAL), a landscape-aware framework for continuous black-box optimization that uses a small design budget with a standard differential evolution baseline to probe the landscape, builds a k-nearest neighbor graph over sampled points, and encodes this trajectory with a graph neural network. A meta-learner then maps the resulting representation to a phase-wise schedule of exploration, restart, and local search operators. On the CEC 2017 test suite, a single meta-trained OPAL policy is statistically competitive with state-of-the-art adaptive differential evolution variants and achieves significant improvements over simpler baselines under nonparametric tests. Ablation studies on CEC 2017 justify the choices for the design phase, the trajectory graph, and the operator-program representation, while the meta-components add only modest wall-clock overhead. Overall, the results indicate that operator-programmed, landscape-aware per-instance design is a practical way forward beyond ad hoc metaphor-based algorithms in black-box optimization.
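
The graph-building step mentioned in this abstract can be illustrated with a brute-force k-nearest neighbor construction. This is a minimal sketch under stated assumptions: Euclidean distance, directed edges, and the helper name `knn_graph` are all choices made here for illustration, and the abstract does not say which distance or edge convention OPAL uses.

```python
import math


def knn_graph(points, k):
    """Directed k-nearest-neighbor graph as an adjacency dict.

    points: list of equal-length coordinate tuples (sampled solutions).
    Returns {i: [indices of the k closest other points]}.
    Euclidean distance is an assumption, not taken from the paper.
    """
    graph = {}
    for i, p in enumerate(points):
        # Distance from point i to every other point, sorted ascending.
        # Ties are broken by index because tuples compare element-wise.
        neighbors = sorted(
            (math.dist(p, q), j) for j, q in enumerate(points) if j != i
        )
        graph[i] = [j for _, j in neighbors[:k]]
    return graph


if __name__ == "__main__":
    samples = [(0.0, 0.0), (1.0, 0.0), (5.0, 0.0), (6.0, 0.0)]
    print(knn_graph(samples, k=1))
```

The brute-force version costs O(n²) distance evaluations, which is usually acceptable for the small sample sets that a limited design budget produces.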

LG · Sep 30, 2025
Stealthy Yet Effective: Distribution-Preserving Backdoor Attacks on Graph Classification

Xiaobao Wang, Ruoxiao Sun, Yujun Zhang et al.

Graph Neural Networks (GNNs) have demonstrated strong performance across tasks such as node classification, link prediction, and graph classification, but remain vulnerable to backdoor attacks that implant imperceptible triggers during training to control predictions. While node-level attacks exploit local message passing, graph-level attacks face the harder challenge of manipulating global representations while maintaining stealth. We identify two main sources of anomaly in existing graph classification backdoor methods: structural deviation from rare subgraph triggers and semantic deviation caused by label flipping, both of which make poisoned graphs easily detectable by anomaly detection models. To address this, we propose DPSBA, a clean-label backdoor framework that learns in-distribution triggers via adversarial training guided by anomaly-aware discriminators. DPSBA effectively suppresses both structural and semantic anomalies, achieving high attack success while significantly improving stealth. Extensive experiments on real-world datasets validate that DPSBA achieves a superior balance between effectiveness and detectability compared to state-of-the-art baselines.

CL · Aug 15, 2025
SpecDetect: Simple, Fast, and Training-Free Detection of LLM-Generated Text via Spectral Analysis

Haitong Luo, Weiyao Zhang, Suhang Wang et al.

The proliferation of high-quality text from Large Language Models (LLMs) demands reliable and efficient detection methods. While existing training-free approaches show promise, they often rely on surface-level statistics and overlook fundamental signal properties of the text generation process. In this work, we reframe detection as a signal processing problem, introducing a novel paradigm that analyzes the sequence of token log-probabilities in the frequency domain. By systematically analyzing the signal's spectral properties using the global Discrete Fourier Transform (DFT) and the local Short-Time Fourier Transform (STFT), we find that human-written text consistently exhibits significantly higher spectral energy. This higher energy reflects the larger-amplitude fluctuations inherent in human writing compared to the suppressed dynamics of LLM-generated text. Based on this key insight, we construct SpecDetect, a detector built on a single, robust feature from the global DFT: DFT total energy. We also propose an enhanced version, SpecDetect++, which incorporates a sampling discrepancy mechanism to further boost robustness. Extensive experiments demonstrate that our approach outperforms the state-of-the-art model while running in nearly half the time. Our work introduces a new, efficient, and interpretable pathway for LLM-generated text detection, showing that classical signal processing techniques offer a surprisingly powerful solution to this modern challenge.
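
To make the "DFT total energy" feature concrete, here is a minimal sketch that computes the summed squared DFT magnitudes of a token log-probability sequence. The mean-centering, the division by sequence length, and the function name `dft_total_energy` are assumptions made for illustration; the abstract does not specify the exact normalization SpecDetect uses, and the example sequences are invented.

```python
import cmath


def dft_total_energy(log_probs):
    """Total spectral energy of a sequence via a direct O(n^2) DFT.

    The sequence is mean-centered first (an assumption), so a perfectly
    flat sequence has zero energy. The result is divided by n, which by
    Parseval's theorem makes it equal to the sum of squared deviations.
    """
    n = len(log_probs)
    if n == 0:
        return 0.0
    mean = sum(log_probs) / n
    centered = [x - mean for x in log_probs]
    energy = 0.0
    for k in range(n):
        # k-th DFT coefficient of the centered signal.
        coeff = sum(
            x * cmath.exp(-2j * cmath.pi * k * t / n)
            for t, x in enumerate(centered)
        )
        energy += abs(coeff) ** 2
    return energy / n


if __name__ == "__main__":
    flat = [-1.0] * 8                 # hypothetical low-fluctuation sequence
    fluctuating = [-0.5, -3.0] * 4    # hypothetical high-fluctuation sequence
    print(dft_total_energy(flat), dft_total_energy(fluctuating))
```

Under the abstract's finding, a sequence with larger-amplitude fluctuations yields a higher energy, so a detector could compare this scalar against a threshold; how that threshold is chosen is not stated in the abstract.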

LG · Aug 5, 2025
Heterogeneity-Oblivious Robust Federated Learning

Weiyao Zhang, Jinyang Li, Qi Song et al.

Federated Learning (FL) remains highly vulnerable to poisoning attacks, especially under real-world hyper-heterogeneity, where clients differ significantly in data distributions, communication capabilities, and model architectures. Such heterogeneity not only undermines the effectiveness of aggregation strategies but also makes attacks more difficult to detect. Furthermore, high-dimensional models expand the attack surface. To address these challenges, we propose Horus, a heterogeneity-oblivious robust FL framework centered on low-rank adaptations (LoRAs). Rather than aggregating full model parameters, Horus inserts LoRAs into empirically stable layers and aggregates only LoRAs to reduce the attack surface. We uncover a key empirical observation that the input projection (LoRA-A) is markedly more stable than the output projection (LoRA-B) under heterogeneity and poisoning. Leveraging this, we design a Heterogeneity-Oblivious Poisoning Score using the features from LoRA-A to filter poisoned clients. For the remaining benign clients, we propose a projection-aware aggregation mechanism that reweights client updates by consistency with the global directions, preserving collaborative signals while suppressing drifts. Extensive experiments across diverse datasets, model architectures, and attacks demonstrate that Horus consistently outperforms state-of-the-art baselines in both robustness and accuracy.
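
The idea of reweighting client updates by their consistency with a global direction can be sketched generically as a cosine-similarity weighted average. This is not the Horus aggregation rule: the use of the plain mean as the "global direction", the clipping of negative similarities to zero, and the names `cosine` and `consistency_weighted_mean` are all assumptions introduced for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def consistency_weighted_mean(updates):
    """Average client updates, weighting each by agreement with the mean.

    updates: list of flattened update vectors (hypothetical inputs).
    Updates pointing against the mean direction get weight 0 (assumption).
    """
    dim = len(updates[0])
    mean_dir = [sum(u[d] for u in updates) / len(updates) for d in range(dim)]
    weights = [max(0.0, cosine(u, mean_dir)) for u in updates]
    total = sum(weights)
    if total == 0.0:
        # No update agrees with the mean; fall back to the plain average.
        return mean_dir
    return [
        sum(w * u[d] for w, u in zip(weights, updates)) / total
        for d in range(dim)
    ]


if __name__ == "__main__":
    clients = [[1.0, 0.0], [1.0, 0.0], [-1.0, 0.0]]
    print(consistency_weighted_mean(clients))
```

In the toy input, the third client points opposite to the other two, so it receives zero weight and the aggregate follows the agreeing majority.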

CV · Jul 26, 2019
Multiple Human Association between Top and Horizontal Views by Matching Subjects' Spatial Distributions

Ruize Han, Yujun Zhang, Wei Feng et al.

Video surveillance can be significantly enhanced by using both top-view data, e.g., those from drone-mounted cameras in the air, and horizontal-view data, e.g., those from wearable cameras on the ground. Collaborative analysis of different-view data can facilitate various kinds of applications, such as human tracking, person identification, and human activity recognition. However, for such collaborative analysis, the first step is to associate people, referred to as subjects in this paper, across these two views. This is a very challenging problem due to the large difference in human appearance between top and horizontal views. In this paper, we present a new approach to address this problem by exploring and matching the subjects' spatial distributions between the two views. More specifically, on the top-view image, we model and match subjects' relative positions to the horizontal-view camera in both views and define a matching cost to decide the actual location of the horizontal-view camera and its view angle in the top-view image. We collect a new dataset consisting of top-view and horizontal-view image pairs for performance evaluation, and the experimental results show the effectiveness of the proposed method.

CR · Nov 10, 2015
ELDA: Towards Efficient and Lightweight Detection of Cache Pollution Attacks in NDN

Zhiwei Xu, Bo Chen, Ninghan Wang et al.

As a promising architectural design for the future Internet, named data networking (NDN) relies on in-network caching to efficiently deliver name-based content. However, in-network caching is vulnerable to cache pollution attacks (CPA), which can reduce cache hits by violating cache locality and significantly degrade the overall performance of NDN. To defend against CPA, the most effective way is to first detect the attacks and then throttle them. Since a CPA itself already imposes a huge burden on victims, detection should not exhaust the victims' remaining resources, so we expect a lightweight detection solution. We thus propose ELDA, an Efficient and Lightweight Detection scheme against cache pollution Attacks, in which we design a Lightweight Flajolet-Martin (LFM) sketch to monitor the interest traffic. Our analysis and simulations demonstrate that, while consuming only a small amount of computation and memory resources, ELDA can effectively and efficiently detect CPA.
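
As background for the sketch-based monitoring described here, the following is a minimal version of the classical Flajolet-Martin distinct-count sketch applied to content names. It is not the paper's LFM variant, whose details the abstract does not give; the class name `FMSketch`, the use of SHA-256 as the hash, and the example names are assumptions for illustration.

```python
import hashlib


class FMSketch:
    """Classical single-bitmap Flajolet-Martin sketch (illustrative only)."""

    PHI = 0.77351  # standard Flajolet-Martin correction constant

    def __init__(self, num_bits=32):
        self.num_bits = num_bits
        self.bitmap = 0  # bit r is set once a hash with r trailing zeros is seen

    def _hash(self, item):
        # 64-bit hash taken from SHA-256 (an assumption; any uniform hash works).
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big")

    def add(self, item):
        h = self._hash(item)
        # Index of the least significant 1-bit, i.e. the trailing-zero count.
        rho = (h & -h).bit_length() - 1 if h else self.num_bits - 1
        self.bitmap |= 1 << min(rho, self.num_bits - 1)

    def estimate(self):
        # R is the index of the lowest unset bit; the estimate is 2^R / PHI.
        r = 0
        while self.bitmap & (1 << r):
            r += 1
        return (2 ** r) / self.PHI


if __name__ == "__main__":
    sketch = FMSketch()
    for name in ["/video/a", "/video/b", "/video/a", "/news/c"]:
        sketch.add(name)  # hypothetical interest names
    print(sketch.estimate())
```

A single bitmap gives a coarse, high-variance estimate in constant memory; practical deployments typically average several independent sketches. Repeated requests for the same name leave the bitmap unchanged, which is what makes the structure suitable for counting distinct names in a traffic stream.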