CV Jun 28, 2023 Code
Pseudo-Bag Mixup Augmentation for Multiple Instance Learning-Based Whole Slide Image Classification
Pei Liu, Luping Ji, Xinyu Zhang et al.
Given the special situation of modeling gigapixel images, multiple instance learning (MIL) has become one of the most important frameworks for Whole Slide Image (WSI) classification. In current practice, most MIL networks face two unavoidable problems in training: i) insufficient WSI data and ii) the sample memorization inclination inherent in neural networks. These problems may hinder MIL models from adequate and efficient training, limiting further performance gains of classification models on WSIs. Inspired by the basic idea of Mixup, this paper proposes a new Pseudo-bag Mixup (PseMix) data augmentation scheme to improve the training of MIL models. This scheme generalizes the Mixup strategy for general images to WSIs via pseudo-bags, so that it can be applied in MIL-based WSI classification. With the help of pseudo-bags, PseMix achieves the size alignment and semantic alignment that the Mixup strategy critically requires. Moreover, it is designed as an efficient and decoupled method, neither involving time-consuming operations nor relying on MIL model predictions. Comparative experiments and ablation studies are designed to evaluate the effectiveness and advantages of PseMix. Experimental results show that PseMix can often help state-of-the-art MIL networks improve their classification performance on WSIs. It can also boost the generalization performance of MIL models in special test scenarios and improve their robustness to patch occlusion and label noise. Our source code is available at https://github.com/liupei101/PseMix.
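The idea of mixing bags through pseudo-bags can be illustrated with a minimal sketch. Everything below is an assumption made for illustration, not the authors' implementation: the function names, the random partition into pseudo-bags, and the rounding of the Beta-sampled ratio are all hypothetical, and the abstract does not describe how PseMix actually divides a bag. The sketch only shows why pseudo-bags give size alignment: both bags are cut into the same number of parts, so a mixing ratio can be expressed as a count of parts.

```python
import random

def split_into_pseudo_bags(bag, n, rng):
    """Randomly partition a bag (a list of instances) into n pseudo-bags.
    Random partitioning is an illustrative assumption."""
    idx = list(range(len(bag)))
    rng.shuffle(idx)
    return [[bag[i] for i in idx[k::n]] for k in range(n)]

def pseudo_bag_mixup(bag_a, label_a, bag_b, label_b, n=4, alpha=1.0, seed=0):
    """Mix two bags at the pseudo-bag level (hypothetical helper).

    Takes k pseudo-bags from bag A and n - k from bag B, where k / n
    approximates a Mixup ratio drawn from Beta(alpha, alpha). Labels are
    soft vectors mixed with the same ratio k / n.
    """
    rng = random.Random(seed)
    lam = rng.betavariate(alpha, alpha)
    k = round(lam * n)  # number of pseudo-bags taken from bag A
    parts_a = split_into_pseudo_bags(bag_a, n, rng)
    parts_b = split_into_pseudo_bags(bag_b, n, rng)
    mixed = [x for p in parts_a[:k] for x in p]
    mixed += [x for p in parts_b[k:] for x in p]
    ratio = k / n
    mixed_label = [ratio * a + (1.0 - ratio) * b
                   for a, b in zip(label_a, label_b)]
    return mixed, mixed_label

# Toy bags: integers stand in for patch feature vectors.
bag_a = list(range(8))
bag_b = list(range(100, 108))
mixed_bag, mixed_label = pseudo_bag_mixup(bag_a, [1.0, 0.0], bag_b, [0.0, 1.0])
```

The mixed label weight for class A equals the fraction of pseudo-bags drawn from bag A, which mirrors how standard Mixup ties the label ratio to the input ratio.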
CL Oct 26, 2022
Multilevel Transformer For Multimodal Emotion Recognition
Junyi He, Meimei Wu, Meng Li et al.
Multimodal emotion recognition has attracted much attention recently. Fusing multiple modalities effectively with limited labeled data is a challenging task. Considering the success of pre-trained model and fine-grained nature of emotion expression, it is reasonable to take these two aspects into consideration. Unlike previous methods that mainly focus on one aspect, we introduce a novel multi-granularity framework, which combines fine-grained representation with pre-trained utterance-level representation. Inspired by Transformer TTS, we propose a multilevel transformer model to perform fine-grained multimodal emotion recognition. Specifically, we explore different methods to incorporate phoneme-level embedding with word-level embedding. To perform multi-granularity learning, we simply combine multilevel transformer model with Albert. Extensive experimental results show that both our multilevel transformer model and multi-granularity model outperform previous state-of-the-art approaches on IEMOCAP dataset with text transcripts and speech signal.
CV Jan 15, 2023
Learning to Compress Unmanned Aerial Vehicle (UAV) Captured Video: Benchmark and Analysis
Chuanmin Jia, Feng Ye, Huifang Sun et al.
During the past decade, Unmanned Aerial Vehicles (UAVs) have attracted increasing attention due to their flexible, extensive, and dynamic space-sensing capabilities. The volume of video captured by UAVs is growing exponentially, along with the increased bitrate generated by advances in the sensors mounted on UAVs, bringing new challenges for on-device UAV storage and air-ground data transmission. Most existing video compression schemes were designed for natural scenes without consideration of the specific texture and view characteristics of UAV videos. In this work, we first contribute a detailed analysis of the current state of the field of UAV video coding. We then propose a novel task for learned UAV video coding and construct a comprehensive and systematic benchmark for this task, present a thorough review of high-quality UAV video datasets and benchmarks, and contribute an extensive rate-distortion efficiency comparison of learned and conventional codecs. Finally, we discuss the challenges of encoding UAV videos. It is expected that the benchmark will accelerate research and development in video coding on drone platforms.
IV Dec 13, 2022
AdvMIL: Adversarial Multiple Instance Learning for the Survival Analysis on Whole-Slide Images
Pei Liu, Luping Ji, Feng Ye et al.
Survival analysis on histological whole-slide images (WSIs) is one of the most important means of estimating patient prognosis. Although many weakly-supervised deep learning models have been developed for gigapixel WSIs, their potential is generally restricted by classical survival analysis rules and fully-supervised learning requirements. As a result, these models provide patients only with a completely certain point estimate of time-to-event, and they can only learn from labeled WSI data, which is currently available only at a small scale. To tackle these problems, we propose a novel adversarial multiple instance learning (AdvMIL) framework. This framework is based on adversarial time-to-event modeling and integrates the multiple instance learning (MIL) that is essential for WSI representation learning. It is plug-and-play, so most existing MIL-based end-to-end methods can be easily upgraded by applying this framework, gaining improved abilities in survival distribution estimation and semi-supervised learning. Our extensive experiments show that AdvMIL not only can often bring performance improvement to mainstream WSI survival analysis methods at a relatively low computational cost, but also enables these methods to effectively utilize unlabeled data via semi-supervised learning. Moreover, it is observed that AdvMIL can help improve the robustness of models against patch occlusion and two representative types of image noise. The proposed AdvMIL framework could promote research on survival analysis in computational pathology with its novel adversarial MIL paradigm.
IV Jun 12, 2022
DSCA: A Dual-Stream Network with Cross-Attention on Whole-Slide Image Pyramids for Cancer Prognosis
Pei Liu, Bo Fu, Feng Ye et al.
The cancer prognosis on gigapixel Whole-Slide Images (WSIs) has always been a challenging task. To further enhance WSI visual representations, existing methods have explored image pyramids, instead of single-resolution images, in WSIs. In spite of this, they still face two major problems: high computational cost and the unnoticed semantical gap in multi-resolution feature fusion. To tackle these problems, this paper proposes to efficiently exploit WSI pyramids from a new perspective, the dual-stream network with cross-attention (DSCA). Our key idea is to utilize two sub-streams to process the WSI patches with two resolutions, where a square pooling is devised in a high-resolution stream to significantly reduce computational costs, and a cross-attention-based method is proposed to properly handle the fusion of dual-stream features. We validate our DSCA on three publicly-available datasets with a total number of 3,101 WSIs from 1,911 patients. Our experiments and ablation studies verify that (i) the proposed DSCA could outperform existing state-of-the-art methods in cancer prognosis, by an average C-Index improvement of around 4.6%; (ii) our DSCA network is more efficient in computation -- it has more learnable parameters (6.31M vs. 860.18K) but less computational costs (2.51G vs. 4.94G), compared to a typical existing multi-resolution network. (iii) the key components of DSCA, dual-stream and cross-attention, indeed contribute to our model's performance, gaining an average C-Index rise of around 2.0% while maintaining a relatively-small computational load. Our DSCA could serve as an alternative and effective tool for WSI-based cancer prognosis.
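The C-Index used to report these gains is the concordance index from survival analysis. A minimal sketch of one common form (Harrell's C-index) follows; the function name and the tie handling (0.5 credit for tied risk scores, pairs with tied times skipped) are illustrative choices, and the abstract does not say which variant the paper uses.

```python
def concordance_index(times, events, risks):
    """Harrell's C-index for right-censored survival data.

    times:  observed time for each patient
    events: 1 if the event was observed, 0 if censored
    risks:  predicted risk score (higher means earlier expected event)

    A pair (i, j) is comparable when patient i had an observed event
    strictly before patient j's observed time. It is concordant when
    the model assigns patient i the higher risk.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a censored patient cannot anchor a comparable pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5  # tied risk scores get half credit
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable

# Three patients; the last one is censored at time 3.
perfect = concordance_index([1, 2, 3], [1, 1, 0], [3.0, 2.0, 1.0])
reversed_order = concordance_index([1, 2, 3], [1, 1, 0], [1.0, 2.0, 3.0])
```

A value of 1.0 means every comparable pair is ranked correctly, 0.5 corresponds to random ranking, and 0.0 means every pair is ranked in reverse.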
NI May 15
TG-DIN: Theory-Guided Demand Inference Network for Generalizable QoS Measurement and Prediction
Fuliang Yang, Feng Ye
In this paper, we introduce TG-DIN, a theory-guided demand inference network that infers latent user demand from observable network quality-of-service (QoS) measurements. Rather than directly predicting QoS outcomes using black-box models, TG-DIN explicitly models latent demand as an intermediate variable and links it to observable behavior through a differentiable theory layer grounded in scheduling and queuing principles. This design yields an interpretable, mechanism-consistent representation of user demand that is directly applicable to downstream tasks such as congestion diagnosis, resource allocation, capacity planning, and policy evaluation. The theory layer further enables a principled randomized training regime that exposes the model to diverse yet physically meaningful operating conditions without requiring labeled demand data. Extensive synthetic experiments show that TG-DIN generalizes robustly across capacities, demand levels, and traffic patterns, substantially outperforming purely data-driven baselines under distribution shift. Moreover, when trained exclusively on synthetic data and applied directly to real packet traces, TG-DIN accurately recovers per-user allocation structure in shared-link scenarios. Together, these results demonstrate the effectiveness of theory-guided inductive biases for achieving transferable, deployment-ready inference in dynamic network environments.
CV Nov 12, 2025
Improve Contrastive Clustering Performance by Multiple Fusing-Augmenting ViT Blocks
Cheng Wang, Shuisheng Zhou, Fengjiao Peng et al.
In the field of image clustering, the widely used contrastive learning networks improve clustering performance by maximizing the similarity between positive pairs and the dissimilarity of negative pairs of the inputs. Extant contrastive learning networks, whose two encoders often implicitly interact with each other by parameter sharing or momentum updating, may not fully exploit the complementarity and similarity of the positive pairs to extract clustering features from input data. To explicitly fuse the learned features of positive pairs, we design novel multiple fusing-augmenting ViT blocks (MFAVBs) based on the excellent feature learning ability of Vision Transformers (ViT). First, two preprocessed augmentations forming a positive pair are separately fed into two shared-weight ViTs, and their output features are fused and passed into a larger ViT. Second, the learned features are split into a pair of new augmented positive samples and passed to the next FAVBs, enabling multiple rounds of fusion and augmentation through MFAVBs operations. Finally, the learned features are projected into both instance-level and clustering-level spaces to calculate the cross-entropy loss, followed by parameter updates by backpropagation to finalize the training process. To further enhance the ability of the model to distinguish between similar images, the input to our network consists of preprocessed augmentations with features extracted from the pretrained CLIP model. Our experiments on seven public datasets demonstrate that MFAVBs serving as the backbone for contrastive clustering outperform state-of-the-art techniques in terms of clustering performance.
CV Jan 21, 2021 Code
A Person Re-identification Data Augmentation Method with Adversarial Defense Effect
Yunpeng Gong, Zhiyong Zeng, Liwen Chen et al.
The security of the Person Re-identification (ReID) model plays a decisive role in the application of ReID. However, deep neural networks have been shown to be vulnerable: adding undetectable adversarial perturbations to clean images can trick deep neural networks that perform well on clean images. We propose a ReID multi-modal data augmentation method with adversarial defense effect: 1) Grayscale Patch Replacement, which consists of Local Grayscale Patch Replacement (LGPR) and Global Grayscale Patch Replacement (GGPR). This method not only improves the accuracy of the model but also helps the model defend against adversarial examples; 2) Multi-Modal Defense, which integrates three homogeneous modal images (visible, grayscale, and sketch) and further strengthens the defense ability of the model. These methods fuse different modalities of homogeneous images to enrich the variety of input samples. This variety reduces the over-fitting of the ReID model to color variations and makes the adversarial space of the dataset that an attack method can find difficult to align, so the accuracy of the model is improved and the attack effect is greatly reduced. The more modal homogeneous images are fused, the stronger the defense capability is. The proposed method performs well on multiple datasets, successfully defends against the MS-SSIM attack on ReID proposed at CVPR 2020 [10], and increases the accuracy by 467 times (0.2% to 93.3%). The code is available at https://github.com/finger-monkey/ReID_Adversarial_Defense.
NI Mar 17
Fine-Grained Network Traffic Classification with Contextual QoS Profiling
Huiwen Zhang, Feng Ye
Accurate network traffic classification is vital for managing modern applications with strict Quality of Service (QoS) demands, such as edge computing, real-time XR, and autonomous systems. While recent advances in application-level classification show high accuracy, they often miss fine-grained in-app QoS variations critical for service differentiation. This paper proposes a hierarchical graph neural network (GNN) framework that combines a three-level graph representation with an automated QoS-aware assignment algorithm. The model captures multi-scale temporal patterns via packet aggregation, time-window clustering, and session-level behavior modeling. QoS priorities are derived using five key metrics (bandwidth, jitter, packet stability, burst frequency, and burst stability), processed through logarithmic transformation and weighted ranking. Evaluations across 14 usage scenarios from YouTube, Prime Video, TikTok, and Zoom show that the proposed GNN significantly outperforms state-of-the-art methods in service-level classification. The QoS-aware assignment further refines classification to enhance user experience. This work advances QoS-aware traffic classification by enabling precise in-app usage differentiation and adaptive service prioritization in dynamic network environments.
NI Mar 12
Radio Radiance Field: The New Frontier of Spatial Wireless Channel Representation
Haijian Sun, Feng Ye
Massive MIMO, among other ground-breaking technologies, is being developed for the next-generation wireless systems to support requirements in terms of data rates, reliability, latency, intelligence, security and energy efficiency. Accurate channel estimation remains a key challenge in fully exploiting massive MIMO. While recent research has explored aspects such as near-field effects, spatial non-stationarity, and channel sparsity, many practical estimation and modeling techniques still provide limited CSI, often dominated by aggregate channel gain and delay, without full spatial characteristics. Although wideband models and phased-array techniques can capture delay and angular information, many practical estimation methods still lack comprehensive spatial resolution, including polarization, which limits their effectiveness for advanced massive MIMO techniques. This article introduces the concept of radio radiance field (RRF), which captures the spatial distribution and directionality of radio propagation. From RRF, a comprehensive spatial representation of the wireless channel, referred to as Spatial-CSI, can be derived. Owing to the comprehensive geometric and radio information, RRF can be implemented directly for beamforming, delay-alignment modulation, and many other techniques in massive MIMO and reflective intelligent surface implementations. An RRF can also serve as a digital radio twin, which is a virtual representation of the radio environment that includes both geometric structure and radio propagation characteristics, enabling real-time simulation and optimization of wireless systems. It paves the way for various applications from communications to sensing in the next-generation wireless communication systems.
LG Feb 13
Physics-Informed Neural Networks with Architectural Physics Embedding for Large-Scale Wave Field Reconstruction
Huiwen Zhang, Feng Ye, Chu Ma
Large-scale wave field reconstruction requires precise solutions but faces challenges with computational efficiency and accuracy. The physics-based numerical methods like Finite Element Method (FEM) provide high accuracy but struggle with large-scale or high-frequency problems due to prohibitive computational costs. Pure data-driven approaches excel in speed but often lack sufficient labeled data for complex scenarios. Physics-informed neural networks (PINNs) integrate physical principles into machine learning models, offering a promising solution by bridging these gaps. However, standard PINNs embed physical principles only in loss functions, leading to slow convergence, optimization instability, and spectral bias, limiting their ability for large-scale wave field reconstruction. This work introduces architecture physics embedded (PE)-PINN, which integrates additional physical guidance directly into the neural network architecture beyond Helmholtz equations and boundary conditions in loss functions. Specifically, a new envelope transformation layer is designed to mitigate spectral bias with kernels parameterized by source properties, material interfaces, and wave physics. Experiments demonstrate that PE-PINN achieves more than 10 times speedup in convergence compared to standard PINNs and several orders of magnitude reduction in memory usage compared to FEM. This breakthrough enables high-fidelity modeling for large-scale 2D/3D electromagnetic wave reconstruction involving reflections, refractions, and diffractions in room-scale domains, readily applicable to wireless communications, sensing, room acoustics, and other fields requiring large-scale wave field analysis.
LG Oct 26, 2024
DeepMIDE: A Multi-Output Spatio-Temporal Method for Ultra-Scale Offshore Wind Energy Forecasting
Feng Ye, Xinxi Zhang, Michael Stein et al.
To unlock access to stronger winds, the offshore wind industry is advancing towards significantly larger and taller wind turbines. This massive upscaling motivates a departure from wind forecasting methods that traditionally focused on a single representative height. To fill this gap, we propose DeepMIDE--a statistical deep learning method which jointly models the offshore wind speeds across space, time, and height. DeepMIDE is formulated as a multi-output integro-difference equation model with a multivariate nonstationary kernel characterized by a set of advection vectors that encode the physics of wind field formation and propagation. Embedded within DeepMIDE, an advanced deep learning architecture learns these advection vectors from high-dimensional streams of exogenous weather information, which, along with other parameters, are plugged back into the statistical model for probabilistic multi-height space-time forecasting. Tested on real-world data from offshore wind energy areas in the Northeastern United States, the wind speed and power forecasts from DeepMIDE are shown to outperform those from prevalent time series, spatio-temporal, and deep learning methods.
LG Nov 21, 2025
Enhancing Adversarial Transferability through Block Stretch and Shrink
Quan Liu, Feng Ye, Chenhao Lu et al.
Adversarial attacks introduce small, deliberately crafted perturbations that mislead neural networks, and their transferability from white-box to black-box target models remains a critical research focus. Input transformation-based attacks are a subfield of adversarial attacks that enhance input diversity through input transformations to improve the transferability of adversarial examples. However, existing input transformation-based attacks tend to exhibit limited cross-model transferability. Previous studies have shown that high transferability is associated with diverse attention heatmaps and the preservation of global semantics in transformed inputs. Motivated by this observation, we propose Block Stretch and Shrink (BSS), a method that divides an image into blocks and applies stretch and shrink operations to these blocks, thereby diversifying attention heatmaps in transformed inputs while maintaining their global semantics. Empirical evaluations on a subset of ImageNet demonstrate that BSS outperforms existing input transformation-based attack methods in terms of transferability. Furthermore, we examine the impact of the number scale, defined as the number of transformed inputs, in input transformation-based attacks, and advocate evaluating these methods under a unified number scale to enable fair and comparable assessments.
SP May 6, 2025
Terahertz Spatial Wireless Channel Modeling with Radio Radiance Field
John Song, Lihao Zhang, Feng Ye et al.
Terahertz (THz) communication is a key enabler for 6G systems, offering ultra-wide bandwidth and unprecedented data rates. However, THz signal propagation differs significantly from lower-frequency bands due to severe free space path loss, minimal diffraction and specular reflection, and prominent scattering, making conventional channel modeling and pilot-based estimation approaches inefficient. In this work, we investigate the feasibility of applying radio radiance field (RRF) framework to the THz band. This method reconstructs a continuous RRF using visual-based geometry and sparse THz RF measurements, enabling efficient spatial channel state information (Spatial-CSI) modeling without dense sampling. We first build a fine simulated THz scenario, then we reconstruct the RRF and evaluate the performance in terms of both reconstruction quality and effectiveness in THz communication, showing that the reconstructed RRF captures key propagation paths with sparse training samples. Our findings demonstrate that RRF modeling remains effective in the THz regime and provides a promising direction for scalable, low-cost spatial channel reconstruction in future 6G networks.
SD Jan 19, 2022
MHTTS: Fast multi-head text-to-speech for spontaneous speech with imperfect transcription
Dabiao Ma, Yitong Zhang, Meng Li et al.
Neural network based end-to-end Text-to-Speech (TTS) has greatly improved the quality of synthesized speech. However, how to efficiently use massive spontaneous speech without transcription remains an open problem. In this paper, we propose MHTTS, a fast multi-speaker TTS system that is robust to transcription errors and to speech data with varied speaking styles. Specifically, we introduce a multi-head model and transfer text information from a high-quality corpus with manual transcription to spontaneous speech with imperfectly recognized transcription by jointly training on them. MHTTS has three advantages: 1) Our system synthesizes better-quality multi-speaker voice with faster inference speed. 2) Our system is capable of transferring correct text information to data with imperfect transcription, whether simulated using corruption or provided by an Automatic Speech Recogniser (ASR). 3) Our system can utilize massive real spontaneous speech with imperfect transcription and synthesize expressive voice.
LG Aug 19, 2021
Fast Newton method solving KLR based on Multilevel Circulant Matrix with log-linear complexity
Junna Zhang, Shuisheng Zhou, Cui Fu et al.
Kernel logistic regression (KLR) is a conventional nonlinear classifier in machine learning. With the explosive growth of data size, the storage and computation of large dense kernel matrices is a major challenge in scaling KLR. Even when the Nyström approximation is applied to solve KLR, it still incurs a time complexity of $O(nc^2)$ and a space complexity of $O(nc)$, where $n$ is the number of training instances and $c$ is the sampling size. In this paper, we propose a fast Newton method that efficiently solves large-scale KLR problems by exploiting the storage and computing advantages of the multilevel circulant matrix (MCM). Specifically, by approximating the kernel matrix with an MCM, the storage space is reduced to $O(n)$, and by further approximating the coefficient matrix of the Newton equation as an MCM, the computational complexity of each Newton iteration is reduced to $O(n \log n)$. The proposed method runs in log-linear time per iteration because the multiplication of an MCM (or its inverse) and a vector can be implemented with the multidimensional fast Fourier transform (mFFT). Experimental results on some large-scale binary-classification and multi-classification problems show that the proposed method enables KLR to scale to large problems with less memory consumption and less training time without sacrificing test accuracy.
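The log-linear cost rests on a standard property of circulant matrices: they are diagonalized by the discrete Fourier transform, so a product with a vector (or with the inverse) reduces to elementwise operations in the frequency domain. The sketch below shows the single-level case with NumPy; the helper names are hypothetical, and it is not the paper's Newton solver. The multilevel case would use a multidimensional FFT in the same way.

```python
import numpy as np

def circulant_matvec(c, x):
    """Compute C @ x in O(n log n), where C is the circulant matrix
    whose first column is c, i.e. C[i, j] = c[(i - j) % n]."""
    return np.real(np.fft.ifft(np.fft.fft(c) * np.fft.fft(x)))

def circulant_solve(c, b):
    """Solve C @ x = b in O(n log n). Assumes C is invertible,
    which holds when no entry of fft(c) is zero."""
    return np.real(np.fft.ifft(np.fft.fft(b) / np.fft.fft(c)))

def dense_circulant(c):
    """Build the dense n x n circulant matrix (for checking only)."""
    n = len(c)
    return np.array([[c[(i - j) % n] for j in range(n)] for i in range(n)])

# Only the first column (n numbers) is stored, never the n x n matrix.
c = np.array([4.0, 1.0, 0.0, 1.0])
x = np.array([1.0, 2.0, 3.0, 4.0])
fast = circulant_matvec(c, x)
slow = dense_circulant(c) @ x
recovered = circulant_solve(c, fast)
```

Storing only the first column is what gives the $O(n)$ memory footprint, and the two FFT-based helpers correspond to the "MCM (or its inverse) times a vector" operations mentioned in the abstract.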
CR Mar 9, 2021
ByteSGAN: A Semi-supervised Generative Adversarial Network for Encrypted Traffic Classification of SDN Edge Gateway in Green Communication Network
Pan Wang, Zixuan Wang, Feng Ye et al.
With the rapid development of Green Communication Networks, the types and quantity of network traffic data are increasing accordingly. Network traffic classification has become a non-trivial research task in the area of network management and security, one that not only helps to improve fine-grained network resource allocation but also enables policy-driven network management. Meanwhile, the combination of SDN and Edge Computing can leverage both the global, network-wide visibility of SDN and the low latency and good privacy preservation of Edge Computing. However, capturing large labeled datasets is cumbersome and time-consuming manual labor. Semi-supervised learning is an appropriate technique to overcome this problem. With that in mind, we propose a Generative Adversarial Network (GAN)-based semi-supervised learning method for encrypted traffic classification, called ByteSGAN, embedded in the SDN Edge Gateway to achieve fine-grained traffic classification and further improve network resource utilization. ByteSGAN needs only a small number of labeled traffic samples, together with a large number of unlabeled samples, to achieve good traffic classification performance, by modifying the structure and loss function of the regular GAN discriminator network in a semi-supervised learning way. Based on the public dataset 'ISCX2012 VPN-nonVPN', two experiments show that ByteSGAN can efficiently improve the performance of the traffic classifier and outperform other supervised learning methods such as CNN.
LG Dec 7, 2020
Efficient and Scalable Structure Learning for Bayesian Networks: Algorithms and Applications
Rong Zhu, Andreas Pfadler, Ziniu Wu et al.
Structure Learning for Bayesian network (BN) is an important problem with extensive research. It plays central roles in a wide variety of applications in Alibaba Group. However, existing structure learning algorithms suffer from considerable limitations in real world applications due to their low efficiency and poor scalability. To resolve this, we propose a new structure learning algorithm LEAST, which comprehensively fulfills our business requirements as it attains high accuracy, efficiency and scalability at the same time. The core idea of LEAST is to formulate the structure learning into a continuous constrained optimization problem, with a novel differentiable constraint function measuring the acyclicity of the resulting graph. Unlike with existing work, our constraint function is built on the spectral radius of the graph and could be evaluated in near linear time w.r.t. the graph node size. Based on it, LEAST can be efficiently implemented with low storage overhead. According to our benchmark evaluation, LEAST runs 1 to 2 orders of magnitude faster than state of the art method with comparable accuracy, and it is able to scale on BNs with up to hundreds of thousands of variables. In our production environment, LEAST is deployed and serves for more than 20 applications with thousands of executions per day. We describe a concrete scenario in a ticket booking service in Alibaba, where LEAST is applied to build a near real-time automatic anomaly detection and root error cause analysis system. We also show that LEAST unlocks the possibility of applying BN structure learning in new areas, such as large-scale gene expression data analysis and explainable recommendation system.
CR Nov 27, 2019
PacketCGAN: Exploratory Study of Class Imbalance for Encrypted Traffic Classification Using CGAN
Pan Wang, Shuhang Li, Feng Ye et al.
With the growing adoption of Deep Learning (DL) in the fields of image processing, computer vision, and NLP, researchers have begun to apply DL to encrypted traffic classification problems. Although these methods can automatically extract traffic features, overcoming the feature engineering difficulties of traditional classification methods such as DPI, a large amount of data is needed to learn the characteristics of various types of traffic. Therefore, the performance of a classification model depends significantly on the quality of the datasets. Nevertheless, building datasets is a time-consuming and costly task, especially for encrypted traffic data. It is often more difficult to collect a large number of traffic samples for unpopular encrypted applications than for well-known ones, which leads to the problem of class imbalance between major and minor encrypted applications in datasets. In this paper, we propose a novel traffic data augmentation method called PacketCGAN using Conditional GAN (CGAN). As a generative model, PacketCGAN exploits the benefit of CGAN to generate specified traffic to address the problem of dataset imbalance. As a proof of concept, three classical DL models, such as the Convolutional Neural Network (CNN), were adopted and designed to classify four encrypted traffic datasets augmented by Random Over Sampling (ROS), SMOTE (Synthetic Minority Over-sampling Technique), vanilla GAN, and PacketCGAN respectively, based on two public datasets: ISCX2012 and USTC-TFC2016. The experimental evaluation results demonstrate that a DL-based encrypted traffic classifier trained on a dataset augmented by PacketCGAN can achieve better performance than the others.
CR Jun 18, 2018
A Hierarchical Approach to Encrypted Data Packet Classification in Smart Home Gateways
Xuejiao Chen, Jiahui Yu, Feng Ye et al.
With the pervasive network-based services in smart homes, traditional network management cannot guarantee end-user quality-of-experience (QoE) for all applications. End-user QoE must be supported by efficient network quality-of-service (QoS) measurement and efficient network resource allocation. With software-defined network technology, the core network may be controlled more efficiently by a network service provider. However, end-to-end network QoS can hardly be improved by managing the core network only. In this paper, we propose an encrypted packet classification scheme for smart home gateways to improve end-to-end QoS measurement from the network operator side. Furthermore, other services such as statistical data collection and billing to service providers can be provided without compromising end-user privacy or the security of the network. The proposed encrypted packet classification scheme has a two-level hierarchical structure. One is the service level, which is based on applications that have the same network QoS requirements. A faster classification scheme based on deep learning is proposed to achieve real-time classification with high accuracy. The other is the application level, which is based on fine-grained applications. A non-real-time classifier can be applied to provide high accuracy. Evaluation is conducted on the classifiers at both levels to demonstrate the efficiency and accuracy of the two types of classifiers.