Dan Meng

CR
h-index: 8
18 papers
666 citations
Novelty: 49%
AI Score: 49

18 Papers

CV · Aug 15, 2023
AttMOT: Improving Multiple-Object Tracking by Introducing Auxiliary Pedestrian Attributes

Yunhao Li, Zhen Xiao, Lin Yang et al.

Multi-object tracking (MOT) is a fundamental problem in computer vision with numerous applications, such as intelligent surveillance and automated driving. Despite the significant progress made in MOT, pedestrian attributes, such as gender, hairstyle, body shape, and clothing features, which contain rich and high-level information, have been less explored. To address this gap, we propose a simple, effective, and generic method to predict pedestrian attributes to support general Re-ID embedding. We first introduce AttMOT, a large, highly enriched synthetic dataset for pedestrian tracking, containing over 80k frames and 6 million pedestrian IDs with different time, weather conditions, and scenarios. To the best of our knowledge, AttMOT is the first MOT dataset with semantic attributes. Subsequently, we explore different approaches to fuse Re-ID embedding and pedestrian attributes, including attention mechanisms, which we hope will stimulate the development of attribute-assisted MOT. The proposed method AAM demonstrates its effectiveness and generality on several representative pedestrian multi-object tracking benchmarks, including MOT17 and MOT20, through experiments on the AttMOT dataset. When applied to state-of-the-art trackers, AAM achieves consistent improvements in MOTA, HOTA, AssA, IDs, and IDF1 scores. For instance, on MOT17, the proposed method yields a +1.1 MOTA, +1.7 HOTA, and +1.8 IDF1 improvement when used with FairMOT. To encourage further research on attribute-assisted MOT, we will release the AttMOT dataset.

LG · Oct 6, 2023
Making Users Indistinguishable: Attribute-wise Unlearning in Recommender Systems

Yuyuan Li, Chaochao Chen, Xiaolin Zheng et al.

With the growing privacy concerns in recommender systems, recommendation unlearning, i.e., forgetting the impact of specific learned targets, is getting increasing attention. Existing studies predominantly use training data, i.e., model inputs, as the unlearning target. However, we find that attackers can extract private information, e.g., gender, race, and age, from a trained model even if it has not been explicitly encountered during training. We call this unseen information an attribute and treat it as the unlearning target. To protect the sensitive attributes of users, Attribute Unlearning (AU) aims to degrade attacking performance and make target attributes indistinguishable. In this paper, we focus on a strict but practical setting of AU, namely Post-Training Attribute Unlearning (PoT-AU), where unlearning can only be performed after the training of the recommendation model is completed. To address the PoT-AU problem in recommender systems, we design a two-component loss function that consists of i) distinguishability loss: making attribute labels indistinguishable to attackers, and ii) regularization loss: preventing drastic changes in the model that result in a negative impact on recommendation performance. Specifically, we investigate two types of distinguishability measurements, i.e., user-to-user and distribution-to-distribution. We use the stochastic gradient descent algorithm to optimize our proposed loss. Extensive experiments on three real-world datasets demonstrate the effectiveness of our proposed methods.
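The two-component loss described in the abstract above can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not give the formulas, so the sketch assumes a binary attribute, uses the squared distance between the two groups' mean embeddings as a stand-in for a distribution-to-distribution distinguishability measure, and uses the mean squared drift from the original embeddings as the regularizer. All function names and the `reg_weight` parameter are hypothetical.

```python
from typing import List, Sequence

Vector = List[float]


def mean_vector(rows: Sequence[Vector]) -> Vector:
    """Component-wise mean of a non-empty list of equal-length vectors."""
    dim = len(rows[0])
    return [sum(row[d] for row in rows) / len(rows) for d in range(dim)]


def sq_dist(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def distinguishability_loss(emb: Sequence[Vector], labels: Sequence[int]) -> float:
    """Assumed measure: squared distance between the mean embeddings of the
    two attribute groups (labels 0 and 1). Zero means the group means coincide."""
    group0 = [e for e, y in zip(emb, labels) if y == 0]
    group1 = [e for e, y in zip(emb, labels) if y == 1]
    return sq_dist(mean_vector(group0), mean_vector(group1))


def regularization_loss(emb: Sequence[Vector], original: Sequence[Vector]) -> float:
    """Assumed regularizer: mean squared drift from the trained embeddings."""
    return sum(sq_dist(e, o) for e, o in zip(emb, original)) / len(emb)


def pot_au_loss(emb, original, labels, reg_weight: float = 1.0) -> float:
    """Hypothetical combined objective: distinguishability + weighted regularization."""
    return distinguishability_loss(emb, labels) + reg_weight * regularization_loss(emb, original)
```

In this toy form, moving both groups toward a common mean lowers the first term while raising the second, which is the trade-off the abstract describes between hiding the attribute and preserving recommendation performance.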

AI · Mar 5, 2024
A Comprehensive Survey on Process-Oriented Automatic Text Summarization with Exploration of LLM-Based Methods

Yang Zhang, Hanlei Jin, Dan Meng et al.

Automatic Text Summarization (ATS), utilizing Natural Language Processing (NLP) algorithms, aims to create concise and accurate summaries, thereby significantly reducing the human effort required in processing large volumes of text. ATS has drawn considerable interest in both academic and industrial circles. Many studies have been conducted in the past to survey ATS methods; however, they generally lack practicality for real-world implementations, as they often categorize previous methods from a theoretical standpoint. Moreover, the advent of Large Language Models (LLMs) has altered conventional ATS methods. In this survey, we aim to 1) provide a comprehensive overview of ATS from a "Process-Oriented Schema" perspective, which is best aligned with real-world implementations; 2) comprehensively review the latest LLM-based ATS works; and 3) deliver an up-to-date survey of ATS, bridging the two-year gap in the literature. To the best of our knowledge, this is the first survey to specifically investigate LLM-based ATS methods.

39.9 · CR · May 11
Janus: Compiler-Based Defense Against Transient Execution Attacks Using ARM Hardware Primitives

Ciyan Ouyang, Peinan Li, Yubiao Huang et al.

We present Janus, a compiler-based security framework that mitigates transient execution attacks like Spectre and control-flow hijacking on ARM64 platforms. Janus integrates speculative execution and control flow dependencies with PA modifiers, using PA and BTI microarchitectural features to prevent control-flow speculation attacks and secure both control flow and speculative execution through existing control-flow integrity mechanisms. To optimize performance, Janus minimizes overhead by merging defense operations across different defense layers (modifier fusion) and reusing registers of protected variables (carrier reuse), while maintaining strong security guarantees. Evaluation on SPEC CPU2017 shows an average performance overhead of 3.85%, with real-world applications exhibiting overheads ranging from 2.97% to 7.80%. Janus offers effective speculative execution security and low performance and code size overhead, making it a robust solution for ARM-based systems.

CR · May 6, 2024 · Code
When LLMs Meet Cybersecurity: A Systematic Literature Review

Jie Zhang, Haoyu Bu, Hui Wen et al.

The rapid development of large language models (LLMs) has opened new avenues across various fields, including cybersecurity, which faces an evolving threat landscape and demand for innovative technologies. Despite initial explorations into the application of LLMs in cybersecurity, there is a lack of a comprehensive overview of this research area. This paper addresses this gap by providing a systematic literature review, covering the analysis of over 300 works, encompassing 25 LLMs and more than 10 downstream scenarios. Our comprehensive overview addresses three key research questions: the construction of cybersecurity-oriented LLMs, the application of LLMs to various cybersecurity tasks, and the challenges and further research in this area. This study aims to shed light on the extensive potential of LLMs in enhancing cybersecurity practices and serve as a valuable resource for applying LLMs in this field. We also maintain and regularly update a list of practical guides on LLMs for cybersecurity at https://github.com/tmylla/Awesome-LLM4Cybersecurity.

48.7 · CR · Mar 16
vCause: Efficient and Verifiable Causality Analysis for Cloud-based Endpoint Auditing

Qiyang Song, Qihang Zhou, Xiaoqi Jia et al.

In cloud-based endpoint auditing, security administrators often rely on the cloud to perform causality analysis over log-derived versioned provenance graphs to investigate suspicious attack behaviors. However, the cloud may be distrusted or compromised by attackers, potentially manipulating the final causality analysis results. Consequently, administrators may not accurately understand attack behaviors and fail to implement effective countermeasures. This risk underscores the need for a defense scheme to ensure the integrity of causality analysis. While existing tamper-evident logging schemes and trusted execution environments show promise for this task, they are not specifically designed to support causality analysis and thus face inherent security and efficiency limitations. This paper presents vCause, an efficient and verifiable causality analysis system for cloud-based endpoint auditing. vCause integrates two authenticated data structures: a graph accumulator and a verifiable provenance graph. The data structures enable validation of two critical steps in causality analysis: (i) querying a point-of-interest node on a versioned provenance graph, and (ii) identifying its causally related components. Formal security analysis and experimental evaluation show that vCause can achieve secure and verifiable causality analysis with less than 1% computational overhead on endpoints and 3.36% on the cloud.

CV · Apr 14, 2024
LoopAnimate: Loopable Salient Object Animation

Fanyi Wang, Peng Liu, Haotian Hu et al.

Research on diffusion model-based video generation has advanced rapidly. However, limitations in object fidelity and generation length hinder its practical applications. Additionally, specific domains like animated wallpapers require seamless looping, where the first and last frames of the video match seamlessly. To address these challenges, this paper proposes LoopAnimate, a novel method for generating videos with consistent start and end frames. To enhance object fidelity, we introduce a framework that decouples multi-level image appearance and textual semantic information. Building upon an image-to-image diffusion model, our approach incorporates both pixel-level and feature-level information from the input image, injecting image appearance and textual semantic embeddings at different positions of the diffusion model. Existing UNet-based video generation models require the entire video as input during training to encode temporal and positional information at once. However, due to limitations in GPU memory, the number of frames is typically restricted to 16. To address this, this paper proposes a three-stage training strategy with a progressively increasing number of frames and a decreasing number of fine-tuned modules. Additionally, we introduce the Temporal Enhanced Motion Module (TEMM) to extend the capacity for encoding temporal and positional information up to 36 frames. The proposed LoopAnimate for the first time extends the single-pass generation length of UNet-based video generation models to 35 frames while maintaining high-quality video generation. Experiments demonstrate that LoopAnimate achieves state-of-the-art performance in both objective metrics, such as fidelity and temporal consistency, and subjective evaluation results.

CR · May 12, 2025
Comet: Accelerating Private Inference for Large Language Model by Predicting Activation Sparsity

Guang Yan, Yuhui Zhang, Zimu Guo et al.

With the growing use of large language models (LLMs) hosted on cloud platforms to offer inference services, privacy concerns about the potential leakage of sensitive information are escalating. Secure multi-party computation (MPC) is a promising solution to protect privacy in LLM inference. However, MPC requires frequent inter-server communication, causing high performance overhead. Inspired by the prevalent activation sparsity of LLMs, where most neurons are not activated after non-linear activation functions, we propose an efficient private inference system, Comet. This system employs an accurate and fast predictor to predict the sparsity distribution of activation function output. Additionally, we introduce a new private inference protocol. It efficiently and securely avoids computations involving zero values by exploiting the spatial locality of the predicted sparse distribution. While this computation-avoidance approach impacts the spatiotemporal continuity of KV cache entries, we address this challenge with a low-communication overhead cache refilling strategy that merges miss requests and incorporates a prefetching mechanism. Finally, we evaluate Comet on four common LLMs and compare it with six state-of-the-art private inference systems. Comet achieves a 1.87x-2.63x speedup and a 1.94x-2.64x communication reduction.

AI · Aug 12, 2025
Efficient Agent: Optimizing Planning Capability for Multimodal Retrieval Augmented Generation

Yuechen Wang, Yuming Qiao, Dan Meng et al.

Multimodal Retrieval-Augmented Generation (mRAG) has emerged as a promising solution to address the temporal limitations of Multimodal Large Language Models (MLLMs) in real-world scenarios like news analysis and trending topics. However, existing approaches often suffer from rigid retrieval strategies and under-utilization of visual information. To bridge this gap, we propose E-Agent, an agent framework featuring two key innovations: an mRAG planner trained to dynamically orchestrate multimodal tools based on contextual reasoning, and a task executor employing tool-aware execution sequencing to implement optimized mRAG workflows. E-Agent adopts a one-time mRAG planning strategy that enables efficient information retrieval while minimizing redundant tool invocations. To rigorously assess the planning capabilities of mRAG systems, we introduce the Real-World mRAG Planning (RemPlan) benchmark. This novel benchmark contains both retrieval-dependent and retrieval-independent question types, systematically annotated with essential retrieval tools required for each instance. The benchmark's explicit mRAG planning annotations and diverse question design enhance its practical relevance by simulating real-world scenarios requiring dynamic mRAG decisions. Experiments across RemPlan and three established benchmarks demonstrate E-Agent's superiority: a 13% accuracy gain over state-of-the-art mRAG methods while reducing redundant searches by 37%.

CV · Mar 11, 2025
Attention to Trajectory: Trajectory-Aware Open-Vocabulary Tracking

Yunhao Li, Yifan Jiao, Dan Meng et al.

Open-Vocabulary Multi-Object Tracking (OV-MOT) aims to enable approaches to track objects without being limited to a predefined set of categories. Current OV-MOT methods typically rely primarily on instance-level detection and association, often overlooking trajectory information that is unique and essential for object tracking tasks. Utilizing trajectory information can enhance association stability and classification accuracy, especially in cases of occlusion and category ambiguity, thereby improving adaptability to novel classes. Thus motivated, in this paper we propose TRACT, an open-vocabulary tracker that leverages trajectory information to improve both object association and classification in OV-MOT. Specifically, we introduce a Trajectory Consistency Reinforcement (TCR) strategy that benefits tracking performance by improving target identity and category consistency. In addition, we present TraCLIP, a plug-and-play trajectory classification module. It integrates Trajectory Feature Aggregation (TFA) and Trajectory Semantic Enrichment (TSE) strategies to fully leverage trajectory information from visual and language perspectives for enhancing the classification results. Extensive experiments on OV-TAO show that our TRACT significantly improves tracking performance, highlighting trajectory information as a valuable asset for OV-MOT. Code will be released.

CR · Apr 20, 2021
DeepHunter: A Graph Neural Network Based Approach for Robust Cyber Threat Hunting

Renzheng Wei, Lijun Cai, Aimin Yu et al.

Cyber threat hunting is a proactive search for known attack behaviors in the organizational information system. It is an important component in mitigating advanced persistent threats (APTs). However, the attack behaviors recorded in provenance data may not be completely consistent with the known attack behaviors. In this paper, we propose DeepHunter, a graph neural network (GNN) based graph pattern matching approach that can match provenance data against known attack behaviors in a robust way. Specifically, we design a graph neural network architecture with two novel networks: attribute embedding networks that could incorporate Indicators of Compromise (IOCs) information, and graph embedding networks that could capture the relationships between IOCs. To evaluate DeepHunter, we choose five real and synthetic APT attack scenarios. Results show that DeepHunter can hunt all attack behaviors, and the accuracy and robustness of DeepHunter outperform the state-of-the-art method, Poirot.

CR · Dec 2, 2020
PiPoMonitor: Mitigating Cross-core Cache Attacks Using the Auto-Cuckoo Filter

Fengkai Yuan, Kai Wang, Rui Hou et al.

Cache side channel attacks obtain the victim's cache line access footprint to infer security-critical information. Among them, cross-core attacks exploiting the shared last level cache are more threatening because of their simplicity of setup and high capacity. Stateful approaches of detection-based mitigation observe precise cache behaviors and protect specific cache lines that are suspected of being attacked. However, their recording structures incur large storage overhead and are vulnerable to reverse engineering attacks. Exploring the intrinsic non-determinate layout of a traditional Cuckoo filter, this paper proposes a space-efficient Auto-Cuckoo filter to record access footprints, which succeeds in decreasing storage overhead and resisting reverse engineering attacks at the same time. With the Auto-Cuckoo filter, we propose PiPoMonitor to detect Ping-Pong patterns and prefetch specific cache lines to interfere with adversaries' cache probes. Security analysis shows that PiPoMonitor can effectively mitigate cross-core attacks and the Auto-Cuckoo filter is immune to reverse engineering attacks. Evaluation results indicate PiPoMonitor has negligible impact on performance and the storage overhead is only 0.37%, an order of magnitude lower than previous stateful approaches.

CR · May 17, 2020
A Lightweight Isolation Mechanism for Secure Branch Predictors

Lutan Zhao, Peinan Li, Rui Hou et al.

Recently exposed vulnerabilities reveal the necessity to improve the security of branch predictors. Branch predictors record history about the execution of different programs, and such information from different processes is stored in the same structure and thus accessible to each other. This leaves attackers with opportunities for malicious training and malicious perception. Instead of flush-based or physical isolation of hardware resources, we want to achieve isolation of the content in these hardware tables with some lightweight processing using randomization as follows. (1) Content encoding. We propose to use hardware-based thread-private random numbers to encode the contents of the branch predictor tables (both direction and destination histories), which we call XOR-BP. Specifically, the data is encoded by an XOR operation with the key before being written to the table and decoded after being read from the table. Such a mechanism obfuscates the information, adding difficulty to cross-process or cross-privilege level analysis and perception. It achieves a similar effect to logical isolation but adds little in terms of space or time overheads. (2) Index encoding. We propose a randomized index mechanism of the branch predictor (Noisy-XOR-BP). Similar to XOR-BP, another thread-private random number is used together with the branch instruction address as the input to compute the index of the branch predictor. This randomized indexing mechanism disrupts the correspondence between the branch instruction address and the branch predictor entry, thus increasing the noise for malicious perception attacks. Our analyses using an FPGA-based RISC-V processor prototype and additional auxiliary simulations suggest that the proposed mechanisms incur a very small performance cost while providing strong protection.
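The two encodings described in the abstract above can be modelled in software as a rough illustration. This is a sketch, not the paper's hardware design: the table size, the 32-bit target width, the `pc >> 2` indexing (which assumes 4-byte-aligned instructions), and the class and method names are all assumptions made for the example.

```python
import secrets

TABLE_BITS = 10    # assumed table size: 2**10 entries
TARGET_BITS = 32   # assumed width of a stored branch target


class XorBranchTargetTable:
    """Toy branch-target table shared by all threads.

    Content encoding: stored targets are XORed with a thread-private key.
    Index encoding: the table index is XORed with a second thread-private key.
    """

    def __init__(self) -> None:
        self.entries = [0] * (1 << TABLE_BITS)

    @staticmethod
    def new_thread_keys() -> tuple:
        """Draw a fresh (content_key, index_key) pair for one thread."""
        return secrets.randbits(TARGET_BITS), secrets.randbits(TABLE_BITS)

    @staticmethod
    def _index(pc: int, index_key: int) -> int:
        # Randomized index: branch address mixed with the thread's index key.
        return ((pc >> 2) ^ index_key) & ((1 << TABLE_BITS) - 1)

    def update(self, pc: int, target: int, content_key: int, index_key: int) -> None:
        # Encode before writing: the table never holds the plain target.
        self.entries[self._index(pc, index_key)] = target ^ content_key

    def predict(self, pc: int, content_key: int, index_key: int) -> int:
        # Decode after reading: only the matching key recovers the target.
        return self.entries[self._index(pc, index_key)] ^ content_key
```

A thread that reads an entry written under another thread's content key decodes a value that differs from the real target, and with a different index key it will usually look in a different entry altogether.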

CR · Feb 5, 2020
Knowledge Federation: A Unified and Hierarchical Privacy-Preserving AI Framework

Hongyu Li, Dan Meng, Hong Wang et al.

With strict protections and regulations of data privacy and security, conventional machine learning based on centralized datasets is confronted with significant challenges, making artificial intelligence (AI) impractical in many mission-critical and data-sensitive scenarios, such as finance, government, and health. In the meantime, tremendous datasets are scattered in isolated silos in various industries, organizations, different units of an organization, or different branches of an international organization. These valuable data resources are greatly underused. To advance AI theories and applications, we propose a comprehensive framework (called Knowledge Federation - KF) to address these challenges by enabling AI while preserving data privacy and ownership. Beyond the concepts of federated learning and secure multi-party computation, KF consists of four levels of federation: (1) information level, low-level statistics and computation of data, meeting the requirements of simple queries, searching and simplistic operators; (2) model level, supporting training, learning, and inference; (3) cognition level, enabling abstract feature representation at various levels of abstractions and contexts; (4) knowledge level, fusing knowledge discovery, representation, and reasoning. We further clarify the relationship and differentiation between knowledge federation and other related research areas. We have developed a reference implementation of KF, called iBond Platform, to offer a production-quality KF platform to enable industrial applications in finance, insurance, etc. The iBond platform will also help establish the KF community and a comprehensive ecosystem and usher in a novel paradigm shift towards secure, privacy-preserving and responsible AI. As far as we know, knowledge federation is the first hierarchical and unified framework for secure multi-party computing and learning.

CR · Oct 26, 2019
DDM: A Demand-based Dynamic Mitigation for SMT Transient Channels

Yue Zhang, Ziyuan Zhu, Dan Meng

Unlike traditional software vulnerabilities, the microarchitecture side channel has three characteristics: extensive influence, potent threat, and tough defense. The main reason for the microarchitecture side channel is resource sharing. There are many reasons for resource sharing, one of which is SMT (Simultaneous Multi-Threading) technology. In this paper, we define the SMT Transient Channel, which uses the transient state of shared resources between threads to steal information. To mitigate it, we design a security demand-based dynamic mitigation (DDM). DDM writes the processes' security requirements to the CPU register sets, and the operating system calls the HLT instruction to dynamically turn hyper-threading on and off according to the register values to avoid the side channels caused by execution resource sharing. During the implementation of the scheme, we modified the Linux kernel and used the MSR register groups of Intel processors. The evaluation results show that DDM can effectively protect against transient side-channel attacks such as PortSmash that rely on SMT, and the performance loss of DDM is less than 8%.

CV · Oct 12, 2019
Template-Instance Loss for Offline Handwritten Chinese Character Recognition

Yao Xiao, Dan Meng, Cewu Lu et al.

The long-standing challenges for offline handwritten Chinese character recognition (HCCR) are twofold: Chinese characters can be very diverse and complicated while looking similar, and cursive handwriting (due to increased writing speed and infrequent pen lifting) makes strokes and even characters connected together in a flowing manner. In this paper, we propose the template and instance loss functions for the relevant machine learning tasks in offline handwritten Chinese character recognition. First, the character template is designed to deal with the intrinsic similarities among Chinese characters. Second, the instance loss can reduce category variance according to classification difficulty, giving a large penalty to outlier instances of handwritten Chinese characters. Our extensive experiments show that our deep network architecture, the HCCR14Layer model consisting of simple layers, trained with the new loss functions, yields state-of-the-art performance and beyond for offline HCCR.

CR · Apr 9, 2019
Enabling Privacy-Preserving, Compute- and Data-Intensive Computing using Heterogeneous Trusted Execution Environment

Jianping Zhu, Rui Hou, XiaoFeng Wang et al.

There is an urgent demand for privacy-preserving techniques capable of supporting compute- and data-intensive (CDI) computing in the era of big data. However, none of the existing TEEs can truly support CDI computing tasks, as CDI requires high-throughput accelerators like GPU and TPU but TEEs do not offer security protection of such accelerators. This paper presents HETEE (Heterogeneous TEE), the first design of a TEE capable of strongly protecting heterogeneous computing with unsecure accelerators. HETEE is uniquely constructed to work with today's servers, and does not require any changes to existing commercial CPUs or accelerators. The key idea of our design is to run a security controller as a stand-alone computing system that dynamically adjusts the boundary between the secure and insecure worlds through the PCIe switches, rendering the control of an accelerator to the host OS when it is not needed for secure computing, and shifting it back when it is. The controller is the only trusted unit in the system, and it runs the custom OS and accelerator runtimes, together with the encryption, authentication and remote attestation components. The host server and other computing systems communicate with the controller through an in-memory task queue that accommodates the computing tasks offloaded to HETEE, in the form of encrypted and signed code and data. Also, HETEE offers a generic and efficient programming model to the host CPU. We have implemented the HETEE design on a hardware prototype system, and evaluated it with large-scale neural network inference and training tasks. Our evaluations show that HETEE can easily support such secure computing tasks and only incurs a 12.34% throughput overhead for inference and 9.87% overhead for training on average.

CR · Feb 3, 2019
Zipper Stack: Shadow Stacks Without Shadow

Jinfeng Li, Liwei Chen, Qizhen Xu et al.

Return-Oriented Programming (ROP) is a typical attack technique that exploits return addresses to abuse existing code repeatedly. Most of the current return address protection mechanisms (also known as backward-edge control-flow integrity) work only in limited threat models. For example, the attacker cannot break memory isolation, or the attacker has no knowledge of a secret key or random values. This paper presents a novel, lightweight mechanism protecting return addresses, Zipper Stack, which authenticates all return addresses by a chain structure using cryptographic message authentication codes (MACs). This innovative design can defend against the most powerful attackers who have full control over the program's memory and even know the secret key of the MAC function. This threat model is stronger than the one used in related work. At the same time, it produces low performance overhead. We implemented Zipper Stack by extending the RISC-V instruction set architecture, and the evaluation on FPGA shows that the performance overhead of Zipper Stack is only 1.86%. Thus, we think Zipper Stack is suitable for actual deployment.
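The idea of authenticating return addresses through a MAC chain, as the abstract above describes, can be illustrated with a short software model. This is a sketch under stated assumptions, not the Zipper Stack hardware design: it assumes the newest tag lives in a register the attacker cannot write (`top_tag` here), uses HMAC-SHA256 as a stand-in MAC, and the class name, key, and storage layout are hypothetical.

```python
import hashlib
import hmac

KEY = b"demo-key"  # illustrative key only


def mac(return_address: int, previous_tag: bytes) -> bytes:
    """Tag that binds a return address to the tag of the frame below it."""
    message = return_address.to_bytes(8, "little") + previous_tag
    return hmac.new(KEY, message, hashlib.sha256).digest()


class ChainedReturnStack:
    """Toy model of a return stack protected by a MAC chain."""

    def __init__(self) -> None:
        self.top_tag = bytes(32)  # assumed protected register holding the newest tag
        self.memory = []          # attacker-writable: (return_address, previous_tag)

    def call(self, return_address: int) -> None:
        # Spill the old tag to memory next to the return address,
        # then fold both into the new tag kept in the register.
        self.memory.append((return_address, self.top_tag))
        self.top_tag = mac(return_address, self.top_tag)

    def ret(self) -> int:
        return_address, previous_tag = self.memory.pop()
        # Recompute the tag from memory and compare it with the register.
        expected = mac(return_address, previous_tag)
        if not hmac.compare_digest(expected, self.top_tag):
            raise RuntimeError("return address chain verification failed")
        self.top_tag = previous_tag
        return return_address
```

Because each tag covers both the return address and the tag beneath it, changing any saved return address in memory makes the recomputed tag disagree with the register value on the next return.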