Jianbo Gao

CR
h-index29
12papers
132citations
Novelty55%
AI Score52

12 Papers

CLAug 8, 2022
Learning Entity Linking Features for Emerging Entities

Chenwei Ran, Wei Shen, Jianbo Gao et al.

Entity linking (EL) is the process of linking entity mentions appearing in text with their corresponding entities in a knowledge base. EL features of entities (e.g., prior probability, relatedness score, and entity embedding) are usually estimated based on Wikipedia. However, for newly emerging entities (EEs) which have just been discovered in news, they may still not be included in Wikipedia yet. As a consequence, it is unable to obtain required EL features for those EEs from Wikipedia and EL models will always fail to link ambiguous mentions with those EEs correctly as the absence of their EL features. To deal with this problem, in this paper we focus on a new task of learning EL features for emerging entities in a general way. We propose a novel approach called STAMO to learn high-quality EL features for EEs automatically, which needs just a small number of labeled documents for each EE collected from the Web, as it could further leverage the knowledge hidden in the unlabeled data. STAMO is mainly based on self-training, which makes it flexibly integrated with any EL feature or EL model, but also makes it easily suffer from the error reinforcement problem caused by the mislabeled data. Instead of some common self-training strategies that try to throw the mislabeled data away explicitly, we regard self-training as a multiple optimization process with respect to the EL features of EEs, and propose both intra-slot and inter-slot optimizations to alleviate the error reinforcement problem implicitly. We construct two EL datasets involving selected EEs to evaluate the quality of obtained EL features for EEs, and the experimental results show that our approach significantly outperforms other baseline methods of learning EL features.

CRFeb 24
AdapTools: Adaptive Tool-based Indirect Prompt Injection Attacks on Agentic LLMs

Che Wang, Jiaming Zhang, Ziqi Zhang et al. · gatech

The integration of external data services (e.g., Model Context Protocol, MCP) has made large language model-based agents increasingly powerful for complex task execution. However, this advancement introduces critical security vulnerabilities, particularly indirect prompt injection (IPI) attacks. Existing attack methods are limited by their reliance on static patterns and evaluation on simple language models, failing to address the fast-evolving nature of modern AI agents. We introduce AdapTools, a novel adaptive IPI attack framework that selects stealthier attack tools and generates adaptive attack prompts to create a rigorous security evaluation environment. Our approach comprises two key components: (1) Adaptive Attack Strategy Construction, which develops transferable adversarial strategies for prompt optimization, and (2) Attack Enhancement, which identifies stealthy tools capable of circumventing task-relevance defenses. Comprehensive experimental evaluation shows that AdapTools achieves a 2.13 times improvement in attack success rate while degrading system utility by a factor of 1.78. Notably, the framework maintains its effectiveness even against state-of-the-art defense mechanisms. Our method advances the understanding of IPI attacks and provides a useful reference for future research.

CLMay 24, 2022
Community Question Answering Entity Linking via Leveraging Auxiliary Data

Yuhan Li, Wei Shen, Jianbo Gao et al.

Community Question Answering (CQA) platforms contain plenty of CQA texts (i.e., questions and answers corresponding to the question) where named entities appear ubiquitously. In this paper, we define a new task of CQA entity linking (CQAEL) as linking the textual entity mentions detected from CQA texts with their corresponding entities in a knowledge base. This task can facilitate many downstream applications including expert finding and knowledge base enrichment. Traditional entity linking methods mainly focus on linking entities in news documents, and are suboptimal over this new task of CQAEL since they cannot effectively leverage various informative auxiliary data involved in the CQA platform to aid entity linking, such as parallel answers and two types of meta-data (i.e., topic tags and users). To remedy this crucial issue, we propose a novel transformer-based framework to effectively harness the knowledge delivered by different kinds of auxiliary data to promote the linking performance. We validate the superiority of our framework through extensive experiments over a newly released CQAEL data set against state-of-the-art entity linking methods.

DCMar 19
A402: Binding Cryptocurrency Payments to Service Execution for Agentic Commerce

Yue Li, Lei Wang, Kaixuan Wang et al.

The rapid proliferation of autonomous AI agents is driving a shift toward agentic commerce, where agents are expected to autonomously invoke and pay for services. While blockchain-based payments offer a programmable foundation for such interactions, the recently proposed x402 standard fails to enforce end-to-end atomicity across service execution, payment, and result delivery. In this paper, we present A402, a trust-minimized payment architecture that securely binds cryptocurrency payments to service execution. A402 introduces Atomic Service Channels (ASCs), a new channel protocol that integrates service execution into payment channels, enabling real-time, high-frequency micropayments for agentic commerce. Within each ASC, A402 employs an atomic exchange protocol based on TEE-assisted adaptor signatures, ensuring that payments are finalized if and only if the requested service is correctly executed and the corresponding result is delivered. To further ensure privacy, A402 incorporates a TEE-based Liquidity Vault that privately manages the lifecycle of ASCs and aggregates their settlements into a single on-chain transaction, revealing only aggregated balances. We implement A402 and evaluate it against x402 with integrations on both Bitcoin and Ethereum. Our results show that A402 delivers orders-of-magnitude performance and on-chain cost improvements over x402 while providing trust-minimized security guarantees.

AIFeb 24
ICON: Indirect Prompt Injection Defense for Agents based on Inference-Time Correction

Che Wang, Fuyao Zhang, Jiaming Zhang et al.

Large Language Model (LLM) agents are susceptible to Indirect Prompt Injection (IPI) attacks, where malicious instructions in retrieved content hijack the agent's execution. Existing defenses typically rely on strict filtering or refusal mechanisms, which suffer from a critical limitation: over-refusal, prematurely terminating valid agentic workflows. We propose ICON, a probing-to-mitigation framework that neutralizes attacks while preserving task continuity. Our key insight is that IPI attacks leave distinct over-focusing signatures in the latent space. We introduce a Latent Space Trace Prober to detect attacks based on high intensity scores. Subsequently, a Mitigating Rectifier performs surgical attention steering that selectively manipulate adversarial query key dependencies while amplifying task relevant elements to restore the LLM's functional trajectory. Extensive evaluations on multiple backbones show that ICON achieves a competitive 0.4% ASR, matching commercial grade detectors, while yielding a over 50% task utility gain. Furthermore, ICON demonstrates robust Out of Distribution(OOD) generalization and extends effectively to multi-modal agents, establishing a superior balance between security and efficiency.

CRMay 2
Tracing the Dynamics of Refusal: Exploiting Latent Refusal Trajectories for Robust Jailbreak Detection

Xulin Hu, Che Wang, Wei Yang Bryan Lim et al.

Representation Engineering typically relies on static refusal vectors derived from terminal representations. We move beyond this paradigm, demonstrating that refusal is a dynamic and sparse process rather than a localized outcome. Using Causal Tracing, we uncover the Refusal Trajectory-a persistent upstream signature that remains intact even when adversarial attacks (e.g., GCG) suppress terminal signals. Leveraging this, we propose SALO (Sparse Activation Localization Operator), an inference-time detector designed to capture these latent patterns. SALO effectively recovers defense capabilities against forced-decoding attacks, improving detection rates from ~0% to >90% where methods relying on terminal states perform poorly.

CRApr 28, 2025
AGATE: Stealthy Black-box Watermarking for Multimodal Model Copyright Protection

Jianbo Gao, Keke Gai, Jing Yu et al.

Recent advancement in large-scale Artificial Intelligence (AI) models offering multimodal services have become foundational in AI systems, making them prime targets for model theft. Existing methods select Out-of-Distribution (OoD) data as backdoor watermarks and retrain the original model for copyright protection. However, existing methods are susceptible to malicious detection and forgery by adversaries, resulting in watermark evasion. In this work, we propose Model-\underline{ag}nostic Black-box Backdoor W\underline{ate}rmarking Framework (AGATE) to address stealthiness and robustness challenges in multimodal model copyright protection. Specifically, we propose an adversarial trigger generation method to generate stealthy adversarial triggers from ordinary dataset, providing visual fidelity while inducing semantic shifts. To alleviate the issue of anomaly detection among model outputs, we propose a post-transform module to correct the model output by narrowing the distance between adversarial trigger image embedding and text embedding. Subsequently, a two-phase watermark verification is proposed to judge whether the current model infringes by comparing the two results with and without the transform module. Consequently, we consistently outperform state-of-the-art methods across five datasets in the downstream tasks of multimodal image-text retrieval and image classification. Additionally, we validated the robustness of AGATE under two adversarial attack scenarios.

CRMar 23, 2021
TrustCross: Enabling Confidential Interoperability across Blockchains Using Trusted Hardware

Ying Lan, Jianbo Gao, Ke Wang et al.

With the rapid development of blockchain technology, different types of blockchains are adopted and interoperability across blockchains has received widespread attention. There have been many cross-chain solutions proposed in recent years, including notary scheme, sidechain, and relay chain. However, most of the existing platforms do not take confidentiality into account, although privacy has become an important concern for blockchain. In this paper, we present TrustCross, a privacy-preserving cross-chain platform to enable confidential interoperability across blockchains. The key insight behind TrustCross is to encrypt cross-chain communication data on the relay chain with the assistance of trusted execution environment and employ fine-grained access control to protect user privacy. Our experimental results show that TrustCross achieves reasonable latency and high scalability on the contract calls across heterogeneous blockchains.

SEJun 2, 2020
Kaya: A Testing Framework for Blockchain-based Decentralized Applications

Zhenhao Wu, Jiashuo Zhang, Jianbo Gao et al.

In recent years, many decentralized applications based on blockchain (DApp) have been developed. However, due to inadequate testing, DApps are easily exposed to serious vulnerabilities. We find three main challenges for DApp testing, i.e., the inherent complexity of DApp, inconvenient pre-state setting, and not-so-readable logs. In this paper, we propose a testing framework named Kaya to bridge these gaps. Kaya has three main functions. Firstly, Kaya proposes DApp behavior description language (DBDL) to make writing test cases easier. Test cases written in DBDL can also be automatically executed by Kaya. Secondly, Kaya supports a flexible and convenient way for test engineers to set the blockchain pre-states easily. Thirdly, Kaya transforms incomprehensible addresses into readable variables for easy comprehension. With these functions, Kaya can help test engineers test DApps more easily. Besides, to fit the various application environments, we provide two ways for test engineers to use Kaya, i.e., UI and command-line. Our experimental case demonstrates the potential of Kaya in helping test engineers to test DApps more easily.

CRSep 6, 2019
SEdroid: A Robust Android Malware Detector using Selective Ensemble Learning

Ji Wang, Qi Jing, Jianbo Gao

For the dramatic increase of Android malware and low efficiency of manual check process, deep learning methods started to be an auxiliary means for Android malware detection these years. However, these models are highly dependent on the quality of datasets, and perform unsatisfactory results when the quality of training data is not good enough. In the real world, the quality of datasets without manually check cannot be guaranteed, even Google Play may contain malicious applications, which will cause the trained model failure. To address the challenge, we propose a robust Android malware detection approach based on selective ensemble learning, trying to provide an effective solution not that limited to the quality of datasets. The proposed model utilizes genetic algorithm to help find the best combination of the component learners and improve robustness of the model. Our results show that the proposed approach achieves a more robust performance than other approaches in the same area.

IRMar 27, 2019
Tracking the Consumption Junction: Temporal Dependencies between Articles and Advertisements in Dutch Newspapers

Melvin Wevers, Jianbo Gao, Kristoffer L. Nielbo

Historians have regularly debated whether advertisements can be used as a viable source to study the past. Their main concern centered on the question of agency. Were advertisements a reflection of historical events and societal debates, or were ad makers instrumental in shaping society and the ways people interacted with consumer goods? Using techniques from econometrics (Granger causality test) and complexity science (Adaptive Fractal Analysis), this paper analyzes to what extent advertisements shaped or reflected society. We found evidence that indicate a fundamental difference between the dynamic behavior of word use in articles and advertisements published in a century of Dutch newspapers. Articles exhibit persistent trends that are likely to be reflective of communicative memory. Contrary to this, advertisements have a more irregular behavior characterized by short bursts and fast decay, which, in part, mirrors the dynamic through which advertisers introduced terms into public discourse. On the issue of whether advertisements shaped or reflected society, we found particular product types that seemed to be collectively driven by a causality going from advertisements to articles. Generally, we found support for a complex interaction pattern dubbed the consumption junction. Finally, we discovered noteworthy patterns in terms of causality and long-range dependencies for specific product groups. All in, this study shows how methods from econometrics and complexity science can be applied to humanities data to improve our understanding of complex cultural-historical phenomena such as the role of advertising in society.

SENov 9, 2018
EASYFLOW: Keep Ethereum Away From Overflow

Jianbo Gao, Han Liu, Chao Liu et al.

While Ethereum smart contracts enabled a wide range of blockchain applications, they are extremely vulnerable to different forms of security attacks. Due to the fact that transactions to smart contracts commonly involve cryptocurrency transfer, any successful attacks can lead to money loss or even financial disorder. In this paper, we focus on the overflow attacks in Ethereum , mainly because they widely rooted in many smart contracts and comparatively easy to exploit. We have developed EASYFLOW , an overflow detector at Ethereum Virtual Machine level. The key insight behind EASYFLOW is a taint analysis based tracking technique to analyze the propagation of involved taints. Specifically, EASYFLOW can not only divide smart contracts into safe contracts, manifested overflows, well-protected overflows and potential overflows, but also automatically generate transactions to trigger potential overflows. In our preliminary evaluation, EASYFLOW managed to find potentially vulnerable Ethereum contracts with little runtime overhead.