Lannan Luo

CR
h-index18
12papers
398citations
Novelty53%
AI Score45

12 Papers

CRMar 22
Zero-Shot Vulnerability Detection in Low-Resource Smart Contracts Through Solidity-Only Training

Minghao Hu, Qiang Zeng, Lannan Luo

Smart contracts have transformed decentralized finance, but flaws in their logic still create major security threats. Most existing vulnerability detection techniques focus on well-supported languages like Solidity, while low-resource counterparts such as Vyper remain largely underexplored due to scarce analysis tools and limited labeled datasets. Training a robust detection model directly on Vyper is particularly challenging, as collecting sufficiently large and diverse Vyper training datasets is difficult in practice. To address this gap, we introduce Sol2Vy, a novel framework that enables cross-language knowledge transfer from Solidity to Vyper, allowing vulnerability detection on Vyper using models trained exclusively on Solidity. This approach eliminates the need for extensive labeled Vyper datasets typically required to build a robust vulnerability detection model. We implement and evaluate Sol2Vy on various critical vulnerability types, including reentrancy, weak randomness, and unchecked transfer. Experimental results show that Sol2Vy, despite being trained exclusively on Solidity, achieves strong detection performance on Vyper contracts and significantly outperforms prior state-of-the-art methods.

CLJun 30, 2025
Impact of Fine-Tuning Methods on Memorization in Large Language Models

Jie Hou, Chuxiong Wu, Lannan Luo et al.

As the capabilities of pre-trained large language models (LLMs) continue to advance, the "pre-train and fine-tune" paradigm has become increasingly mainstream, leading to the development of various fine-tuning methods. However, the privacy risks arising from memorization during fine-tuning have received relatively little attention. To address this gap, we categorize popular fine-tuning approaches and assess their impact on memorization through the lens of membership inference attacks (MIAs). Our results show that, compared to parameter-based fine-tuning, prompt-based fine-tuning achieves competitive performance while exhibiting lower vulnerability to MIAs. Furthermore, prompt-based methods maintain low memorization regardless of model scale. These findings suggest that parameter-based fine-tuning is more prone to leaking private information, whereas prompt-based fine-tuning serves as a more privacy-preserving option.

CVOct 4, 2021
Deep Learning Approach Protecting Privacy in Camera-Based Critical Applications

Gautham Ramajayam, Tao Sun, Chiu C. Tan et al.

Many critical applications rely on cameras to capture video footage for analytical purposes. This has led to concerns about these cameras accidentally capturing more information than is necessary. In this paper, we propose a deep learning approach towards protecting privacy in camera-based systems. Instead of specifying specific objects (e.g. faces) are privacy sensitive, our technique distinguishes between salient (visually prominent) and non-salient objects based on the intuition that the latter is unlikely to be needed by the application.

CRJan 26, 2021
PFirewall: Semantics-Aware Customizable Data Flow Control for Smart Home Privacy Protection

Haotian Chi, Qiang Zeng, Xiaojiang Du et al.

Internet of Things (IoT) platforms enable users to deploy home automation applications. Meanwhile, privacy issues arise as large amounts of sensitive device data flow out to IoT platforms. Most of the data flowing out to a platform actually do not trigger automation actions, while homeowners currently have no control once devices are bound to the platform. We present PFirewall, a customizable data-flow control system to enhance the privacy of IoT platform users. PFirewall automatically generates data-minimization policies, which only disclose minimum amount of data to fulfill automation. In addition, PFirewall provides interfaces for homeowners to customize individual privacy preferences by defining user-specified policies. To enforce these policies, PFirewall transparently intervenes and mediates the communication between IoT devices and the platform, without modifying the platform, IoT devices, or hub. Evaluation results on four real-world testbeds show that PFirewall reduces IoT data sent to the platform by 97% without impairing home automation, and effectively mitigates user-activity inference/tracking attacks and other privacy risks.

CROct 17, 2019
PFirewall: Semantics-Aware Customizable Data Flow Control for Home Automation Systems

Haotian Chi, Qiang Zeng, Xiaojiang Du et al.

Emerging Internet of Thing (IoT) platforms provide a convenient solution for integrating heterogeneous IoT devices and deploying home automation applications. However, serious privacy threats arise as device data now flow out to the IoT platforms, which may be subject to various attacks. We observe two privacy-unfriendly practices in emerging home automation systems: first, the majority of data flowed to the platform are superfluous in the sense that they do not trigger any home automation; second, home owners currently have nearly zero control over their data. We present PFirewall, a customizable data-flow control system to enhance user privacy. PFirewall analyzes the automation apps to extract their semantics, which are automatically transformed into data-minimization policies; these policies only send minimized data flows to the platform for app execution, such that the ability of attackers to infer user privacy is significantly impaired. In addition, PFirewall provides capabilities and interfaces for users to define and enforce customizable policies based on individual privacy preferences. PFirewall adopts an elegant man-in-the-middle design, transparently executing data minimization and user-defined policies to process raw data flows and mediating the processed data between IoT devices and the platform (via the hub), without requiring modifications of the platform or IoT devices. We implement PFirewall to work with two popular platforms: SmartThings and openHAB, and set up two real-world testbeds to evaluate its performance. The evaluation results show that PFirewall is very effective: it reduces IoT data sent to the platform by 97% and enforces user defined policies successfully.

SDDec 26, 2018
A Multiversion Programming Inspired Approach to Detecting Audio Adversarial Examples

Qiang Zeng, Jianhai Su, Chenglong Fu et al.

Adversarial examples (AEs) are crafted by adding human-imperceptible perturbations to inputs such that a machine-learning based classifier incorrectly labels them. They have become a severe threat to the trustworthiness of machine learning. While AEs in the image domain have been well studied, audio AEs are less investigated. Recently, multiple techniques are proposed to generate audio AEs, which makes countermeasures against them an urgent task. Our experiments show that, given an AE, the transcription results by different Automatic Speech Recognition (ASR) systems differ significantly, as they use different architectures, parameters, and training datasets. Inspired by Multiversion Programming, we propose a novel audio AE detection approach, which utilizes multiple off-the-shelf ASR systems to determine whether an audio input is an AE. The evaluation shows that the detection achieves accuracies over 98.6%.

CRDec 23, 2018
A Cross-Architecture Instruction Embedding Model for Natural Language Processing-Inspired Binary Code Analysis

Kimberly Redmond, Lannan Luo, Qiang Zeng

Given a closed-source program, such as most of proprietary software and viruses, binary code analysis is indispensable for many tasks, such as code plagiarism detection and malware analysis. Today, source code is very often compiled for various architectures, making cross-architecture binary code analysis increasingly important. A binary, after being disassembled, is expressed in an assembly languages. Thus, recent work starts exploring Natural Language Processing (NLP) inspired binary code analysis. In NLP, words are usually represented in high-dimensional vectors (i.e., embeddings) to facilitate further processing, which is one of the most common and critical steps in many NLP tasks. We regard instructions as words in NLP-inspired binary code analysis, and aim to represent instructions as embeddings as well. To facilitate cross-architecture binary code analysis, our goal is that similar instructions, regardless of their architectures, have embeddings close to each other. To this end, we propose a joint learning approach to generating instruction embeddings that capture not only the semantics of instructions within an architecture, but also their semantic relationships across architectures. To the best of our knowledge, this is the first work on building cross-architecture instruction embedding model. As a showcase, we apply the model to resolving one of the most fundamental problems for binary code similarity comparison---semantics-based basic block comparison, and the solution outperforms the code statistics based approach. It demonstrates that it is promising to apply the model to other cross-architecture binary code analysis tasks.

CRDec 23, 2018
Exploiting the Inherent Limitation of L0 Adversarial Examples

Fei Zuo, Bokai Yang, Xiaopeng Li et al.

Despite the great achievements made by neural networks on tasks such as image classification, they are brittle and vulnerable to adversarial example (AE) attacks, which are crafted by adding human-imperceptible perturbations to inputs in order that a neural-network-based classifier incorrectly labels them. In particular, L0 AEs are a category of widely discussed threats where adversaries are restricted in the number of pixels that they can corrupt. However, our observation is that, while L0 attacks modify as few pixels as possible, they tend to cause large-amplitude perturbations to the modified pixels. We consider this as an inherent limitation of L0 AEs, and thwart such attacks by both detecting and rectifying them. The main novelty of the proposed detector is that we convert the AE detection problem into a comparison problem by exploiting the inherent limitation of L0 attacks. More concretely, given an image I, it is pre-processed to obtain another image I' . A Siamese network, which is known to be effective in comparison, takes I and I' as the input pair to determine whether I is an AE. A trained Siamese network automatically and precisely captures the discrepancies between I and I' to detect L0 perturbations. In addition, we show that the pre-processing technique, inpainting, used for detection can also work as an effective defense, which has a high probability of removing the adversarial influence of L0 perturbations. Thus, our system, called AEPECKER, demonstrates not only high AE detection accuracies, but also a notable capability to correct the classification results.

CRDec 11, 2018
Code-less Patching for Heap Vulnerabilities Using Targeted Calling Context Encoding

Qiang Zeng, Golam Kayas, Emil Mohammed et al.

Exploitation of heap vulnerabilities has been on the rise, leading to many devastating attacks. Conventional heap patch generation is a lengthy procedure, requiring intensive manual efforts. Worse, fresh patches tend to harm system dependability, hence deterring users from deploying them. We propose a heap patching system that simultaneously has the following prominent advantages: (1) generating patches without manual efforts; (2) installing patches without altering the code (so called code-less patching); (3) handling various heap vulnerability types; (4) imposing a very low overhead; and (5) no dependency on specific heap allocators. As a separate contribution, we propose targeted calling context encoding, which is a suite of algorithms for optimizing calling context encoding, an important technique with applications in many areas. The system properly combines heavyweight offline attack analysis with lightweight online defense generation, and provides a new countermeasure against heap attacks. The evaluation shows that the system is effective and efficient.

CRNov 21, 2018
Validating the Contextual Information of Outdoor Images for Photo Misuse Detection

Xiaopeng Li, Xianshan Qu, Wenyuan Xu et al.

The contextual information (i.e., the time and location) in which a photo is taken can be easily tampered with or falsely claimed by forgers to achieve malicious purposes, e.g., creating fear among the general public. A rich body of work has focused on detecting photo tampering and manipulation by verifying the integrity of image content. Instead, we aim to detect photo misuse by verifying the capture time and location of photos. This paper is motivated by the law of nature that sun position varies with the time and location, which can be used to determine whether the claimed contextual information corresponds with the sun position that the image content actually indicates. Prior approaches to inferring sun position from images mainly rely on vanishing points associated with at least two shadows, while we propose novel algorithms which utilize only one shadow in the image to infer the sun position. Meanwhile, we compute the sun position by applying astronomical algorithms which take as input the claimed capture time and location. Only when the two estimated sun positions are consistent can the claimed contextual information be genuine. We have developed a prototype called IMAGEGUARD. The experimental results show that our method can successfully estimate sun position and detect the time-location inconsistency with high accuracy. By setting the thresholds to be 9.4 degrees and 5 degrees for the sun position distance and the altitude angle distance, respectively, our system can correctly identify 91.5% of falsified photos with fake contextual information.

SEAug 8, 2018
Neural Machine Translation Inspired Binary Code Similarity Comparison beyond Function Pairs

Fei Zuo, Xiaopeng Li, Patrick Young et al.

Binary code analysis allows analyzing binary code without having access to the corresponding source code. A binary, after disassembly, is expressed in an assembly language. This inspires us to approach binary analysis by leveraging ideas and techniques from Natural Language Processing (NLP), a rich area focused on processing text of various natural languages. We notice that binary code analysis and NLP share a lot of analogical topics, such as semantics extraction, summarization, and classification. This work utilizes these ideas to address two important code similarity comparison problems. (I) Given a pair of basic blocks for different instruction set architectures (ISAs), determining whether their semantics is similar or not; and (II) given a piece of code of interest, determining if it is contained in another piece of assembly code for a different ISA. The solutions to these two problems have many applications, such as cross-architecture vulnerability discovery and code plagiarism detection. We implement a prototype system INNEREYE and perform a comprehensive evaluation. A comparison between our approach and existing approaches to Problem I shows that our system outperforms them in terms of accuracy, efficiency and scalability. And the case studies utilizing the system demonstrate that our solution to Problem II is effective. Moreover, this research showcases how to apply ideas and techniques from NLP to large-scale binary code analysis.

CRNov 2, 2016
Context-aware System Service Call-oriented Symbolic Execution of Android Framework with Application to Exploit Generation

Lannan Luo, Qiang Zeng, Chen Cao et al.

Android Framework is a layer of software that exists in every Android system managing resources of all Android apps. A vulnerability in Android Framework can lead to severe hacks, such as destroying user data and leaking private information. With tens of millions of Android devices unpatched due to Android fragmentation, vulnerabilities in Android Framework certainly attract attackers to exploit them. So far, enormous manual effort is needed to craft such exploits. To our knowledge, no research has been done on automatic generation of exploits that take advantage of Android Framework vulnerabilities. We make a first step towards this goal by applying symbolic execution of Android Framework to finding bugs and generating exploits. Several challenges have been raised by the task. (1) The information of an app flows to Android Framework in multiple intricate steps, making it difficult to identify symbolic inputs. (2) Android Framework has a complex initialization phase, which exacerbates the state space explosion problem. (3) A straightforward design that builds the symbolic executor as a layer inside the Android system will not work well: not only does the implementation have to ensure the compatibility with the Android system, but it needs to be maintained whenever Android gets updated. We present novel ideas and techniques to resolve the challenges, and have built the first system for symbolic execution of Android Framework. It fundamentally changes the state of the art in exploit generation on the Android system, and has been applied to constructing new techniques for finding vulnerabilities.