CL · Mar 2, 2025
Optimizing Multi-Hop Document Retrieval Through Intermediate Representations
Jiaen Lin, Jingyu Liu, Yingbo Liu
Retrieval-augmented generation (RAG) encounters challenges when addressing complex queries, particularly multi-hop questions. While several methods tackle multi-hop queries by iteratively generating internal queries and retrieving external documents, these approaches are computationally expensive. In this paper, we identify a three-stage information processing pattern in LLMs during layer-by-layer reasoning, consisting of extraction, processing, and subsequent extraction steps. This observation suggests that the representations in intermediate layers contain richer information compared to those in other layers. Building on this insight, we propose Layer-wise RAG (L-RAG). Unlike prior methods that focus on generating new internal queries, L-RAG leverages intermediate representations from the middle layers, which capture next-hop information, to retrieve external knowledge. L-RAG achieves performance comparable to multi-step approaches while maintaining inference overhead similar to that of standard RAG. Experimental results show that L-RAG outperforms existing RAG methods on open-domain multi-hop question-answering datasets, including MuSiQue, HotpotQA, and 2WikiMultiHopQA. The code is available at https://anonymous.4open.science/r/L-RAG-ADD5/
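The idea of retrieving with a middle-layer representation can be illustrated with a toy sketch. This is not the authors' implementation: the abstract does not say which layer is chosen or how hidden states are mapped into the retriever's space, so the "take the middle index" rule, the three-dimensional vectors, and the names `middle_layer_query` and `retrieve` are all assumptions made for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def middle_layer_query(hidden_states):
    """Assumption: use the representation at the middle of the layer stack."""
    return hidden_states[len(hidden_states) // 2]


def retrieve(query_vec, doc_vecs, k=1):
    """Return the names of the k documents most similar to the query vector."""
    ranked = sorted(
        doc_vecs,
        key=lambda name: cosine(query_vec, doc_vecs[name]),
        reverse=True,
    )
    return ranked[:k]


# Toy per-layer representations of one question (hypothetical values).
hidden_states = [
    [1.0, 0.0, 0.0],  # early layer
    [0.1, 0.9, 0.1],  # middle layer, assumed to carry next-hop information
    [0.0, 0.1, 1.0],  # late layer
]

# Toy document embeddings in the same space (hypothetical values).
doc_vecs = {
    "doc_first_hop": [1.0, 0.1, 0.0],
    "doc_next_hop": [0.0, 1.0, 0.0],
    "doc_unrelated": [0.0, 0.0, 1.0],
}

top = retrieve(middle_layer_query(hidden_states), doc_vecs, k=1)
print(top)
```

In this toy setup a single retrieval with the middle-layer vector reaches the next-hop document without generating an intermediate query, which is the cost saving the abstract describes relative to iterative methods.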
CR · Jan 5, 2020
Covert Association of Applications on Edge Devices by Processor Workload
Hangtai Li, Yingbo Liu, Rui Tan
App distribution systems that involve incentivized third-party application (app) vendors are a desirable option for emerging edge computing systems. However, such a scheme also brings security challenges similar to those faced by current mobile app distribution systems. In this paper, we study a threat named covert device association, in which the vendors of two apps collude to figure out which of their app installations run on the same edge device. If the two colluding apps are popular, the threat can be used to launch various types of further attacks at scale. For example, the user of the compromised edge device, who wishes to remain anonymous to one of the two apps, will be de-anonymized if the user is not anonymous to the other app. Moreover, the coalition of the two apps will have an escalated privilege set that is the union of their individual privilege sets. We implement the threat using a reliable and ubiquitous covert channel based on the edge device processor workload. The implementations on three edge devices (two smartphones and an embedded compute board) running Android and Android Things do not require any privileged permissions. Our implementations cover three attack scenarios: 1) two apps running on the same Android phone, 2) an app and a web session in the Tor browser running on the same Android phone, and 3) two apps running on the same Android Things device. Experiments show that the covert channel achieves a data rate of at least 0.25 bps and that the covert device association takes at most 3.2 minutes.
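A workload-based covert channel can be pictured as on-off keying over processor contention. The sketch below is a pure simulation under assumed numbers, not the paper's implementation: the load levels, the linear slowdown model in `observe`, and the midpoint threshold in `decode` are all hypothetical. The final lines only multiply the two bounds quoted in the abstract; the abstract does not state how many bits an association actually needs.

```python
def encode(bits, high=0.9, low=0.1):
    """Sender: hold a high CPU load for a 1 and a low load for a 0, one slot per bit."""
    return [high if b else low for b in bits]


def observe(load_levels, base_ms=10.0, slowdown_ms=8.0):
    """Receiver: simulated time to finish a fixed probe task in each slot.

    Assumption: contention from the sender slows the probe linearly with load.
    """
    return [base_ms + slowdown_ms * load for load in load_levels]


def decode(timings_ms):
    """Receiver: threshold the probe timings at the midpoint of the observed range."""
    threshold = (min(timings_ms) + max(timings_ms)) / 2
    return [1 if t > threshold else 0 for t in timings_ms]


message = [1, 0, 1, 1, 0, 0, 1, 0]
received = decode(observe(encode(message)))
print(received == message)

# Product of the two bounds quoted in the abstract: 0.25 bps over 3.2 minutes.
rate_bps = 0.25
window_s = 192  # 3.2 minutes expressed in seconds
bits_in_window = int(rate_bps * window_s)
print(bits_in_window)
```

Neither side needs a privileged permission in this model: the sender only runs computation and the receiver only times its own work, which is consistent with the abstract's claim about permissions.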
CR · Jun 26, 2019
Privacy-Preserving Blockchain-Based Federated Learning for IoT Devices
Yang Zhao, Jun Zhao, Linshan Jiang et al.
Home appliance manufacturers strive to obtain feedback from users to improve their products and services to build a smart home system. To help manufacturers develop a smart home system, we design a federated learning (FL) system leveraging the reputation mechanism to assist home appliance manufacturers in training a machine learning model based on customers' data. Then, manufacturers can predict customers' requirements and consumption behaviors in the future. The workflow of the system includes two stages: in the first stage, customers train the initial model provided by the manufacturer using both the mobile phone and the mobile edge computing (MEC) server. Customers collect data from various home appliances using phones, and then they download and train the initial model with their local data. After deriving local models, customers sign their models and send them to the blockchain. In case customers or manufacturers are malicious, we use the blockchain to replace the centralized aggregator in the traditional FL system. Since records on the blockchain cannot be tampered with, the activities of malicious customers or manufacturers are traceable. In the second stage, manufacturers select customers or organizations as miners for calculating the averaged model using received models from customers. By the end of the crowdsourcing task, one of the miners, who is selected as the temporary leader, uploads the model to the blockchain. To protect customers' privacy and improve the test accuracy, we enforce differential privacy on the extracted features and propose a new normalization technique. We experimentally demonstrate that our normalization technique outperforms batch normalization when features are under differential privacy protection. In addition, to attract more customers to participate in the crowdsourcing FL task, we design an incentive mechanism to reward participants.
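Two steps in this workflow lend themselves to a small sketch: perturbing extracted features for differential privacy, and averaging the customers' models. The abstract names neither the noise mechanism nor the details of the proposed normalization, so the standard Laplace mechanism is swapped in here as an assumption and the normalization is not modeled. The function names, sensitivity, and epsilon values are hypothetical.

```python
import random


def privatize(features, sensitivity, epsilon, rng):
    """Add Laplace noise with scale sensitivity/epsilon to each feature.

    The difference of two independent exponential samples with the same
    rate is Laplace-distributed with mean zero.
    """
    rate = epsilon / sensitivity  # rate = 1 / scale
    return [x + rng.expovariate(rate) - rng.expovariate(rate) for x in features]


def average_models(models):
    """Miner step: element-wise mean of the customers' weight vectors."""
    n = len(models)
    return [sum(weights) / n for weights in zip(*models)]


rng = random.Random(0)  # fixed seed so the sketch is repeatable

# Hypothetical extracted features from one customer's device.
features = [0.2, 0.5, 0.9]
noisy_features = privatize(features, sensitivity=1.0, epsilon=2.0, rng=rng)

# Hypothetical local models received from three customers.
local_models = [
    [1.0, 2.0],
    [3.0, 4.0],
    [5.0, 6.0],
]
global_model = average_models(local_models)
print(global_model)
```

A smaller epsilon gives a larger noise scale and stronger privacy at the cost of noisier features, which is the setting in which the abstract reports its normalization technique outperforming batch normalization.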