90.5CYMay 25
A Technical Policy Blueprint for Trustworthy Decentralized AIHasan Kassem, Orion Banks, Omar Benjelloun et al.
Decentralized AI systems, such as federated learning, can play a critical role in further unlocking AI asset marketplaces (e.g., healthcare data marketplaces) thanks to increased asset privacy protection. Unlocking this big potential necessitates governance mechanisms that are transparent, scalable, and verifiable. However current governance approaches rely on bespoke, infrastructure-specific policies that hinder asset interoperability and trust among systems. We are proposing a Technical Policy Blueprint that encodes governance requirements as policy-as-code objects and separates asset policy verification from asset policy enforcement. In this architecture the Policy Engine verifies evidence (e.g., identities, signatures, payments, trusted-hardware attestations) and issues capability packages. Asset Guardians (e.g. data guardians, model guardians, computation guardians, etc.) enforce access or execution solely based on these capability packages. This core concept of decoupling policy processing from capabilities enables governance to evolve without reconfiguring AI infrastructure, thus creating an approach that is transparent, auditable, and resilient to change.
DCMay 5, 2022
A Collaboration Strategy in the Mining Pool for Proof-of-Neural-Architecture ConsensusBoyang Li, Qing Lu, Weiwen Jiang et al.
In most popular public accessible cryptocurrency systems, the mining pool plays a key role because mining cryptocurrency with the mining pool turns the non-profitable situation into profitable for individual miners. In many recent novel blockchain consensuses, the deep learning training procedure becomes the task for miners to prove their workload, thus the computation power of miners will not purely be spent on the hash puzzle. In this way, the hardware and energy will support the blockchain service and deep learning training simultaneously. While the incentive of miners is to earn tokens, individual miners are motivated to join mining pools to become more competitive. In this paper, we are the first to demonstrate a mining pool solution for novel consensuses based on deep learning. The mining pool manager partitions the full searching space into subspaces and all miners are scheduled to collaborate on the Neural Architecture Search (NAS) tasks in the assigned subspace. Experiments demonstrate that the performance of this type of mining pool is more competitive than an individual miner. Due to the uncertainty of miners' behaviors, the mining pool manager checks the standard deviation of the performance of high reward miners and prepares backup miners to ensure the completion of the tasks of high reward miners.
LGJul 30, 2023
Proof-of-Federated-Learning-Subchain: Free Partner Selection Subchain Based on Federated LearningBoyang Li, Bingyu Shen, Qing Lu et al.
The continuous thriving of the Blockchain society motivates research in novel designs of schemes supporting cryptocurrencies. Previously multiple Proof-of-Deep-Learning(PoDL) consensuses have been proposed to replace hashing with useful work such as deep learning model training tasks. The energy will be more efficiently used while maintaining the ledger. However deep learning models are problem-specific and can be extremely complex. Current PoDL consensuses still require much work to realize in the real world. In this paper, we proposed a novel consensus named Proof-of-Federated-Learning-Subchain(PoFLSC) to fill the gap. We applied a subchain to record the training, challenging, and auditing activities and emphasized the importance of valuable datasets in partner selection. We simulated 20 miners in the subchain to demonstrate the effectiveness of PoFLSC. When we reduce the pool size concerning the reservation priority order, the drop rate difference in the performance in different scenarios further exhibits that the miner with a higher Shapley Value (SV) will gain a better opportunity to be selected when the size of the subchain pool is limited. In the conducted experiments, the PoFLSC consensus supported the subchain manager to be aware of reservation priority and the core partition of contributors to establish and maintain a competitive subchain.
CRNov 10, 2025Code
A Decentralized Retrieval Augmented Generation System with Source Reliabilities Secured on BlockchainYining Lu, Wenyi Tang, Max Johnson et al.
Existing retrieval-augmented generation (RAG) systems typically use a centralized architecture, causing a high cost of data collection, integration, and management, as well as privacy concerns. There is a great need for a decentralized RAG system that enables foundation models to utilize information directly from data owners who maintain full control over their sources. However, decentralization brings a challenge: the numerous independent data sources vary significantly in reliability, which can diminish retrieval accuracy and response quality. To address this, our decentralized RAG system has a novel reliability scoring mechanism that dynamically evaluates each source based on the quality of responses it contributes to generate and prioritizes high-quality sources during retrieval. To ensure transparency and trust, the scoring process is securely managed through blockchain-based smart contracts, creating verifiable and tamper-proof reliability records without relying on a central authority. We evaluate our decentralized system with two Llama models (3B and 8B) in two simulated environments where six data sources have different levels of reliability. Our system achieves a +10.7\% performance improvement over its centralized counterpart in the real world-like unreliable data environments. Notably, it approaches the upper-bound performance of centralized systems under ideally reliable data environments. The decentralized infrastructure enables secure and trustworthy scoring management, achieving approximately 56\% marginal cost savings through batched update operations. Our code and system are open-sourced at github.com/yining610/Reliable-dRAG.
QUANT-PHApr 20, 2024
PristiQ: A Co-Design Framework for Preserving Data Security of Quantum Learning in the CloudZhepeng Wang, Yi Sheng, Nirajan Koirala et al.
Benefiting from cloud computing, today's early-stage quantum computers can be remotely accessed via the cloud services, known as Quantum-as-a-Service (QaaS). However, it poses a high risk of data leakage in quantum machine learning (QML). To run a QML model with QaaS, users need to locally compile their quantum circuits including the subcircuit of data encoding first and then send the compiled circuit to the QaaS provider for execution. If the QaaS provider is untrustworthy, the subcircuit to encode the raw data can be easily stolen. Therefore, we propose a co-design framework for preserving the data security of QML with the QaaS paradigm, namely PristiQ. By introducing an encryption subcircuit with extra secure qubits associated with a user-defined security key, the security of data can be greatly enhanced. And an automatic search algorithm is proposed to optimize the model to maintain its performance on the encrypted quantum data. Experimental results on simulation and the actual IBM quantum computer both prove the ability of PristiQ to provide high security for the quantum data while maintaining the model performance in QML.
CRMar 26, 2021
Genomic Encryption of Biometric Information for Privacy-Preserving ForensicsTaeho Jung, Ryan Karl, Geoffrey H. Siwo
DNA fingerprinting is a cornerstone for human identification in forensics, where the sequence of highly polymorphic short tandem repeats (STRs) from an individual is compared against a DNA database. This presents significant privacy risks to individuals with DNA profiles in the database due to hacking by malicious attackers who may access the data and misuse it for secondary purposes. In this paper, we propose a novel cryptographic framework for jointly encrypting DNA-based fingerprints (STRs) with other biometric data, for example, facial images, such that the STRs and biometrics information of an individual are revealed only when a positive match is found, i.e. the STRs act as decryption keys. Specifically, when a search is performed on the encrypted database using STR sequences of an individual in the database, a perfect match generates the facial image and/ or other biometrics of the individual while the lack of a match returns a null result. By jointly encrypting DNA fingerprints and other biometrics using the unique STRs generated keys, our approach ensures perfect privacy of the encrypted information with decryption of only the record with STRs matching the query. This safeguards the information of other individuals in the same database. The proposed approach can also be used to securely authenticate the identity of individuals or biological material in scenarios beyond forensics including tracking the identity of samples for clinical genetics and cell therapies.
CRSep 15, 2020
Federated Dynamic GNN with Secure AggregationMeng Jiang, Taeho Jung, Ryan Karl et al.
Given video data from multiple personal devices or street cameras, can we exploit the structural and dynamic information to learn dynamic representation of objects for applications such as distributed surveillance, without storing data at a central server that leads to a violation of user privacy? In this work, we introduce Federated Dynamic Graph Neural Network (Feddy), a distributed and secured framework to learn the object representations from multi-user graph sequences: i) It aggregates structural information from nearby objects in the current graph as well as dynamic information from those in the previous graph. It uses a self-supervised loss of predicting the trajectories of objects. ii) It is trained in a federated learning manner. The centrally located server sends the model to user devices. Local models on the respective user devices learn and periodically send their learning to the central server without ever exposing the user's data to server. iii) Studies showed that the aggregated parameters could be inspected though decrypted when broadcast to clients for model synchronizing, after the server performed a weighted average. We design an appropriate aggregation mechanism of secure aggregation primitives that can protect the security and privacy in federated learning with scalability. Experiments on four video camera datasets (in four different scenes) as well as simulation demonstrate that Feddy achieves great effectiveness and security.
CRMay 5, 2020
Computing-in-Memory for Performance and Energy Efficient Homomorphic EncryptionDayane Reis, Jonathan Takeshita, Taeho Jung et al.
Homomorphic encryption (HE) allows direct computations on encrypted data. Despite numerous research efforts, the practicality of HE schemes remains to be demonstrated. In this regard, the enormous size of ciphertexts involved in HE computations degrades computational efficiency. Near-memory Processing (NMP) and Computing-in-memory (CiM) - paradigms where computation is done within the memory boundaries - represent architectural solutions for reducing latency and energy associated with data transfers in data-intensive applications such as HE. This paper introduces CiM-HE, a Computing-in-memory (CiM) architecture that can support operations for the B/FV scheme, a somewhat homomorphic encryption scheme for general computation. CiM-HE hardware consists of customized peripherals such as sense amplifiers, adders, bit-shifters, and sequencing circuits. The peripherals are based on CMOS technology, and could support computations with memory cells of different technologies. Circuit-level simulations are used to evaluate our CiM-HE framework assuming a 6T-SRAM memory. We compare our CiM-HE implementation against (i) two optimized CPU HE implementations, and (ii) an FPGA-based HE accelerator implementation. When compared to a CPU solution, CiM-HE obtains speedups between 4.6x and 9.1x, and energy savings between 266.4x and 532.8x for homomorphic multiplications (the most expensive HE operation). Also, a set of four end-to-end tasks, i.e., mean, variance, linear regression, and inference are up to 1.1x, 7.7x, 7.1x, and 7.5x faster (and 301.1x, 404.6x, 532.3x, and 532.8x more energy efficient). Compared to CPU-based HE in a previous work, CiM-HE obtain 14.3x speed-up and >2600x energy savings. Finally, our design offers 2.2x speed-up with 88.1x energy savings compared to a state-of-the-art FPGA-based accelerator.
CRMay 5, 2020
Secure Single-Server Nearly-Identical Image DeduplicationJonathan Takeshita, Ryan Karl, Taeho Jung
Cloud computing is often utilized for file storage. Clients of cloud storage services want to ensure the privacy of their data, and both clients and servers want to use as little storage as possible. Cross-user deduplication is one method to reduce the amount of storage a server uses. Deduplication and privacy are naturally conflicting goals, especially for nearly-identical (``fuzzy'') deduplication, as some information about the data must be used to perform deduplication. Prior solutions thus utilize multiple servers, or only function for exact deduplication. In this paper, we present a single-server protocol for cross-user nearly-identical deduplication based on secure locality-sensitive hashing (SLSH). We formally define our ideal security, and rigorously prove our protocol secure against fully malicious, colluding adversaries with a proof by simulation. We show experimentally that the individual parts of the protocol are computationally feasible, and further discuss practical issues of security and efficiency.
CRFeb 24, 2020
Ensuring Privacy in Location-Based Services: A Model-based ApproachAlireza Partovi, Wei Zheng, Taeho Jung et al.
In recent years, the widespread of mobile devices equipped with GPS and communication chips has led to the growing use of location-based services (LBS) in which a user receives a service based on his current location. The disclosure of user's location, however, can raise serious concerns about user privacy in general, and location privacy in particular which led to the development of various location privacy-preserving mechanisms aiming to enhance the location privacy while using LBS applications. In this paper, we propose to model the user mobility pattern and utility of the LBS as a Markov decision process (MDP), and inspired by probabilistic current state opacity notation, we introduce a new location privacy metric, namely $ε-$privacy, that quantifies the adversary belief over the user's current location. We exploit this dynamic model to design a LPPM that while it ensures the utility of service is being fully utilized, independent of the adversary prior knowledge about the user, it can guarantee a user-specified privacy level can be achieved for an infinite time horizon. The overall privacy-preserving framework, including the construction of the user mobility model as a MDP, and design of the proposed LPPM, are demonstrated and validated with real-world experimental data.
DCApr 15, 2019
DLBC: A Deep Learning-Based Consensus in Blockchains for Deep Learning ServicesBoyang Li, Changhao Chenli, Xiaowei Xu et al.
With the increasing artificial intelligence application, deep neural network (DNN) has become an emerging task. However, to train a good deep learning model will suffer from enormous computation cost and energy consumption. Recently, blockchain has been widely used, and during its operation, a huge amount of computation resources are wasted for the Proof of Work (PoW) consensus. In this paper, we propose DLBC to exploit the computation power of miners for deep learning training as proof of useful work instead of calculating hash values. it distinguishes itself from recent proof of useful work mechanisms by addressing various limitations of them. Specifically, DLBC handles multiple tasks, larger model and training datasets, and introduces a comprehensive ranking mechanism that considers tasks difficulty(e.g., model complexity, network burden, data size, queue length). We also applied DNN-watermark [1] to improve the robustness. In Section V, the average overhead of digital signature is 1.25, 0.001, 0.002 and 0.98 seconds, respectively, and the average overhead of network is 3.77, 3.01, 0.37 and 0.41 seconds, respectively. Embedding a watermark takes 3 epochs and removing a watermark takes 30 epochs. This penalty of removing watermark will prevent attackers from stealing, improving, and resubmitting DL models from honest miners.
CRFeb 11, 2019
Energy-recycling Blockchain with Proof-of-Deep-LearningChanghao Chenli, Boyang Li, Yiyu Shi et al.
An enormous amount of energy is wasted in Proofof-Work (PoW) mechanisms adopted by popular blockchain applications (e.g., PoW-based cryptocurrencies), because miners must conduct a large amount of computation. Owing to this, one serious rising concern is that the energy waste not only dilutes the value of the blockchain but also hinders its further application. In this paper, we propose a novel blockchain design that fully recycles the energy required for facilitating and maintaining it, which is re-invested to the computation of deep learning. We realize this by proposing Proof-of-Deep-Learning (PoDL) such that a valid proof for a new block can be generated if and only if a proper deep learning model is produced. We present a proof-of-concept design of PoDL that is compatible with the majority of the cryptocurrencies that are based on hash-based PoW mechanisms. Our benchmark and simulation results show that the proposed design is feasible for various popular cryptocurrencies such as Bitcoin, Bitcoin Cash, and Litecoin.
CRNov 30, 2017
VoiceMask: Anonymize and Sanitize Voice Input on Mobile DevicesJianwei Qian, Haohua Du, Jiahui Hou et al.
Voice input has been tremendously improving the user experience of mobile devices by freeing our hands from typing on the small screen. Speech recognition is the key technology that powers voice input, and it is usually outsourced to the cloud for the best performance. However, the cloud might compromise users' privacy by identifying their identities by voice, learning their sensitive input content via speech recognition, and then profiling the mobile users based on the content. In this paper, we design an intermediate between users and the cloud, named VoiceMask, to sanitize users' voice data before sending it to the cloud for speech recognition. We analyze the potential privacy risks and aim to protect users' identities and sensitive input content from being disclosed to the cloud. VoiceMask adopts a carefully designed voice conversion mechanism that is resistant to several attacks. Meanwhile, it utilizes an evolution-based keyword substitution technique to sanitize the voice input content. The two sanitization phases are all performed in the resource-limited mobile device while still maintaining the usability and accuracy of the cloud-supported speech recognition service. We implement the voice sanitizer on Android systems and present extensive experimental results that validate the effectiveness and efficiency of our app. It is demonstrated that we are able to reduce the chance of a user's voice being identified from 50 people by 84% while keeping the drop of speech recognition accuracy within 14.2%.
CROct 24, 2014
Cloud-based Privacy Preserving Image Storage, Sharing and SearchLan Zhang, Taeho Jung, Puchun Feng et al.
High-resolution cameras produce huge volume of high quality images everyday. It is extremely challenging to store, share and especially search those huge images, for which increasing number of cloud services are presented to support such functionalities. However, images tend to contain rich sensitive information (\eg, people, location and event), and people's privacy concerns hinder their readily participation into the services provided by untrusted third parties. In this work, we introduce PIC: a Privacy-preserving large-scale Image search system on Cloud. Our system enables efficient yet secure content-based image search with fine-grained access control, and it also provides privacy-preserving image storage and sharing among users. Users can specify who can/cannot search on their images when using the system, and they can search on others' images if they satisfy the condition specified by the image owners. Majority of the computationally intensive jobs are outsourced to the cloud side, and users only need to submit the query and receive the result throughout the entire image search. Specially, to deal with massive images, we design our system suitable for distributed and parallel computation and introduce several optimizations to further expedite the search process. We implement a prototype of PIC including both cloud side and client side. The cloud side is a cluster of computers with distributed file system (Hadoop HDFS) and MapReduce architecture (Hadoop MapReduce). The client side is built for both Windows OS laptops and Android phones. We evaluate the prototype system with large sets of real-life photos. Our security analysis and evaluation results show that PIC successfully protect the image privacy at a low cost of computation and communication.
CROct 24, 2014
Outsource Photo Sharing and Searching for Mobile Devices With Privacy ProtectionLan Zhang, Taeho Jung, Cihang Liu et al.
With the proliferation of mobile devices, cloud-based photo sharing and searching services are becoming common due to the mobile devices' resource constrains. Meanwhile, there is also increasing concern about privacy in photos. In this work, we present a framework \ourprotocolNSP, which enables cloud servers to provide privacy-preserving photo sharing and search as a service to mobile device users. Privacy-seeking users can share their photos via our framework to allow only their authorized friends to browse and search their photos using resource-bounded mobile devices. This is achieved by our carefully designed architecture and novel outsourced privacy-preserving computation protocols, through which no information about the outsourced photos or even the search contents (including the results) would be revealed to the cloud servers. Our framework is compatible with most of the existing image search technologies, and it requires few changes to the existing cloud systems. The evaluation of our prototype system with 31,772 real-life images shows the communication and computation efficiency of our system.
CRAug 28, 2013
Enabling Privacy-preserving Auctions in Big DataTaeho Jung, Xiang-Yang Li
We study how to enable auctions in the big data context to solve many upcoming data-based decision problems in the near future. We consider the characteristics of the big data including, but not limited to, velocity, volume, variety, and veracity, and we believe any auction mechanism design in the future should take the following factors into consideration: 1) generality (variety); 2) efficiency and scalability (velocity and volume); 3) truthfulness and verifiability (veracity). In this paper, we propose a privacy-preserving construction for auction mechanism design in the big data, which prevents adversaries from learning unnecessary information except those implied in the valid output of the auction. More specifically, we considered one of the most general form of the auction (to deal with the variety), and greatly improved the the efficiency and scalability by approximating the NP-hard problems and avoiding the design based on garbled circuits (to deal with velocity and volume), and finally prevented stakeholders from lying to each other for their own benefit (to deal with the veracity). We achieve these by introducing a novel privacy-preserving winner determination algorithm and a novel payment mechanism. Additionally, we further employ a blind signature scheme as a building block to let bidders verify the authenticity of their payment reported by the auctioneer. The comparison with peer work shows that we improve the asymptotic performance of peer works' overhead from the exponential growth to a linear growth and from linear growth to a logarithmic growth, which greatly improves the scalability.
CRAug 28, 2013
PDA: Semantically Secure Time-Series Data Analytics with Dynamic SubgroupsTaeho Jung, Junze Han, Xiang-Yang Li
Third-party analysis on private records is becoming increasingly important due to the widespread data collection for various analysis purposes. However, the data in its original form often contains sensitive information about individuals, and its publication will severely breach their privacy. In this paper, we present a novel Privacy-preserving Data Analytics framework PDA, which allows a third-party aggregator to obliviously conduct many different types of polynomial-based analysis on private data records provided by a dynamic sub-group of users. Notably, every user needs to keep only O(n) keys to join data analysis among O(2^n) different groups of users, and any data analysis that is represented by polynomials is supported by our framework. Besides, a real implementation shows the performance of our framework is comparable to the peer works who present ad-hoc solutions for specific data analysis applications. Despite such nice properties of PDA, it is provably secure against a very powerful attacker (chosen-plaintext attack) even in the Dolev-Yao network model where all communication channels are insecure.
CRJul 8, 2013
A General Framework for Privacy-Preserving Distributed Greedy AlgorithmTaeho Jung, Xiang-Yang Li, Lan Zhang
Increasingly more attention is paid to the privacy in online applications due to the widespread data collection for various analysis purposes. Sensitive information might be mined from the raw data during the analysis, and this led to a great privacy concern among people (data providers) these days. To deal with this privacy concerns, multitudes of privacy-preserving computation schemes are proposed to address various computation problems, and we have found many of them fall into a class of problems which can be solved by greedy algorithms. In this paper, we propose a framework for distributed greedy algorithms in which instances in the feasible set come from different parties. By our framework, most generic distributed greedy algorithms can be converted to a privacy preserving one which achieves the same result as the original greedy algorithm while the private information associated with the instances is still protected.
CRAug 1, 2012
Search Me If You Can: Privacy-preserving Location Query ServiceXiang-Yang Li, Taeho Jung
Location-Based Service (LBS) becomes increasingly popular with the dramatic growth of smartphones and social network services (SNS), and its context-rich functionalities attract considerable users. Many LBS providers use users' location information to offer them convenience and useful functions. However, the LBS could greatly breach personal privacy because location itself contains much information. Hence, preserving location privacy while achieving utility from it is still an challenging question now. This paper tackles this non-trivial challenge by designing a suite of novel fine-grained Privacy-preserving Location Query Protocol (PLQP). Our protocol allows different levels of location query on encrypted location information for different users, and it is efficient enough to be applied in mobile platforms.
CRJun 12, 2012
Data Aggregation without Secure Channel: How to Evaluate a Multivariate Polynomial SecurelyTaeho Jung, XuFei Mao, Xiang-Yang Li et al.
Much research has been conducted to securely outsource multiple parties' data aggregation to an untrusted aggregator without disclosing each individual's data, or to enable multiple parties to jointly aggregate their data while preserving privacy. However, those works either assume to have a secure channel or suffer from high complexity. Here we consider how an external aggregator or multiple parties learn some algebraic statistics (e.g., summation, product) over participants' data while any individual's input data is kept secret to others (the aggregator and other participants). We assume channels in our construction are insecure. That is, all channels are subject to eavesdropping attacks, and all the communications throughout the aggregation are open to others. We successfully guarantee data confidentiality under this weak assumption while limiting both the communication and computation complexity to at most linear.
CRJun 12, 2012
AnonyControl: Control Cloud Data Anonymously with Multi-Authority Attribute-Based EncryptionTaeho Jung, Xiang-Yang Li, Zhiguo Wan et al.
Cloud computing is a revolutionary computing paradigm which enables flexible, on-demand and low-cost usage of computing resources. However, those advantages, ironically, are the causes of security and privacy problems, which emerge because the data owned by different users are stored in some cloud servers instead of under their own control. To deal with security problems, various schemes based on the Attribute- Based Encryption (ABE) have been proposed recently. However, the privacy problem of cloud computing is yet to be solved. This paper presents an anonymous privilege control scheme AnonyControl to address the user and data privacy problem in a cloud. By using multiple authorities in cloud computing system, our proposed scheme achieves anonymous cloud data access, finegrained privilege control, and more importantly, tolerance to up to (N -2) authority compromise. Our security and performance analysis show that AnonyControl is both secure and efficient for cloud computing environment.