CVApr 7, 2022
Transfer Attacks Revisited: A Large-Scale Empirical Study in Real Computer Vision SettingsYuhao Mao, Chong Fu, Saizhuo Wang et al.
One intriguing property of adversarial attacks is their "transferability" -- an adversarial example crafted with respect to one deep neural network (DNN) model is often found effective against other DNNs as well. Intensive research has been conducted on this phenomenon under simplistic controlled conditions. Yet, thus far, there is still a lack of comprehensive understanding about transferability-based attacks ("transfer attacks") in real-world environments. To bridge this critical gap, we conduct the first large-scale systematic empirical study of transfer attacks against major cloud-based MLaaS platforms, taking the components of a real transfer attack into account. The study leads to a number of interesting findings which are inconsistent to the existing ones, including: (1) Simple surrogates do not necessarily improve real transfer attacks. (2) No dominant surrogate architecture is found in real transfer attacks. (3) It is the gap between posterior (output of the softmax layer) rather than the gap between logit (so-called $κ$ value) that increases transferability. Moreover, by comparing with prior works, we demonstrate that transfer attacks possess many previously unknown properties in real-world environments, such as (1) Model similarity is not a well-defined concept. (2) $L_2$ norm of perturbation can generate high transferability without usage of gradient and is a more powerful source than $L_\infty$ norm. We believe this work sheds light on the vulnerabilities of popular MLaaS platforms and points to a few promising research directions.
AIOct 24, 2020
Deep Graph Matching and Searching for Semantic Code RetrievalXiang Ling, Lingfei Wu, Saizhuo Wang et al.
Code retrieval is to find the code snippet from a large corpus of source code repositories that highly matches the query of natural language description. Recent work mainly uses natural language processing techniques to process both query texts (i.e., human natural language) and code snippets (i.e., machine programming language), however neglecting the deep structured features of query texts and source codes, both of which contain rich semantic information. In this paper, we propose an end-to-end deep graph matching and searching (DGMS) model based on graph neural networks for the task of semantic code retrieval. To this end, we first represent both natural language query texts and programming language code snippets with the unified graph-structured data, and then use the proposed graph matching and searching model to retrieve the best matching code snippet. In particular, DGMS not only captures more structural information for individual query texts or code snippets but also learns the fine-grained similarity between them by cross-attention based semantic matching operations. We evaluate the proposed DGMS model on two public code retrieval datasets with two representative programming languages (i.e., Java and Python). Experiment results demonstrate that DGMS significantly outperforms state-of-the-art baseline models by a large margin on both datasets. Moreover, our extensive ablation studies systematically investigate and illustrate the impact of each part of DGMS.
CRAug 20, 2020
When Homomorphic Encryption Marries Secret Sharing: Secure Large-Scale Sparse Logistic Regression and Applications in Risk ControlChaochao Chen, Jun Zhou, Li Wang et al.
Logistic Regression (LR) is the most widely used machine learning model in industry for its efficiency, robustness, and interpretability. Due to the problem of data isolation and the requirement of high model performance, many applications in industry call for building a secure and efficient LR model for multiple parties. Most existing work uses either Homomorphic Encryption (HE) or Secret Sharing (SS) to build secure LR. HE based methods can deal with high-dimensional sparse features, but they incur potential security risks. SS based methods have provable security, but they have efficiency issue under high-dimensional sparse features. In this paper, we first present CAESAR, which combines HE and SS to build secure large-scale sparse logistic regression model and achieves both efficiency and security. We then present the distributed implementation of CAESAR for scalability requirement. We have deployed CAESAR in a risk control task and conducted comprehensive experiments. Our experimental results show that CAESAR improves the state-of-the-art model by around 130 times.
LGJul 8, 2020
Multilevel Graph Matching Networks for Deep Graph Similarity LearningXiang Ling, Lingfei Wu, Saizhuo Wang et al.
While the celebrated graph neural networks yield effective representations for individual nodes of a graph, there has been relatively less success in extending to the task of graph similarity learning. Recent work on graph similarity learning has considered either global-level graph-graph interactions or low-level node-node interactions, however ignoring the rich cross-level interactions (e.g., between each node of one graph and the other whole graph). In this paper, we propose a multi-level graph matching network (MGMN) framework for computing the graph similarity between any pair of graph-structured objects in an end-to-end fashion. In particular, the proposed MGMN consists of a node-graph matching network for effectively learning cross-level interactions between each node of one graph and the other whole graph, and a siamese graph neural network to learn global-level interactions between two input graphs. Furthermore, to compensate for the lack of standard benchmark datasets, we have created and collected a set of datasets for both the graph-graph classification and graph-graph regression tasks with different sizes in order to evaluate the effectiveness and robustness of our models. Comprehensive experiments demonstrate that MGMN consistently outperforms state-of-the-art baseline models on both the graph-graph classification and graph-graph regression tasks. Compared with previous work, MGMN also exhibits stronger robustness as the sizes of the two input graphs increase.
SPJul 8, 2020
Fine-grained Vibration Based Sensing Using a SmartphoneKamran Ali, Alex X. Liu
Recognizing surfaces based on their vibration signatures is useful as it can enable tagging of different locations without requiring any additional hardware such as Near Field Communication (NFC) tags. However, previous vibration based surface recognition schemes either use custom hardware for creating and sensing vibration, which makes them difficult to adopt, or use inertial (IMU) sensors in commercial off-the-shelf (COTS) smartphones to sense movements produced due to vibrations, which makes them coarse-grained because of the low sampling rates of IMU sensors. The mainstream COTS smartphones based schemes are also susceptible to inherent hardware based irregularities in vibration mechanism of the smartphones. Moreover, the existing schemes that use microphones to sense vibration are prone to short-term and constant background noises (e.g. intermittent talking, exhaust fan, etc.) because microphones not only capture the sounds created by vibration but also other interfering sounds present in the environment. In this paper, we propose VibroTag, a robust and practical vibration based sensing scheme that works with smartphones with different hardware, can extract fine-grained vibration signatures of different surfaces, and is robust to environmental noise and hardware based irregularities. We implemented VibroTag on two different Android phones and evaluated in multiple different environments where we collected data from 4 individuals for 5 to 20 consecutive days. Our results show that VibroTag achieves an average accuracy of 86.55% while recognizing 24 different locations/surfaces, even when some of those surfaces were made of similar material. VibroTag's accuracy is 37% higher than the average accuracy of 49.25% achieved by one of the state-of-the-art IMUs based schemes, which we implemented for comparison with VibroTag.
SPJul 7, 2020
Monitoring Browsing Behavior of Customers in Retail Stores via RFID ImagingKamran Ali, Alex X. Liu, Eugene Chai et al.
In this paper, we propose to use commercial off-the-shelf (COTS) monostatic RFID devices (i.e. which use a single antenna at a time for both transmitting and receiving RFID signals to and from the tags) to monitor browsing activity of customers in front of display items in places such as retail stores. To this end, we propose TagSee, a multi-person imaging system based on monostatic RFID imaging. TagSee is based on the insight that when customers are browsing the items on a shelf, they stand between the tags deployed along the boundaries of the shelf and the reader, which changes the multi-paths that the RFID signals travel along, and both the RSS and phase values of the RFID signals that the reader receives change. Based on these variations observed by the reader, TagSee constructs a coarse grained image of the customers. Afterwards, TagSee identifies the items that are being browsed by the customers by analyzing the constructed images. The key novelty of this paper is on achieving browsing behavior monitoring of multiple customers in front of display items by constructing coarse grained images via robust, analytical model-driven deep learning based, RFID imaging. To achieve this, we first mathematically formulate the problem of imaging humans using monostatic RFID devices and derive an approximate analytical imaging model that correlates the variations caused by human obstructions in the RFID signals. Based on this model, we then develop a deep learning framework to robustly image customers with high accuracy. We implement TagSee scheme using a Impinj Speedway R420 reader and SMARTRAC DogBone RFID tags. TagSee can achieve a TPR of more than ~90% and a FPR of less than ~10% in multi-person scenarios using training data from just 3-4 users.
CRDec 19, 2019
An Adaptive and Fast Convergent Approach to Differentially Private Deep LearningZhiying Xu, Shuyu Shi, Alex X. Liu et al.
With the advent of the era of big data, deep learning has become a prevalent building block in a variety of machine learning or data mining tasks, such as signal processing, network modeling and traffic analysis, to name a few. The massive user data crowdsourced plays a crucial role in the success of deep learning models. However, it has been shown that user data may be inferred from trained neural models and thereby exposed to potential adversaries, which raises information security and privacy concerns. To address this issue, recent studies leverage the technique of differential privacy to design private-preserving deep learning algorithms. Albeit successful at privacy protection, differential privacy degrades the performance of neural models. In this paper, we develop ADADP, an adaptive and fast convergent learning algorithm with a provable privacy guarantee. ADADP significantly reduces the privacy cost by improving the convergence speed with an adaptive learning rate and mitigates the negative effect of differential privacy upon the model accuracy by introducing adaptive noise. The performance of ADADP is evaluated on real-world datasets. Experiment results show that it outperforms state-of-the-art differentially private approaches in terms of both privacy cost and model accuracy.
CRDec 9, 2017
The Insecurity of Home Digital Voice Assistants -- Amazon Alexa as a Case StudyXinyu Lei, Guan-Hua Tu, Alex X. Liu et al.
Home Digital Voice Assistants (HDVAs) are getting popular in recent years. Users can control smart devices and get living assistance through those HDVAs (e.g., Amazon Alexa, Google Home) using voice. In this work, we study the insecurity of HDVA service by using Amazon Alexa as a case study. We disclose three security vulnerabilities which root in the insecure access control of Alexa services. We then exploit them to devise two proof-of-concept attacks, home burglary and fake order, where the adversary can remotely command the victim's Alexa device to open a door or place an order from Amazon.com. The insecure access control is that the Alexa device not only relies on a single-factor authentication but also takes voice commands even if no people are around. We thus argue that HDVAs should have another authentication factor, a physical presence based access control; that is, they can accept voice commands only when any person is detected nearby. To this end, we devise a Virtual Security Button (VSButton), which leverages the WiFi technology to detect indoor human motions. Once any indoor human motion is detected, the Alexa device is enabled to accept voice commands. Our evaluation results show that it can effectively differentiate indoor motions from the cases of no motion and outdoor motions in both the laboratory and real world settings.
CRJul 1, 2013
A Random Matrix Approach to Differential Privacy and Structure Preserved Social Network Graph PublishingFaraz Ahmed, Rong Jin, Alex X. Liu
Online social networks are being increasingly used for analyzing various societal phenomena such as epidemiology, information dissemination, marketing and sentiment flow. Popular analysis techniques such as clustering and influential node analysis, require the computation of eigenvectors of the real graph's adjacency matrix. Recent de-anonymization attacks on Netflix and AOL datasets show that an open access to such graphs pose privacy threats. Among the various privacy preserving models, Differential privacy provides the strongest privacy guarantees. In this paper we propose a privacy preserving mechanism for publishing social network graph data, which satisfies differential privacy guarantees by utilizing a combination of theory of random matrix and that of differential privacy. The key idea is to project each row of an adjacency matrix to a low dimensional space using the random projection approach and then perturb the projected matrix with random noise. We show that as compared to existing approaches for differential private approximation of eigenvectors, our approach is computationally efficient, preserves the utility and satisfies differential privacy. We evaluate our approach on social network graphs of Facebook, Live Journal and Pokec. The results show that even for high values of noise variance sigma=1 the clustering quality given by normalized mutual information gain is as low as 0.74. For influential node discovery, the propose approach is able to correctly recover 80 of the most influential nodes. We also compare our results with an approach presented in [43], which directly perturbs the eigenvector of the original data by a Laplacian noise. The results show that this approach requires a large random perturbation in order to preserve the differential privacy, which leads to a poor estimation of eigenvectors for large social networks.