Matthew Wright

h-index38

20papers

1,596citations

Novelty49%

AI Score45

Ranked #42,534 of 194,257 authors (top 22%)#913 in CR (top 13%)

20 Papers

18.2CRAug 13, 2022Code

On the Limitations of Continual Learning for Malware Classification

Mohammad Saidur Rahman, Scott E. Coull, Matthew Wright

Malicious software (malware) classification offers a unique challenge for continual learning (CL) regimes due to the volume of new samples received on a daily basis and the evolution of malware to exploit new vulnerabilities. On a typical day, antivirus vendors receive hundreds of thousands of unique pieces of software, both malicious and benign, and over the course of the lifetime of a malware classifier, more than a billion samples can easily accumulate. Given the scale of the problem, sequential training using continual learning techniques could provide substantial benefits in reducing training and storage overhead. To date, however, there has been no exploration of CL applied to malware classification tasks. In this paper, we study 11 CL techniques applied to three malware tasks covering common incremental learning scenarios, including task, class, and domain incremental learning (IL). Specifically, using two realistic, large-scale malware datasets, we evaluate the performance of the CL methods on both binary malware classification (Domain-IL) and multi-class malware family classification (Task-IL and Class-IL) tasks. To our surprise, continual learning methods significantly underperformed naive Joint replay of the training data in nearly all settings -- in some cases reducing accuracy by more than 70 percentage points. A simple approach of selectively replaying 20% of the stored data achieves better performance, with 50% of the training time compared to Joint replay. Finally, we discuss potential reasons for the unexpectedly poor performance of the CL techniques, with the hope that it spurs further research on developing techniques that are more effective in the malware classification domain.

5.7CRApr 9

Tracing the Chain: Deep Learning for Stepping-Stone Intrusion Detection

Nate Mathews, Nicholas Hopper, Matthew Wright

Stepping-stone intrusions (SSIs) are a prevalent network evasion technique in which attackers route sessions through chains of compromised intermediate hosts to obscure their origin. Effective SSI detection requires correlating the incoming and outgoing flows at each relay host at extremely low false positive rates -- a stringent requirement that renders classical statistical methods inadequate in operational settings. We apply ESPRESSO, a deep learning flow correlation model combining a transformer-based feature extraction network, time-aligned multi-channel interval features, and online triplet metric learning, to the problem of stepping-stone intrusion detection. To support training and evaluation, we develop a synthetic data collection tool that generates realistic stepping-stone traffic across five tunneling protocols: SSH, SOCAT, ICMP, DNS, and mixed multi-protocol chains. Across all five protocols and in both host-mode and network-mode detection scenarios, ESPRESSO substantially outperforms the state-of-the-art DeepCoFFEA baseline, achieving a true positive rate exceeding 0.99 at a false positive rate of $10^{-3}$ for standard bursty protocols in network-mode. We further demonstrate chain length prediction as a tool for distinguishing malicious from benign pivoting, and conduct a systematic robustness analysis revealing that timing-based perturbations are the primary vulnerability of correlation-based stepping-stone detectors.

2.6CVNov 1, 2021

Gradient Frequency Modulation for Visually Explaining Video Understanding Models

Xinmiao Lin, Wentao Bao, Matthew Wright et al.

In many applications, it is essential to understand why a machine learning model makes the decisions it does, but this is inhibited by the black-box nature of state-of-the-art neural networks. Because of this, increasing attention has been paid to explainability in deep learning, including in the area of video understanding. Due to the temporal dimension of video data, the main challenge of explaining a video action recognition model is to produce spatiotemporally consistent visual explanations, which has been ignored in the existing literature. In this paper, we propose Frequency-based Extremal Perturbation (F-EP) to explain a video understanding model's decisions. Because the explanations given by perturbation methods are noisy and non-smooth both spatially and temporally, we propose to modulate the frequencies of gradient maps from the neural network model with a Discrete Cosine Transform (DCT). We show in a range of experiments that F-EP provides more spatiotemporally consistent explanations that more faithfully represent the model's decisions compared to the existing state-of-the-art methods.

10.2CRNov 11, 2020Code

Detecting Adversarial Patches with Class Conditional Reconstruction Networks

Perry Deng, Mohammad Saidur Rahman, Matthew Wright

Defending against physical adversarial attacks is a rapidly growing topic in deep learning and computer vision. Prominent forms of physical adversarial attacks, such as overlaid adversarial patches and objects, share similarities with digital attacks, but are easy for humans to notice. This leads us to explore the hypothesis that adversarial detection methods, which have been shown to be ineffective against adaptive digital adversarial examples, can be effective against these physical attacks. We use one such detection method based on autoencoder architectures, and perform adversarial patching experiments on MNIST, SVHN, and CIFAR10 against a CNN architecture and two CapsNet architectures. We also propose two modifications to the EM-Routed CapsNet architecture, Affine Voting and Matrix Capsule Dropout, to improve its classification performance. Our investigation shows that the detector retains some of its effectiveness even against adaptive adversarial patch attacks. In addition, detection performance tends to decrease among all the architectures with the increase of dataset complexity.

18.2HCOct 4, 2020

Problems and Prospects for Intimate Musical Control of Computers

David Wessel, Matthew Wright

In this paper we describe our efforts towards the development of live performance computer-based musical instrumentation. Our design criteria include initial ease of use coupled with a long term potential for virtuosity, minimal and low variance latency, and clear and simple strategies for programming the relationship between gesture and musical result. We present custom controllers and unique adaptations of standard gestural interfaces, a programmable connectivity processor, a communications protocol called Open Sound Control (OSC), and a variety of metaphors for musical control. We further describe applications of our technology to a variety of real musical performances and directions for future research.

6.6LGSep 16, 2019

They Might NOT Be Giants: Crafting Black-Box Adversarial Examples with Fewer Queries Using Particle Swarm Optimization

Rayan Mosli, Matthew Wright, Bo Yuan et al.

Machine learning models have been found to be susceptible to adversarial examples that are often indistinguishable from the original inputs. These adversarial examples are created by applying adversarial perturbations to input samples, which would cause them to be misclassified by the target models. Attacks that search and apply the perturbations to create adversarial examples are performed in both white-box and black-box settings, depending on the information available to the attacker about the target. For black-box attacks, the only capability available to the attacker is the ability to query the target with specially crafted inputs and observing the labels returned by the model. Current black-box attacks either have low success rates, requires a high number of queries, or produce adversarial examples that are easily distinguishable from their sources. In this paper, we present AdversarialPSO, a black-box attack that uses fewer queries to create adversarial examples with high success rates. AdversarialPSO is based on the evolutionary search algorithm Particle Swarm Optimization, a populationbased gradient-free optimization algorithm. It is flexible in balancing the number of queries submitted to the target vs the quality of imperceptible adversarial examples. The attack has been evaluated using the image classification benchmark datasets CIFAR-10, MNIST, and Imagenet, achieving success rates of 99.6%, 96.3%, and 82.0%, respectively, while submitting substantially fewer queries than the state-of-the-art. We also present a black-box method for isolating salient features used by models when making classifications. This method, called Swarms with Individual Search Spaces or SWISS, creates adversarial examples by finding and modifying the most important features in the input.

20.0CRFeb 18, 2019Code

Mockingbird: Defending Against Deep-Learning-Based Website Fingerprinting Attacks with Adversarial Traces

Mohammad Saidur Rahman, Mohsen Imani, Nate Mathews et al.

Website Fingerprinting (WF) is a type of traffic analysis attack that enables a local passive eavesdropper to infer the victim's activity, even when the traffic is protected by a VPN or an anonymity system like Tor. Leveraging a deep-learning classifier, a WF attacker can gain over 98% accuracy on Tor traffic. In this paper, we explore a novel defense, Mockingbird, based on the idea of adversarial examples that have been shown to undermine machine-learning classifiers in other domains. Since the attacker gets to design and train his attack classifier based on the defense, we first demonstrate that at a straightforward technique for generating adversarial-example based traces fails to protect against an attacker using adversarial training for robust classification. We then propose Mockingbird, a technique for generating traces that resists adversarial training by moving randomly in the space of viable traces and not following more predictable gradients. The technique drops the accuracy of the state-of-the-art attack hardened with adversarial training from 98% to 42-58% while incurring only 58% bandwidth overhead. The attack accuracy is generally lower than state-of-the-art defenses, and much lower when considering Top-2 accuracy, while incurring lower bandwidth overheads.

12.4CRMay 5, 2018

Towards Predicting Efficient and Anonymous Tor Circuits

Armon Barton, Mohsen Imani, Jiang Ming et al.

The Tor anonymity system provides online privacy for millions of users, but it is slower than typical web browsing. To improve Tor performance, we propose PredicTor, a path selection technique that uses a Random Forest classifier trained on recent measurements of Tor to predict the performance of a proposed path. If the path is predicted to be fast, then the client builds a circuit using those relays. We implemented PredicTor in the Tor source code and show through live Tor experiments and Shadow simulations that PredicTor improves Tor network performance by 11% to 23% compared to Vanilla Tor and by 7% to 13% compared to the previous state-of-the-art scheme. Our experiments show that PredicTor is the first path selection algorithm to dynamically avoid highly congested nodes during times of high congestion and avoid long-distance paths during times of low congestion. We evaluate the anonymity of PredicTor using standard entropy-based and time-to-first-compromise metrics, but these cannot capture the possibility of leakage due to the use of location in path selection. To better address this, we propose a new anonymity metric called CLASI: Client Autonomous System Inference. CLASI is the first anonymity metric in Tor that measures an adversary's ability to infer client Autonomous Systems (ASes) by fingerprinting circuits at the network, country, and relay level. We find that CLASI shows anonymity loss for location-aware path selection algorithms, where entropy-based metrics show little to no loss of anonymity. Additionally, CLASI indicates that PredicTor has similar sender AS leakage compared to the current Tor path selection algorithm due to PredicTor building circuits that are independent of client location.

29.7CRJan 7, 2018Code

Deep Fingerprinting: Undermining Website Fingerprinting Defenses with Deep Learning

Payap Sirinam, Mohsen Imani, Marc Juarez et al.

Website fingerprinting enables a local eavesdropper to determine which websites a user is visiting over an encrypted connection. State-of-the-art website fingerprinting attacks have been shown to be effective even against Tor. Recently, lightweight website fingerprinting defenses for Tor have been proposed that substantially degrade existing attacks: WTF-PAD and Walkie-Talkie. In this work, we present Deep Fingerprinting (DF), a new website fingerprinting attack against Tor that leverages a type of deep learning called Convolutional Neural Networks (CNN) with a sophisticated architecture design, and we evaluate this attack against WTF-PAD and Walkie-Talkie. The DF attack attains over 98% accuracy on Tor traffic without defenses, better than all prior attacks, and it is also the only attack that is effective against WTF-PAD with over 90% accuracy. Walkie-Talkie remains effective, holding the attack to just 49.7% accuracy. In the more realistic open-world setting, our attack remains effective, with 0.99 precision and 0.94 recall on undefended traffic. Against traffic defended with WTF-PAD in this setting, the attack still can get 0.96 precision and 0.68 recall. These findings highlight the need for effective defenses that protect against this new attack and that could be deployed in Tor.

2.5CRJun 17, 2017

The Evaluation of Circuit Selection Methods on Tor

Mohsen Imani, Mehrdad Amirabadi, Matthew Wright

Tor provides anonymity online by routing traffic through encrypted tunnels, called circuits, over paths of anonymizing relays. To enable users to connect to their selected destination servers without waiting for the circuit to be build, the Tor client maintains a few circuits at all times. Nevertheless, Tor is slower to use than directly connecting to the destination server. In this paper, we propose to have the Tor client measure the performance of the pre-built circuits and select the fastest circuits for users to send their traffic over. To this end, we define and evaluate nine metrics for selecting which pre-built circuit to use based on different combinations of circuit length, Round Trip Time (RTT), and congestion. We also explore the effect on performance of the number of pre-built circuits at the time of the selection. Through whole-network experiments in Shadow, we show that using circuit RTT with at least three pre-built circuits allows the Tor client to identify fast circuits and improves median time to first byte (TTFB) by 22% over Tor and 15% over congestion-aware routing, the state-of-the-art in Tor circuit selection. We evaluate the security of the proposed circuit selection mechanism against both a relay-level and a network-level adversary and find no loss of security compared with Tor.

2.5CRJun 17, 2017

Forming Guard Sets using AS Relationships

Mohsen Imani, Armon Barton, Matthew Wright

The mechanism for picking guards in Tor suffers from security problems like guard fingerprinting and from performance issues. To address these issues, Hayes and Danezis proposed the use of guard sets, in which the Tor system groups all guards into sets, and each client picks one of these sets and uses its guards. Unfortunately, guard sets frequently need nodes added or they are broken up due to fluctuations in network bandwidth. In this paper, we first show that these breakups create opportunities for malicious guards to join many guard sets by merely tuning the bandwidth they make available to Tor, and this greatly increases the number of clients exposed to malicious guards. To address this problem, we propose a new method for forming guard sets based on Internet location. We construct a hierarchy that keeps clients and guards together more reliably and prevents guards from easily joining arbitrary guard sets. This approach also has the advantage of confining an attacker with access to limited locations on the Internet to a small number of guard sets. We simulate this guard set design using historical Tor data in the presence of both relay-level adversaries and network-level adversaries, and we find that our approach is good at confining the adversary into few guard sets and thus limiting the impact of attacks.

7.5CRAug 26, 2016

Modified Relay Selection and Circuit Selection for Faster Tor

Mohsen Imani, Mehrdad Amirabadi, Matthew Wright

Users of the Tor anonymity system suffer from lessthan- ideal performance, in part because circuit building and selection processes are not tuned for speed. In this paper, we examine both the process of selecting among pre-built circuits and the process of selecting the path of relays for use in building new circuits to improve performance while maintaining anonymity. First, we show that having three pre-built circuits available allows the Tor client to identify fast circuits and improves median time to first byte (TTFB) by 15% over congestion-aware routing, the current state-of-the-art method. Second, we propose a new path selection algorithm that includes broad geographic location information together with bandwidth to reduce delays. In Shadow simulations, we find 20% faster median TTFB and 11% faster median total download times over congestion-aware routing for accessing webpage-sized objects. Our security evaluations show that this approach leads to better or equal security against a generic relay-level adversary compared to Tor, but increased vulnerability to targeted attacks. We explore this trade-off and find settings of our system that offer good performance, modestly better security against a generic adversary, and only slightly more vulnerability to a targeted adversary.

27.3CRDec 2, 2015

Toward an Efficient Website Fingerprinting Defense

Marc Juarez, Mohsen Imani, Mike Perry et al.

Website Fingerprinting attacks enable a passive eavesdropper to recover the user's otherwise anonymized web browsing activity by matching the observed traffic with prerecorded web traffic templates. The defenses that have been proposed to counter these attacks are impractical for deployment in real-world systems due to their high cost in terms of added delay and bandwidth overhead. Further, these defenses have been designed to counter attacks that, despite their high success rates, have been criticized for assuming unrealistic attack conditions in the evaluation setting. In this paper, we propose a novel, lightweight defense based on Adaptive Padding that provides a sufficient level of security against website fingerprinting, particularly in realistic evaluation conditions. In a closed-world setting, this defense reduces the accuracy of the state-of-the-art attack from 91% to 20%, while introducing zero latency overhead and less than 60% bandwidth overhead. In an open-world, the attack precision is just 1% and drops further as the number of sites grows.

7.8HCMar 8, 2015

Towards Making Random Passwords Memorable: Leveraging Users' Cognitive Ability Through Multiple Cues

Mahdi Nasrullah Al-Ameen, Matthew Wright, Shannon Scielzo

Given the choice, users produce passwords reflecting common strategies and patterns that ease recall but offer uncertain and often weak security. System-assigned passwords provide measurable security but suffer from poor memorability. To address this usability-security tension, we argue that systems should assign random passwords but also help with memorization and recall. We investigate the feasibility of this approach with CuedR, a novel cued-recognition authentication scheme that provides users with multiple cues (visual, verbal, and spatial) and lets them choose the cues that best fit their learning process for later recognition of system-assigned keywords. In our lab study, all 37 of our participants could log in within three attempts one week after registration (mean login time: 38.0 seconds). A pilot study on using multiple CuedR passwords also showed 100% recall within three attempts. Based on our results, we suggest appropriate applications for CuedR, such as financial and e-commerce accounts.

3.7CRDec 22, 2014

iPersea : The Improved Persea with Sybil Detection Mechanism

Mahdi Nasrullah Al-Ameen, Matthew Wright

P2P systems are highly susceptible to Sybil attacks, in which an attacker creates a large number of identities and uses them to control a substantial fraction of the system. Persea is the most recent approach towards designing a social network based Sybil-resistant DHT. Unlike prior Sybil-resistant P2P systems based on social networks, Persea does not rely on two key assumptions: (i) that the social network is fast mixing, and (ii) that there is a small ratio of attack edges to honest peers. Both assumptions have been shown to be unreliable in real social networks. The hierarchical distribution of node IDs in Persea confines a large attacker botnet to a considerably smaller region of the ID space than in a normal P2P system and its replication mechanism lets a peer to retrieve the desired results even if a given region is occupied by attackers. However, Persea system suffers from certain limitations, since it cannot handle the scenario, where the malicious target returns an incorrect result instead of just ignoring the lookup request. In this paper, we address this major limitation of Persea through a Sybil detection mechanism built on top of Persea system, which accommodates inspection lookup, a specially designed lookup scheme to detect the Sybil nodes based on their responses to the lookup query. We design a scheme to filter those detected Sybils to ensure the participation of honest nodes on the lookup path during regular DHT lookup. Since the malicious nodes are opt-out from the lookup path in our system, they cannot return any incorrect result during regular lookup. We evaluate our system in simulations with social network datasets and the results show that catster, the largest network in our simulation with 149700 nodes and 5449275 edges, gains 100% lookup success rate, even when the number of attack edges is equal to the number of benign peers in the network.

8.4HCAug 12, 2014

A Comprehensive Study of the GeoPass User Authentication Scheme

Mahdi Nasrullah Al-Ameen, Matthew Wright

Before deploying a new user authentication scheme, it is critical to subject the scheme to comprehensive study. Few works, however, have undertaken such a study. Recently, Thorpe et al. proposed GeoPass, the most promising of a class of user authentication schemes based on geographic locations in online maps. Their study showed very high memorability (97%) and satisfactory resilience against online guessing, which means that GeoPass has compelling features for real-world use. No comprehensive study, however, has been conducted for GeoPass or any other location-based password scheme. In this paper, we present a systematic approach for the detailed evaluation of a password system, which we implement to study GeoPass. We conducted three separate studies to evaluate the suitability of GeoPass for widespread use. First, we performed a field study over two months, in which users in a real-world setting remembered their location-passwords 96% of the time and showed improvement with more login sessions. Second, we conducted a study to test how users would fare with multiple location-passwords and found that users remembered their location-passwords in less than 70% of login sessions, with 40% of login failures due to interference effects. Third, we conducted a study to examine the resilience of GeoPass against shoulder surfing. Our participants played the role of attackers and had an overall success rate of 48%. Based on our results, we suggest suitable applications of GeoPass in its current state and identify aspects of GeoPass that must be improved before widespread deployment could be considered.

3.5HCJul 27, 2014

Q-A: Towards the Solution of Usability-Security Tension in User Authentication

Mahdi Nasrullah Al-Ameen, S M Taiabul Haque, Matthew Wright

Users often choose passwords that are easy to remember but also easy to guess by attackers. Recent studies have revealed the vulnerability of textual passwords to shoulder surfing and keystroke loggers. It remains a critical challenge in password research to develop an authentication scheme that addresses these security issues, in addition to offering good memorability. Motivated by psychology research on humans' cognitive strengths and weaknesses, we explore the potential of cognitive questions as a way to address the major challenges in user authentication. We design, implement, and evaluate Q-A, a novel cognitive-question-based password system that requires a user to enter the letter at a given position in her answer for each of six personal questions (e.g. "What is the name of your favorite childhood teacher?"). In this scheme, the user does not need to memorize new, artificial information as her authentication secret. Our scheme offers 28 bits of theoretical password space, which has been found sufficient to prevent online brute-force attacks. Q-A is also robust against shoulder surfing and keystroke loggers. We conducted a multi-session in-lab user study to evaluate the usability of Q-A; 100% of users were able to remember their Q-A password over the span of one week, although login times were high. We compared our scheme with random six character passwords and found that login success rate in Q-A was significantly higher. Based on our results, we suggest that Q-A would be most appropriate in contexts that demand high security and where logins occur infrequently (e.g., online bank accounts).

14.7CRMay 2, 2014

Dovetail: Stronger Anonymity in Next-Generation Internet Routing

Jody Sankey, Matthew Wright

Current low-latency anonymity systems use complex overlay networks to conceal a user's IP address, introducing significant latency and network efficiency penalties compared to normal Internet usage. Rather than obfuscating network identity through higher level protocols, we propose a more direct solution: a routing protocol that allows communication without exposing network identity, providing a strong foundation for Internet privacy, while allowing identity to be defined in those higher level protocols where it adds value. Given current research initiatives advocating "clean slate" Internet designs, an opportunity exists to design an internetwork layer routing protocol that decouples identity from network location and thereby simplifies the anonymity problem. Recently, Hsiao et al. proposed such a protocol (LAP), but it does not protect the user against a local eavesdropper or an untrusted ISP, which will not be acceptable for many users. Thus, we propose Dovetail, a next-generation Internet routing protocol that provides anonymity against an active attacker located at any single point within the network, including the user's ISP. A major design challenge is to provide this protection without including an application-layer proxy in data transmission. We address this challenge in path construction by using a matchmaker node (an end host) to overlap two path segments at a dovetail node (a router). The dovetail then trims away part of the path so that data transmission bypasses the matchmaker. Additional design features include the choice of many different paths through the network and the joining of path segments without requiring a trusted third party. We develop a systematic mechanism to measure the topological anonymity of our designs, and we demonstrate the privacy and efficiency of our proposal by simulation, using a model of the complete Internet at the AS-level.

1.2NISep 21, 2012

ReDS: A Framework for Reputation-Enhanced DHTs

Ruj Akavipat, Mahdi N. Al-Ameen, Apu Kapadia et al.

Distributed Hash Tables (DHTs) such as Chord and Kademlia offer an efficient solution for locating resources in peer-to-peer networks. Unfortunately, malicious nodes along a lookup path can easily subvert such queries. Several systems, including Halo (based on Chord) and Kad (based on Kademlia), mitigate such attacks by using a combination of redundancy and diversity in the paths taken by redundant lookup queries. Much greater assurance can be provided, however. We describe Reputation for Directory Services (ReDS), a framework for enhancing lookups in redundant DHTs by tracking how well other nodes service lookup requests. We describe how the ReDS technique can be applied to virtually any redundant DHT including Halo and Kad. We also study the collaborative identification and removal of bad lookup paths in a way that does not rely on the sharing of reputation scores --- we show that such sharing is vulnerable to attacks that make it unsuitable for most applications of ReDS. Through extensive simulations we demonstrate that ReDS improves lookup success rates for Halo and Kad by 80% or more over a wide range of conditions, even against strategic attackers attempting to game their reputation scores and in the presence of node churn.

11.5CRAug 30, 2012

Pisces: Anonymous Communication Using Social Networks

Prateek Mittal, Matthew Wright, Nikita Borisov

The architectures of deployed anonymity systems such as Tor suffer from two key problems that limit user's trust in these systems. First, paths for anonymous communication are built without considering trust relationships between users and relays in the system. Second, the network architecture relies on a set of centralized servers. In this paper, we propose Pisces, a decentralized protocol for anonymous communications that leverages users' social links to build circuits for onion routing. We argue that such an approach greatly improves the system's resilience to attackers. A fundamental challenge in this setting is the design of a secure process to discover peers for use in a user's circuit. All existing solutions for secure peer discovery leverage structured topologies and cannot be applied to unstructured social network topologies. In Pisces, we discover peers by using random walks in the social network graph with a bias away from highly connected nodes to prevent a few nodes from dominating the circuit creation process. To secure the random walks, we leverage the reciprocal neighbor policy: if malicious nodes try to exclude honest nodes during peer discovery so as to improve the chance of being selected, then honest nodes can use a tit-for-tat approach and reciprocally exclude the malicious nodes from their routing tables. We describe a fully decentralized protocol for enforcing this policy, and use it to build the Pisces anonymity system. Using theoretical modeling and experiments on real-world social network topologies, we show that (a) the reciprocal neighbor policy mitigates active attacks that an adversary can perform, (b) our decentralized protocol to enforce this policy is secure and has low overhead, and (c) the overall anonymity provided by our system significantly outperforms existing approaches.