CRAug 13, 2019
Exploit Prediction Scoring System (EPSS)Jay Jacobs, Sasha Romanosky, Benjamin Edwards et al.
Despite the massive investments in information security technologies and research over the past decades, the information security industry is still immature. In particular, the prioritization of remediation efforts within vulnerability management programs predominantly relies on a mixture of subjective expert opinion, severity scores, and incomplete data. Compounding the need for prioritization is the increase in the number of vulnerabilities the average enterprise has to remediate. This paper produces the first open, data-driven framework for assessing vulnerability threat, that is, the probability that a vulnerability will be exploited in the wild within the first twelve months after public disclosure. This scoring system has been designed to be simple enough to be implemented by practitioners without specialized tools or software, yet provides accurate estimates of exploitation. Moreover, the implementation is flexible enough that it can be updated as more, and better, data becomes available. We call this system the Exploit Prediction Scoring System, EPSS.
CRApr 24, 2019
Risky Business: Assessing Security with External MeasurementsBenjamin Edwards, Jay Jacobs, Stephanie Forrest
Security practices in large organizations are notoriously difficult to assess. The challenge only increases when organizations turn to third parties to provide technology and business services, which typically require tight network integration and sharing of confidential data, potentially increasing the organization's attack surface. The security maturity of an organization describes how well it mitigates known risks and responds to new threats. Today, maturity is typically assessed with audits and questionnaires, which are difficult to quantify, lack objectivity, and may not reflect current threats. This paper demonstrates how external measurement of an organization can be used to assess the relative quality of security among organizations. Using a large dataset from BitSight(www.bitsight.com), a cybersecurity ratings company, containing 3.2 billion measurements spanning nearly 37,000 organizations collected during calendar year 2015, we show how per-organizational "risk vectors" can be constructed that may be related to an organization's overall security posture, or maturity. Using statistical analysis, we then study the correlation between the risk vectors and botnet infections. For example, we find that misconfigured TLS services, publicly available unsecured protocols, and the use of peer-to-peer file sharing correlate with organizations that have increased rates of botnet infections. We argue that the methodology used to identify these correlations can easily be applied to other data to provide a growing picture of organizational security using external measurement.
LGNov 9, 2018
Detecting Backdoor Attacks on Deep Neural Networks by Activation ClusteringBryant Chen, Wilka Carvalho, Nathalie Baracaldo et al.
While machine learning (ML) models are being increasingly trusted to make decisions in different and varying areas, the safety of systems using such models has become an increasing concern. In particular, ML models are often trained on data from potentially untrustworthy sources, providing adversaries with the opportunity to manipulate them by inserting carefully crafted samples into the training set. Recent work has shown that this type of attack, called a poisoning attack, allows adversaries to insert backdoors or trojans into the model, enabling malicious behavior with simple external backdoor triggers at inference time and only a blackbox perspective of the model itself. Detecting this type of attack is challenging because the unexpected behavior occurs only when a backdoor trigger, which is known only to the adversary, is present. Model users, either direct users of training data or users of pre-trained model from a catalog, may not guarantee the safe operation of their ML-based system. In this paper, we propose a novel approach to backdoor detection and removal for neural networks. Through extensive experimental results, we demonstrate its effectiveness for neural networks classifying text and images. To the best of our knowledge, this is the first methodology capable of detecting poisonous data crafted to insert backdoors and repairing the model that does not require a verified and trusted dataset.
LGMay 31, 2018
Defending Against Machine Learning Model Stealing Attacks Using Deceptive PerturbationsTaesung Lee, Benjamin Edwards, Ian Molloy et al.
Machine learning models are vulnerable to simple model stealing attacks if the adversary can obtain output labels for chosen inputs. To protect against these attacks, it has been proposed to limit the information provided to the adversary by omitting probability scores, significantly impacting the utility of the provided service. In this work, we illustrate how a service provider can still provide useful, albeit misleading, class probability information, while significantly limiting the success of the attack. Our defense forces the adversary to discard the class probabilities, requiring significantly more queries before they can train a model with comparable performance. We evaluate several attack strategies, model architectures, and hyperparameters under varying adversarial models, and evaluate the efficacy of our defense against the strongest adversary. Finally, we quantify the amount of noise injected into the class probabilities to mesure the loss in utility, e.g., adding 1.26 nats per query on CIFAR-10 and 3.27 on MNIST. Our evaluation shows our defense can degrade the accuracy of the stolen model at least 20%, or require up to 64 times more queries while keeping the accuracy of the protected model almost intact.
LGAug 18, 2015
Supervised learning of sparse context reconstruction coefficients for data representation and classificationXuejie Liu, Jingbin Wang, Ming Yin et al.
Context of data points, which is usually defined as the other data points in a data set, has been found to play important roles in data representation and classification. In this paper, we study the problem of using context of a data point for its classification problem. Our work is inspired by the observation that actually only very few data points are critical in the context of a data point for its representation and classification. We propose to represent a data point as the sparse linear combination of its context, and learn the sparse context in a supervised way to increase its discriminative ability. To this end, we proposed a novel formulation for context learning, by modeling the learning of context parameter and classifier in a unified objective, and optimizing it with an alternative strategy in an iterative algorithm. Experiments on three benchmark data set show its advantage over state-of-the-art context-based data representation and classification methods.
CVJun 30, 2015
Representing data by sparse combination of contextual data points for classificationJingyan Wang, Yihua Zhou, Ming Yin et al.
In this paper, we study the problem of using contextual da- ta points of a data point for its classification problem. We propose to represent a data point as the sparse linear reconstruction of its context, and learn the sparse context to gather with a linear classifier in a su- pervised way to increase its discriminative ability. We proposed a novel formulation for context learning, by modeling the learning of context reconstruction coefficients and classifier in a unified objective. In this objective, the reconstruction error is minimized and the coefficient spar- sity is encouraged. Moreover, the hinge loss of the classifier is minimized and the complexity of the classifier is reduced. This objective is opti- mized by an alternative strategy in an iterative algorithm. Experiments on three benchmark data set show its advantage over state-of-the-art context-based data representation and classification methods.
NIFeb 17, 2012
Modeling Internet-Scale Policies for Cleaning up MalwareSteven Hofmeyr, Tyler Moore, Stephanie Forrest et al.
An emerging consensus among policy makers is that interventions undertaken by Internet Service Providers are the best way to counter the rising incidence of malware. However, assessing the suitability of countermeasures at this scale is hard. In this paper, we use an agent-based model, called ASIM, to investigate the impact of policy interventions at the Autonomous System level of the Internet. For instance, we find that coordinated intervention by the 0.2%-biggest ASes is more effective than uncoordinated efforts adopted by 30% of all ASes. Furthermore, countermeasures that block malicious transit traffic appear more effective than ones that block outgoing traffic. The model allows us to quantify and compare positive externalities created by different countermeasures. Our results give an initial indication of the types and levels of intervention that are most cost-effective at large scale.
CRFeb 17, 2012
Beyond the Blacklist: Modeling Malware Spread and the Effect of InterventionsBenjamin Edwards, Tyler Moore, George Stelle et al.
Malware spread among websites and between websites and clients is an increasing problem. Search engines play an important role in directing users to websites and are a natural control point for intervening, using mechanisms such as blacklisting. The paper presents a simple Markov model of malware spread through large populations of websites and studies the effect of two interventions that might be deployed by a search provider: blacklisting infected web pages by removing them from search results entirely and a generalization of blacklisting, called depreferencing, in which a website's ranking is decreased by a fixed percentage each time period the site remains infected. We analyze and study the trade-offs between infection exposure and traffic loss due to false positives (the cost to a website that is incorrectly blacklisted) for different interventions. As expected, we find that interventions are most effective when websites are slow to remove infections. Surprisingly, we also find that low infection or recovery rates can increase traffic loss due to false positives. Our analysis also shows that heavy-tailed distributions of website popularity, as documented in many studies, leads to high sample variance of all measured outcomes. These result implies that it will be difficult to determine empirically whether certain website interventions are effective, and it suggests that theoretical models such as the one described in this paper have an important role to play in improving web security.