Matthew Roughan

CR
7papers
195citations
Novelty41%
AI Score24

7 Papers

CRApr 12, 2018Code
Clear as MUD: Generating, Validating and Applying IoT Behaviorial Profiles (Technical Report)

Ayyoob Hamza, Dinesha Ranathunga, H. Habibi Gharakheili et al.

IoT devices are increasingly being implicated in cyber-attacks, driving community concern about the risks they pose to critical infrastructure, corporations, and citizens. In order to reduce this risk, the IETF is pushing IoT vendors to develop formal specifications of the intended purpose of their IoT devices, in the form of a Manufacturer Usage Description (MUD), so that their network behavior in any operating environment can be locked down and verified rigorously. This paper aims to assist IoT manufacturers in developing and verifying MUD profiles, while also helping adopters of these devices to ensure they are compatible with their organizational policies. Our first contribution is to develop a tool that takes the traffic trace of an arbitrary IoT device as input and automatically generates a MUD profile for it. We contribute our tool as open source, apply it to 28 consumer IoT devices, and highlight insights and challenges encountered in the process. Our second contribution is to apply a formal semantic framework that not only validates a given MUD profile for consistency, but also checks its compatibility with a given organizational policy. Finally, we apply our framework to representative organizations and selected devices, to demonstrate how MUD can reduce the effort needed for IoT acceptance testing.

DBNov 7, 2021
Em-K Indexing for Approximate Query Matching in Large-scale ER

Samudra Herath, Matthew Roughan, Gary Glonek

Accurate and efficient entity resolution (ER) is a significant challenge in many data mining and analysis projects requiring integrating and processing massive data collections. It is becoming increasingly important in real-world applications to develop ER solutions that produce prompt responses for entity queries on large-scale databases. Some of these applications demand entity query matching against large-scale reference databases within a short time. We define this as the query matching problem in ER in this work. Indexing or blocking techniques reduce the search space and execution time in the ER process. However, approximate indexing techniques that scale to very large-scale datasets remain open to research. In this paper, we investigate the query matching problem in ER to propose an indexing method suitable for approximate and efficient query matching. We first use spatial mappings to embed records in a multidimensional Euclidean space that preserves the domain-specific similarity. Among the various mapping techniques, we choose multidimensional scaling. Then using a Kd-tree and the nearest neighbour search, the method returns a block of records that includes potential matches for a query. Our method can process queries against a large-scale dataset using only a fraction of the data $L$ (given the dataset size is $N$), with a $O(L^2)$ complexity where $L \ll N$. The experiments conducted on several datasets showed the effectiveness of the proposed method.

LGNov 7, 2021
High Performance Out-of-sample Embedding Techniques for Multidimensional Scaling

Samudra Herath, Matthew Roughan, Gary Glonek

The recent rapid growth of the dimension of many datasets means that many approaches to dimension reduction (DR) have gained significant attention. High-performance DR algorithms are required to make data analysis feasible for big and fast data sets. However, many traditional DR techniques are challenged by truly large data sets. In particular multidimensional scaling (MDS) does not scale well. MDS is a popular group of DR techniques because it can perform DR on data where the only input is a dissimilarity function. However, common approaches are at least quadratic in memory and computation and, hence, prohibitive for large-scale data. We propose an out-of-sample embedding (OSE) solution to extend the MDS algorithm for large-scale data utilising the embedding of only a subset of the given data. We present two OSE techniques: the first based on an optimisation approach and the second based on a neural network model. With a minor trade-off in the approximation, the out-of-sample techniques can process large-scale data with reasonable computation and memory requirements. While both methods perform well, the neural network model outperforms the optimisation approach of the OSE solution in terms of efficiency. OSE has the dual benefit that it allows fast DR on streaming datasets as well as static databases.

MESep 7, 2020
Simulating Name-like Vectors for Testing Large-scale Entity Resolution

Samudra Herath, Matthew Roughan, Gary Glonek

Accurate and efficient entity resolution (ER) has been a problem in data analysis and data mining projects for decades. In our work, we are interested in developing ER methods to handle big data. Good public datasets are restricted in this area and usually small in size. Simulation is one technique for generating datasets for testing. Existing simulation tools have problems of complexity, scalability and limitations of resampling. We address these problems by introducing a better way of simulating testing data for big data ER. Our proposed simulation model is simple, inexpensive and fast. We focus on avoiding the detail-level simulation of records using a simple vector representation. In this paper, we will discuss how to simulate simple vectors that approximate the properties of names (commonly used as identification keys).

CRFeb 15, 2019
ForestFirewalls: Getting Firewall Configuration Right in Critical Networks (Technical Report)

Dinesha Ranathunga, Matthew Roughan, Paul Tune et al.

Firewall configuration is critical, yet often conducted manually with inevitable errors, leaving networks vulnerable to cyber attack [40]. The impact of misconfigured firewalls can be catastrophic in Supervisory Control and Data Acquisition (SCADA) networks. These networks control the distributed assets of industrial systems such as power generation and water distribution systems. Automation can make designing firewall configurations less tedious and their deployment more reliable. In this paper, we propose ForestFirewalls, a high-level approach to configuring SCADA firewalls. Our goals are three-fold. We aim to: first, decouple implementation details from security policy design by abstracting the former; second, simplify policy design; and third, provide automated checks, pre and post-deployment, to guarantee configuration accuracy. We achieve these goals by automating the implementation of a policy to a network and by auto-validating each stage of the configuration process. We test our approach on a real SCADA network to demonstrate its effectiveness.

CRMay 30, 2016
The Mathematical Foundations for Mapping Policies to Network Devices (Technical Report)

Dinesha Ranathunga, Matthew Roughan, Phil Kernick et al.

A common requirement in policy specification languages is the ability to map policies to the underlying network devices. Doing so, in a provably correct way, is important in a security policy context, so administrators can be confident of the level of protection provided by the policies for their networks. Existing policy languages allow policy composition but lack formal semantics to allocate policy to network devices. Our research tackles this from first principles: we ask how network policies can be described at a high-level, independent of firewall-vendor and network minutiae. We identify the algebraic requirements of the policy mapping process and propose semantic foundations to formally verify if a policy is implemented by the correct set of policy-arbiters. We show the value of our proposed algebras in maintaining concise network-device configurations by applying them to real-world networks.

DBMar 10, 2015
Unravelling Graph-Exchange File Formats

Matthew Roughan, Jonathan Tuke

A graph is used to represent data in which the relationships between the objects in the data are at least as important as the objects themselves. Over the last two decades nearly a hundred file formats have been proposed or used to provide portable access to such data. This paper seeks to review these formats, and provide some insight to both reduce the ongoing creation of unnecessary formats, and guide the development of new formats where needed.