MuSimA: A Tool with Multi-modal Input for Generating Bespoke ABAC Datasets
Addresses the lack of synthetic dataset generation tools for ABAC researchers, though the approach is incremental.
MuSimA is a web-based tool for generating synthetic ABAC datasets with user-specified attribute distributions. Users supply either a structured JSON specification or a minimal JSON file plus hand-drawn distribution sketches, which an LLM interprets. Datasets can be generated at varying sizes and complexities for testing ABAC systems at scale.
Recent advances in research on Attribute-based Access Control (ABAC) have led to the development of several ingenious methods for representing and enforcing organizational security policies. However, little effort has so far been devoted to building a tool for generating large-scale synthetic datasets that can be used to test the developed ABAC systems. In this paper, we address this shortcoming by building MuSimA, a web-based tool for generating ABAC datasets with user-specified probability distributions of attribute values. It supports multi-modal input: users can provide specifications either as a structured JSON file or as a minimal JSON file combined with hand-drawn distribution sketches. In the latter case, a Large Language Model automatically extracts appropriate distribution parameters from the sketches. The user can then download the generated synthetic ABAC data matching the input specifications. To support scalability studies of ABAC algorithms and methods, data can be generated at varying sizes and complexities. We make MuSimA freely available for use by the research community.
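To make the idea of generating data from user-specified attribute distributions concrete, the following is a minimal sketch in Python. It is not MuSimA's implementation: the JSON schema (`num_users`, `attributes`, `values`, `probabilities`) and the function name `generate_users` are assumptions introduced here purely for illustration, and only simple categorical distributions over user attributes are shown.

```python
import json
import random

# Hypothetical specification: this schema is an assumption for illustration,
# not MuSimA's actual input format.
SPEC_JSON = """
{
  "num_users": 1000,
  "attributes": {
    "department": {"values": ["HR", "Engineering", "Finance"],
                   "probabilities": [0.2, 0.5, 0.3]},
    "clearance":  {"values": ["low", "medium", "high"],
                   "probabilities": [0.6, 0.3, 0.1]}
  }
}
"""


def generate_users(spec, seed=0):
    """Draw one value per attribute for each user from its categorical distribution."""
    rng = random.Random(seed)  # fixed seed so the generated dataset is reproducible
    users = []
    for user_id in range(spec["num_users"]):
        user = {"id": user_id}
        for name, dist in spec["attributes"].items():
            # random.choices samples with replacement according to the given weights
            user[name] = rng.choices(
                dist["values"], weights=dist["probabilities"], k=1
            )[0]
        users.append(user)
    return users


spec = json.loads(SPEC_JSON)
users = generate_users(spec)
print(len(users), "users generated; first record:", users[0])
```

Changing `num_users` in this hypothetical specification scales the dataset size, which is the kind of variation the abstract describes for scalability studies. Under the sketch-based input mode, the probabilities would presumably be produced by the LLM rather than written by hand.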