Open ERP System Data For Occupational Fraud Detection
This work addresses the problem of limited data access for researchers and practitioners in occupational fraud detection. Its contribution is incremental, however, because it focuses on data generation rather than on new detection techniques.
The paper tackles the lack of public ERP system data for fraud detection by generating simulated datasets that cover both normal operations and fraud scenarios. The result is publicly available raw and aggregated data for the open development and comparison of detection methods.
Recent estimates report that companies lose 5% of their revenue to occupational fraud. Since most medium-sized and large companies use Enterprise Resource Planning (ERP) systems to track vast amounts of information about their business processes, researchers have previously shown interest in automatically detecting fraud from ERP system data. Current research in this area, however, is hindered by the fact that ERP system data is not publicly available for the development and comparison of fraud detection methods. We therefore set out to generate public ERP system data that includes both normal business operations and fraud. We propose a strategy for generating ERP system data through a serious game, model a variety of fraud scenarios in cooperation with auditing experts, and generate data from a simulated make-to-stock production company with multiple research participants. We aggregate the generated data into ready-to-use datasets for fraud detection in ERP systems, and supply both the raw and aggregated data to the general public to allow for open development and comparison of fraud detection approaches on ERP system data.
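The abstract does not specify how the raw data is aggregated into datasets. As a hedged illustration only, the following minimal sketch groups hypothetical raw line-item records by business document and derives one labeled row per document. The record fields (`document_id`, `transaction_code`, `amount`, `is_fraud`), the transaction codes, and the `aggregate` function are all assumptions made for this example; they are not the paper's schema or aggregation method.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class RawEntry:
    # Hypothetical raw ERP record: one line item of a business document.
    document_id: str
    transaction_code: str  # hypothetical code, e.g. "CREATE_PO"
    amount: float
    is_fraud: bool  # ground-truth label from a simulated fraud scenario


@dataclass
class DocumentRow:
    # One aggregated, labeled row per business document (assumed layout).
    document_id: str
    n_items: int
    total_amount: float
    transaction_codes: tuple
    label: int  # 1 if any line item belongs to a fraud scenario, else 0


def aggregate(entries):
    """Group raw line items by document and build one feature row each."""
    groups = defaultdict(list)
    for entry in entries:
        groups[entry.document_id].append(entry)

    rows = []
    for doc_id in sorted(groups):  # deterministic output order
        items = groups[doc_id]
        rows.append(
            DocumentRow(
                document_id=doc_id,
                n_items=len(items),
                total_amount=round(sum(e.amount for e in items), 2),
                transaction_codes=tuple(sorted({e.transaction_code for e in items})),
                label=int(any(e.is_fraud for e in items)),
            )
        )
    return rows


if __name__ == "__main__":
    raw = [
        RawEntry("PO-1", "CREATE_PO", 100.0, False),
        RawEntry("PO-1", "CREATE_PO", 50.0, False),
        RawEntry("PO-2", "POST_INVOICE", 999.0, True),
    ]
    for row in aggregate(raw):
        print(row)
```

Labeling a document as fraudulent when any of its line items is fraudulent is only one possible design choice; a real aggregation could instead label individual line items or user sessions, depending on the detection task.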