Universal and Distinct Properties of Communication Dynamics: How to Generate Realistic Inter-event Times
This work addresses the need for realistic models of communication dynamics among researchers and practitioners in fields such as network analysis and anomaly detection, although it is incremental, building on existing point-process methods.
The paper tackles the problem of understanding and generating realistic inter-event times in communication data. It analyzes eight real datasets, identifies four universal patterns, and proposes the Self-Feeding Process (SFP), which generates such inter-event times with at most two parameters and enables synthetic dataset creation and anomaly detection.
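For a single user or channel, the inter-event times are the gaps between consecutive event timestamps, and the inter-event time distribution is the empirical distribution of those gaps. The following minimal sketch computes both from a plain list of numeric timestamps. The helper names `inter_event_times` and `empirical_ccdf` are hypothetical, and the input format is an assumption, since the abstract does not describe how the datasets are stored.

```python
from bisect import bisect_left
from statistics import median


def inter_event_times(timestamps):
    """Return the gaps between consecutive events after sorting by time."""
    ordered = sorted(timestamps)
    return [later - earlier for earlier, later in zip(ordered, ordered[1:])]


def empirical_ccdf(values):
    """Return (v, P(X >= v)) for each distinct observed value v."""
    ordered = sorted(values)
    n = len(ordered)
    # bisect_left counts the samples strictly smaller than v,
    # so n minus that count is the number of samples >= v
    return [(v, (n - bisect_left(ordered, v)) / n) for v in sorted(set(ordered))]


if __name__ == "__main__":
    events = [0, 10, 15, 45, 46]          # hypothetical timestamps in seconds
    gaps = inter_event_times(events)      # [10, 5, 30, 1]
    print(gaps, median(gaps))             # median gap is 7.5
    print(empirical_ccdf(gaps))
```

A complementary CDF is used rather than a histogram because heavy-tailed gap distributions are usually easier to inspect on log-log axes in that form.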
With the advancement of information systems, means of communication are becoming cheaper, faster, and more available. Today, millions of people carrying smartphones or tablets are able to communicate at practically any time and anywhere they want. Among other things, they can access their e-mails, comment on weblogs, watch and post comments on videos, and make phone calls or send text messages almost ubiquitously. Given this scenario, in this paper we tackle a fundamental aspect of this new era of communication: how do the time intervals between communication events behave for different technologies and means of communication? Are there universal patterns for the inter-event time distribution (IED)? In which ways do inter-event times behave differently among particular technologies? To answer these questions, we analyze eight different datasets of real, modern communication data and find four well-defined patterns that are present in all eight datasets. Moreover, we propose the use of the Self-Feeding Process (SFP) to generate inter-event times between communications. The SFP is an extremely parsimonious point process that requires at most two parameters and is able to generate inter-event times with all the universal properties we observed in the data. We show the potential applications of the SFP by proposing a framework to generate a synthetic dataset containing realistic communication events for any of the analyzed means of communication (e.g., phone calls, e-mails, comments on blogs) and an algorithm to detect anomalies.
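The abstract does not spell out the SFP's update rule, so the following is only a hedged sketch of what a "self-feeding" generator can look like, in which each new inter-event time depends on the previous one. It assumes, for illustration, a one-parameter rule: each gap is drawn from an exponential distribution whose mean is the previous gap plus `mu / e`, where `mu` is a positive scale parameter. This rule, the starting condition, and the helper names `self_feeding_gaps` and `gaps_to_timestamps` are assumptions rather than the paper's definition, and the optional second parameter mentioned above is not modeled.

```python
import math
import random


def self_feeding_gaps(n, mu, seed=None):
    """Generate n inter-event times with an ASSUMED self-feeding rule.

    Each gap is exponential with mean (previous gap + mu / e); the first
    gap uses a previous gap of 0 (a hypothetical starting choice).
    """
    if n < 0 or mu <= 0:
        raise ValueError("need n >= 0 and mu > 0")
    rng = random.Random(seed)
    offset = mu / math.e
    gaps = []
    previous = 0.0
    for _ in range(n):
        mean = previous + offset
        # random.expovariate takes a rate, i.e. 1 / mean
        gap = rng.expovariate(1.0 / mean)
        gaps.append(gap)
        previous = gap  # the next gap "feeds" on this one
    return gaps


def gaps_to_timestamps(gaps, start=0.0):
    """Convert inter-event times into cumulative event timestamps."""
    timestamps = []
    t = start
    for gap in gaps:
        t += gap
        timestamps.append(t)
    return timestamps


if __name__ == "__main__":
    gaps = self_feeding_gaps(10, mu=60.0, seed=42)
    print(gaps_to_timestamps(gaps))
```

Because each mean carries the previous gap forward, short gaps tend to be followed by short ones and long gaps by long ones, which is the kind of bursty behavior a synthetic-event framework would need. Fixing `seed` makes a synthetic trace reproducible.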