CR, Mar 13, 2020
A report on personally identifiable sensor data from smartphone devices
Marios Fanourakis
An average smartphone is equipped with an abundance of sensors that provide a variety of vital functionalities and conveniences. The data from these sensors can be collected to find trends or discover interesting correlations, but it can also be used by nefarious entities to reveal the identity of the people who generated it. In this paper, we seek to identify which types of sensor data can be collected on a smartphone and which of those types can pose a threat to user privacy, by examining the hardware capabilities of modern smartphone devices and how smartphone data is used in the literature. We then summarize some implications that this information could have for the GDPR.
CR, Mar 11, 2020
Opportunistic multi-party shuffling for data reporting privacy
Marios Fanourakis
An important feature of data collection frameworks that involve voluntary participants is privacy. Besides data encryption, which protects the data from third parties if the communication channel is compromised, there are schemes that obfuscate the data and thus provide some anonymity in the data itself, as well as schemes that 'mix' the data so that it cannot be traced back to its source through network identifiers. This mixing is usually implemented with special mix networks in the data collection framework. In this paper, we focus on mixing schemes in which the participants do not need to trust the mix network or the data collector to hide the source of the data, so that we can evaluate the efficacy of peer-to-peer mixing strategies in the real world. To achieve this, we present a simple opportunistic multi-party shuffling scheme that mixes the data and effectively obfuscates its source. We successfully simulate three cases with artificial parameters and then use the real-world Mobile Data Challenge (MDC) data to simulate two additional scenarios with realistic parameters. Our results show that such approaches can be effective, depending on the time constraints of the data collection, and we conclude with design implications for implementing the proposed data collection scheme in real-life deployments.
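The abstract does not give the details of the shuffling scheme, so the following is only a minimal illustrative sketch of the general idea of peer-to-peer mixing, not the authors' algorithm. It assumes that each participant starts with one report, that every encounter is a uniformly random pair of participants who swap the reports they currently hold, and that each participant uploads whatever it holds at the deadline. The names `simulate_shuffle` and `fraction_unmixed` are hypothetical.

```python
import random


def simulate_shuffle(num_participants, num_encounters, seed=None):
    """Toy model of opportunistic pairwise shuffling (assumed, not from the paper).

    held[i] is the origin ID of the report that participant i currently holds.
    Each encounter picks two distinct participants at random and swaps their reports.
    """
    rng = random.Random(seed)
    held = list(range(num_participants))  # everyone starts with their own report
    for _ in range(num_encounters):
        a, b = rng.sample(range(num_participants), 2)
        held[a], held[b] = held[b], held[a]
    return held


def fraction_unmixed(held):
    """Fraction of participants who would still upload their own report."""
    same = sum(1 for holder, origin in enumerate(held) if holder == origin)
    return same / len(held)


if __name__ == "__main__":
    # More encounters before the deadline should leave fewer reports unmixed.
    for encounters in (0, 10, 100, 1000):
        held = simulate_shuffle(100, encounters, seed=1)
        print(encounters, fraction_unmixed(held))
```

In this toy model the number of encounters stands in for the time available before reporting, which loosely mirrors the abstract's point that effectiveness depends on the time constraints of the data collection. A realistic evaluation would replace the uniformly random pairs with encounters derived from mobility traces.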