AIJan 15, 2023
Collective Privacy Recovery: Data-sharing Coordination via Decentralized Artificial IntelligenceEvangelos Pournaras, Mark Christopher Ballandies, Stefano Bennati et al.
Collective privacy loss becomes a colossal problem, an emergency for personal freedoms and democracy. But, are we prepared to handle personal data as scarce resource and collectively share data under the doctrine: as little as possible, as much as necessary? We hypothesize a significant privacy recovery if a population of individuals, the data collective, coordinates to share minimum data for running online services with the required quality. Here we show how to automate and scale-up complex collective arrangements for privacy recovery using decentralized artificial intelligence. For this, we compare for first time attitudinal, intrinsic, rewarded and coordinated data sharing in a rigorous living-lab experiment of high realism involving >27,000 real data disclosures. Using causal inference and cluster analysis, we differentiate criteria predicting privacy and five key data-sharing behaviors. Strikingly, data-sharing coordination proves to be a win-win for all: remarkable privacy recovery for people with evident costs reduction for service providers.
CRNov 18, 2020
Modelling imperfect knowledge via location semantics for realistic privacy risks estimation in trajectory dataStefano Bennati, Aleksandra Kovacevic
Mobility patterns of vehicles and people provide powerful data sources for location-based services such as fleet optimization and traffic flow analysis. Location-based service providers must balance the value they extract from trajectory data with protecting the privacy of the individuals behind those trajectories. Reaching this goal requires measuring accurately the values of utility and privacy. Current measurement approaches assume adversaries with perfect knowledge, thus overestimate the privacy risk. To address this issue we introduce a model of an adversary with imperfect knowledge about the target. The model is based on equivalence areas, spatio-temporal regions with a semantic meaning, e.g. the target's home, whose size and accuracy determine the skill of the adversary. We then derive the standard privacy metrics of k-anonymity, l-diversity and t-closeness from the definition of equivalence areas. These metrics can be computed on any dataset, irrespective of whether and what kind of anonymization has been applied to it. This work is of high relevance to all service providers acting as processors of trajectory data who want to manage privacy risks and optimize the privacy vs. utility trade-off of their services.
DCMar 21, 2017
PriMaL: A Privacy-Preserving Machine Learning Method for Event Detection in Distributed Sensor NetworksStefano Bennati, Catholijn M. Jonker
This paper introduces PriMaL, a general PRIvacy-preserving MAchine-Learning method for reducing the privacy cost of information transmitted through a network. Distributed sensor networks are often used for automated classification and detection of abnormal events in high-stakes situations, e.g. fire in buildings, earthquakes, or crowd disasters. Such networks might transmit privacy-sensitive information, e.g. GPS location of smartphones, which might be disclosed if the network is compromised. Privacy concerns might slow down the adoption of the technology, in particular in the scenario of social sensing where participation is voluntary, thus solutions are needed which improve privacy without compromising on the event detection accuracy. PriMaL is implemented as a machine-learning layer that works on top of an existing event detection algorithm. Experiments are run in a general simulation framework, for several network topologies and parameter values. The privacy footprint of state-of-the-art event detection algorithms is compared within the proposed framework. Results show that PriMaL is able to reduce the privacy cost of a distributed event detection algorithm below that of the corresponding centralized algorithm, within the bounds of some assumptions about the protocol. Moreover the performance of the distributed algorithm is not statistically worse than that of the centralized algorithm.
CRJul 7, 2015
Malware Task Identification: A Data Driven ApproachEric Nunes, Casey Buto, Paulo Shakarian et al.
Identifying the tasks a given piece of malware was designed to perform (e.g. logging keystrokes, recording video, establishing remote access, etc.) is a difficult and time-consuming operation that is largely human-driven in practice. In this paper, we present an automated method to identify malware tasks. Using two different malware collections, we explore various circumstances for each - including cases where the training data differs significantly from test; where the malware being evaluated employs packing to thwart analytical techniques; and conditions with sparse training data. We find that this approach consistently out-performs the current state-of-the art software for malware task identification as well as standard machine learning approaches - often achieving an unbiased F1 score of over 0.9. In the near future, we look to deploy our approach for use by analysts in an operational cyber-security environment.