Thorsten Strufe

CR
h-index: 4
24 papers
145 citations
Novelty: 48%
AI Score: 43

24 Papers

CR · Mar 8, 2022
Understanding Person Identification through Gait

Simon Hanisch, Evelyn Muschter, Admantini Hatzipanayioti et al.

Gait recognition is the process of identifying humans from their bipedal locomotion such as walking or running. As such, gait data is privacy sensitive information and should be anonymized where possible. With the rise of higher quality gait recording techniques, such as depth cameras or motion capture suits, an increasing amount of detailed gait data is captured and processed. The introduction and rise of the Metaverse is an example of a potentially popular application scenario in which the gait of users is transferred onto digital avatars. As a first step towards developing effective anonymization techniques for high-quality gait data, we study different aspects of movement data to quantify their contribution to gait recognition. We first extract categories of features from the literature on human gait perception and then design experiments for each category to assess how much the information they contain contributes to recognition success. We evaluated the utility of gait perturbation by means of naturalness ratings in a user study. Our results show that gait anonymization will be challenging, as the data is highly redundant and inter-dependent.

CR · Oct 19, 2022
Fantômas: Understanding Face Anonymization Reversibility

Julian Todt, Simon Hanisch, Thorsten Strufe

Face images are a rich source of information that can be used to identify individuals and infer private information about them. To mitigate this privacy risk, anonymizations employ transformations on clear images to obfuscate sensitive information, all while retaining some utility. Albeit published with impressive claims, they sometimes are not evaluated with convincing methodology. Reversing anonymized images to resemble their real input -- and even be identified by face recognition approaches -- represents the strongest indicator for flawed anonymization. Some recent results indeed indicate that this is possible for some approaches. It is, however, not well understood which approaches are reversible, and why. In this paper, we provide an exhaustive investigation into the phenomenon of face anonymization reversibility. Among other things, we find that 11 out of 15 tested face anonymizations are at least partially reversible and highlight how both reconstruction and inversion are the underlying processes that make reversal possible.

CR · Jul 9, 2024
SEBA: Strong Evaluation of Biometric Anonymizations

Julian Todt, Simon Hanisch, Thorsten Strufe

Biometric data is pervasively captured and analyzed. Using modern machine learning approaches, identity and attribute inference attacks have achieved high accuracy. Anonymizations aim to mitigate such disclosures by modifying data in a way that prevents identification. However, the effectiveness of some anonymizations is unclear. Therefore, improvements of the corresponding evaluation methodology have been proposed recently. In this paper, we introduce SEBA, a framework for strong evaluation of biometric anonymizations. It combines and implements the state-of-the-art methodology in an easy-to-use and easy-to-expand software framework. This allows anonymization designers to easily test their techniques using a strong evaluation methodology. As part of this discourse, we introduce and discuss new metrics that allow for a more straightforward evaluation of the privacy-utility trade-off that is inherent to anonymization attempts. Finally, we report on a prototypical experiment to demonstrate SEBA's applicability.

CR · Mar 12
Understanding Disclosure Risk in Differential Privacy with Applications to Noise Calibration and Auditing (Extended Version)

Patricia Guerra-Balboa, Annika Sauer, Héber H. Arcolezi et al.

Differential Privacy (DP) is widely adopted in data management systems to enable data sharing with formal disclosure guarantees. A central systems challenge is understanding how DP noise translates into effective protection against inference attacks, since this directly determines achievable utility. Most existing analyses focus only on membership inference -- capturing only a threat -- or rely on reconstruction robustness (ReRo). However, under realistic assumptions, we show that ReRo can yield misleading risk estimates and violate claimed bounds, limiting their usefulness for principled DP calibration and auditing. This paper introduces reconstruction advantage, a unified risk metric that consistently captures risk across membership inference, attribute inference, and data reconstruction. We derive tight bounds that relate DP noise to adversarial advantage and characterize optimal adversarial strategies for arbitrary DP mechanisms and attacker knowledge. These results enable risk-driven noise calibration and provide a foundation for systematic DP auditing. We show that reconstruction advantage improves the accuracy and scope of DP auditing and enables more effective utility-privacy trade-offs in DP-enabled data management systems.

LG · Apr 10, 2025
Adversarial Subspace Generation for Outlier Detection in High-Dimensional Data

Jose Cribeiro-Ramallo, Federico Matteucci, Paul Enciu et al.

Outlier detection in high-dimensional tabular data is challenging since data is often distributed across multiple lower-dimensional subspaces -- a phenomenon known as the Multiple Views effect (MV). This effect led to a large body of research focused on mining such subspaces, known as subspace selection. However, as the precise nature of the MV effect was not well understood, traditional methods had to rely on heuristic-driven search schemes that struggle to accurately capture the true structure of the data. Properly identifying these subspaces is critical for unsupervised tasks such as outlier detection or clustering, where misrepresenting the underlying data structure can hinder the performance. We introduce Myopic Subspace Theory (MST), a new theoretical framework that mathematically formulates the Multiple Views effect and writes subspace selection as a stochastic optimization problem. Based on MST, we introduce V-GAN, a generative method trained to solve such an optimization problem. This approach avoids any exhaustive search over the feature space while ensuring that the intrinsic data structure is preserved. Experiments on 42 real-world datasets show that using V-GAN subspaces to build ensemble methods leads to a significant increase in one-class classification performance -- compared to existing subspace selection, feature selection, and embedding methods. Further experiments on synthetic data show that V-GAN identifies subspaces more accurately while scaling better than other relevant subspace selection methods. These results confirm the theoretical guarantees of our approach and also highlight its practical viability in high-dimensional settings.

LG · Nov 4, 2024
R+R: Understanding Hyperparameter Effects in DP-SGD

Felix Morsbach, Jan Reubold, Thorsten Strufe

Research on the effects of essential hyperparameters of DP-SGD lacks consensus, verification, and replication. Contradictory and anecdotal statements on their influence make matters worse. While DP-SGD is the standard optimization algorithm for privacy-preserving machine learning, its adoption is still commonly challenged by low performance compared to non-private learning approaches. As proper hyperparameter settings can improve the privacy-utility trade-off, understanding the influence of the hyperparameters promises to simplify their optimization towards better performance, and likely foster acceptance of private learning. To shed more light on these influences, we conduct a replication study: We synthesize extant research on hyperparameter influences of DP-SGD into conjectures, conduct a dedicated factorial study to independently identify hyperparameter effects, and assess which conjectures can be replicated across multiple datasets, model architectures, and differential privacy budgets. While we cannot (consistently) replicate conjectures about the main and interaction effects of the batch size and the number of epochs, we were able to replicate the conjectured relationship between the clipping threshold and learning rate. Furthermore, we were able to quantify the significant importance of their combination compared to the other hyperparameters.
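
To make the hyperparameters named in the abstract concrete, the following is a minimal sketch of a single DP-SGD update in plain Python: clip each per-example gradient, sum, add Gaussian noise scaled by the clipping threshold, average, and step. The function names (`clip`, `dp_sgd_step`) and the toy gradients are assumptions for illustration and are not taken from the study. The sketch also shows one simple reason why the clipping threshold and the learning rate interact: when every gradient is clipped, the update depends only on their product.

```python
import math
import random


def clip(grad, clip_norm):
    """Scale a gradient down so that its L2 norm is at most clip_norm."""
    norm = math.sqrt(sum(g * g for g in grad))
    scale = min(1.0, clip_norm / norm) if norm > 0 else 1.0
    return [g * scale for g in grad]


def dp_sgd_step(params, per_example_grads, lr, clip_norm, noise_multiplier, rng):
    """One DP-SGD update (illustrative sketch, not a vetted implementation)."""
    summed = [0.0] * len(params)
    for grad in per_example_grads:
        for i, g in enumerate(clip(grad, clip_norm)):
            summed[i] += g
    # Gaussian noise with standard deviation noise_multiplier * clip_norm
    std = noise_multiplier * clip_norm
    batch = len(per_example_grads)
    noisy_mean = [(s + rng.gauss(0.0, std)) / batch for s in summed]
    return [p - lr * g for p, g in zip(params, noisy_mean)]


# Toy per-example gradients whose norms all exceed both clipping thresholds.
grads = [[3.0, 4.0], [6.0, 8.0], [-5.0, 12.0]]
step_a = dp_sgd_step([0.0, 0.0], grads, lr=0.1, clip_norm=1.0,
                     noise_multiplier=1.0, rng=random.Random(7))
step_b = dp_sgd_step([0.0, 0.0], grads, lr=0.2, clip_norm=0.5,
                     noise_multiplier=1.0, rng=random.Random(7))
# With the same noise draw, halving the threshold and doubling the
# learning rate yields the same update (up to floating-point rounding).
print(step_a, step_b)
```

This equivalence only holds while all gradients are clipped; once some gradients fall below the threshold, the two settings diverge, which is part of what makes the hyperparameters hard to tune independently.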

CR · Mar 13
FoSAM: Forward Secret Messaging in Ad-Hoc Networks

Daniel Schadt, Christoph Coijanovic, Thorsten Strufe

Apps such as Firechat and Bridgefy have been used during recent protests in Hong Kong and Iran, as they allow communication over ad-hoc wireless networks even when internet access is restricted. However, these apps do not provide sufficient protection as they do not achieve forward secrecy in unreliable networks. Without forward secrecy, caught protesters' devices will disclose all previous messages to the authorities, putting them and others at great risk. In this paper, we introduce FoSAM, the first protocol to provide proven anonymous and forward secret messaging in unreliable ad-hoc networks. Communication in FoSAM requires only the receiver's public key, rather than an interactive handshake. We evaluate the performance of FoSAM using a large-scale simulation with different user movement patterns, showing that it achieves between 92% and 99% successful message delivery. We additionally implement a FoSAM prototype for Android.

CR · Sep 9, 2021
Privacy-Protecting Techniques for Behavioral Biometric Data: A Survey

Simon Hanisch, Patricia Arias-Cabarcos, Javier Parra-Arnau et al.

Our behavior (the way we talk, walk, act or think) is unique and can be used as a biometric trait. It also correlates with sensitive attributes like emotions and health conditions. Hence, techniques to protect individuals' privacy against unwanted inferences are required if such data is to be processed. To consolidate knowledge in this area, we systematically review applicable anonymization techniques. We taxonomize and compare existing solutions regarding privacy goals, conceptual operation, advantages, and limitations. We review anonymization techniques for the behavioral biometric traits of voice, gait, hand motions, eye-gaze, heartbeat (ECG), and brain activity (EEG). Our analysis shows that some behavioral traits (e.g., voice) have received much attention, while others (e.g., eye-gaze, brain activity) are mostly neglected. We also find that the evaluation methodology of behavioral anonymization techniques can be further improved.

CR · Aug 19, 2021
2PPS -- Publish/Subscribe with Provable Privacy

Sarah Abdelwahab Gaballah, Christoph Coijanovic, Thorsten Strufe et al.

Publish/Subscribe systems like Twitter and Reddit let users communicate with many recipients without requiring prior personal connections. The content that participants of these systems publish and subscribe to is typically public, but they may nevertheless wish to remain anonymous. While many existing systems allow users to omit explicit identifiers, they do not address the obvious privacy risks of being associated with content that may contain a wide range of sensitive information. We present 2PPS (Twice-Private Publish-Subscribe), the first pub/sub protocol to deliver strong provable privacy protection for both publishers and subscribers, leveraging Distributed Point Function-based secret sharing for publishing and Private Information Retrieval for subscribing. 2PPS does not require trust in other clients and its privacy guarantees hold as long as even a single honest server participant remains. Furthermore, it is scalable and delivers latency suitable for microblogging applications. A prototype implementation of 2PPS can handle 100,000 concurrent active clients with 5 seconds end-to-end latency and significantly lower bandwidth requirements than comparable systems.
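
The abstract names Private Information Retrieval as the building block for subscribing. As a hedged illustration of the general idea, the sketch below implements the textbook two-server XOR-based PIR, which is not necessarily the scheme 2PPS uses. The function names and the toy database are assumptions for illustration. Each server sees only a uniformly random bit vector, so neither learns the requested index as long as the two servers do not collude.

```python
import secrets


def pir_queries(n, index):
    """Client: build two selection vectors that differ only at `index`."""
    q1 = [secrets.randbelow(2) for _ in range(n)]
    q2 = list(q1)
    q2[index] ^= 1
    return q1, q2


def pir_answer(database, query):
    """Server: XOR together all (equal-length) records selected by the query."""
    acc = bytes(len(database[0]))
    for record, bit in zip(database, query):
        if bit:
            acc = bytes(a ^ b for a, b in zip(acc, record))
    return acc


def pir_reconstruct(answer1, answer2):
    """Client: the XOR of both answers is exactly the requested record."""
    return bytes(x ^ y for x, y in zip(answer1, answer2))


# Toy database of equal-length records, replicated on both servers.
database = [b"post-000", b"post-001", b"post-002", b"post-003"]
q1, q2 = pir_queries(len(database), 2)
record = pir_reconstruct(pir_answer(database, q1), pir_answer(database, q2))
print(record)
```

The cost of this simple variant is linear in the database size per query on each server, which is why practical systems combine PIR with further optimizations.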

CR · Aug 9, 2021
Topology Inference of Networks utilizing Rooted Spanning Tree Embeddings

Martin Byrenheid, Stefanie Roos, Thorsten Strufe

Due to its high efficiency, routing based on greedy embeddings of rooted spanning trees is a promising approach for dynamic, large-scale networks with restricted topologies. Friend-to-friend (F2F) overlays, one key application of embedding-based routing, aim to prevent disclosure of their participants to malicious members by restricting exchange of messages to mutually trusted nodes. Since embeddings assign a unique integer vector to each node that encodes its position in a spanning tree of the overlay, attackers can infer network structure from knowledge about assigned vectors. As this information can be used to identify participants, an evaluation of the scale of leakage is needed. In this work, we analyze in detail which information malicious participants can infer from knowledge about assigned vectors. Also, we show that by monitoring packet trajectories, malicious participants cannot unambiguously infer links between nodes of unidentified participants. Using simulation, we find that the vector assignment procedure has a strong impact on the feasibility of inference. In F2F overlay networks, using vectors of randomly chosen numbers for routing decreases the mean number of discovered individuals by one order of magnitude compared to the popular approach of using child enumeration indexes as vector elements.
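
The following sketch illustrates the child-enumeration embedding that the abstract mentions, under simplifying assumptions: each node's vector is the sequence of child indexes on the path from the root, the routing metric is the tree distance computed from two vectors, and each hop forwards greedily to the neighbor closest to the target. The toy overlay, the node names, and the helper `revealed_ancestors` are hypothetical and are not taken from the paper. The helper shows why such vectors leak structure: every prefix of a vector is the vector of an ancestor.

```python
def assign_coordinates(children, root):
    """Give each node the path of child indexes from the root (root gets ())."""
    coords = {root: ()}
    stack = [root]
    while stack:
        node = stack.pop()
        for i, child in enumerate(children.get(node, [])):
            coords[child] = coords[node] + (i,)
            stack.append(child)
    return coords


def tree_distance(a, b):
    """Hops between two nodes in the tree, computed from their vectors alone."""
    common = 0
    for x, y in zip(a, b):
        if x != y:
            break
        common += 1
    return len(a) + len(b) - 2 * common


def greedy_route(neighbors, coords, source, target):
    """Forward to the neighbor whose vector is closest to the target's vector."""
    path = [source]
    while path[-1] != target:
        current = path[-1]
        nxt = min(neighbors[current],
                  key=lambda n: tree_distance(coords[n], coords[target]))
        if (tree_distance(coords[nxt], coords[target])
                >= tree_distance(coords[current], coords[target])):
            raise RuntimeError("greedy routing made no progress")
        path.append(nxt)
    return path


def revealed_ancestors(coord):
    """Anyone who learns a vector also learns the vectors of all its ancestors."""
    return [coord[:k] for k in range(len(coord))]


# Toy overlay: spanning tree rooted at "r" plus one non-tree link between c and e.
children = {"r": ["a", "b"], "a": ["c", "d"], "b": ["e"]}
neighbors = {"r": ["a", "b"], "a": ["r", "c", "d"], "b": ["r", "e"],
             "c": ["a", "e"], "d": ["a"], "e": ["b", "c"]}
coords = assign_coordinates(children, "r")
print(greedy_route(neighbors, coords, "d", "e"))
print(revealed_ancestors(coords["e"]))
```

Replacing the enumeration indexes with randomly chosen numbers, as the abstract describes, keeps the prefix structure needed for routing but removes the information about how many siblings a node has.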

CR · Mar 4, 2021
On the privacy-utility trade-off in differentially private hierarchical text classification

Dominik Wunderlich, Daniel Bernau, Francesco Aldà et al.

Hierarchical text classification consists in classifying text documents into a hierarchy of classes and sub-classes. Although artificial neural networks have proved useful to perform this task, unfortunately they can leak training data information to adversaries due to training data memorization. Using differential privacy during model training can mitigate leakage attacks against trained models, enabling the models to be shared safely at the cost of reduced model accuracy. This work investigates the privacy-utility trade-off in hierarchical text classification with differential privacy guarantees, and identifies neural network architectures that offer superior trade-offs. To this end, we use a white-box membership inference attack to empirically assess the information leakage of three widely used neural network architectures. We show that large differential privacy parameters already suffice to completely mitigate membership inference attacks, thus resulting only in a moderate decrease in model utility. More specifically, for large datasets with long texts we observed Transformer-based models to achieve an overall favorable privacy-utility trade-off, while for smaller datasets with shorter texts convolutional neural networks are preferable.

CR · Nov 17, 2020
SoK on Performance Bounds in Anonymous Communication

Christiane Kuhn, Friederike Kitzing, Thorsten Strufe

Communicating anonymously comes at a cost - and large communities have been in a constant tug-of-war between the development of faster protocols, and the improvement of security analyses. Thereby more intricate privacy goals emerged and more detailed bounds on the minimum overhead necessary to achieve them were proven. The entanglement of requirements, scenarios, and protocols complicates analysis, and the published results are hardly comparable, due to deviating, yet specific choices of assumptions and goals (some explicit, most implicit). In this paper, we systematize the field by harmonizing the models, comparing the proven performance bounds, and contextualizing these theoretical results in a broad set of proposed and implemented systems. By identifying inaccuracies, we demonstrate that the attacks, on which the results are based, indeed break much weaker privacy goals than postulated, and tighten the bounds along the way. We further show the equivalence of two seemingly alternative bounds. Finally, we argue how several assumptions and requirements of the papers likely are of limited applicability in reality and suggest relaxations for future work.

CR · Apr 16, 2020
Covid Notions: Towards Formal Definitions -- and Documented Understanding -- of Privacy Goals and Claimed Protection in Proximity-Tracing Services

Christiane Kuhn, Martin Beck, Thorsten Strufe

The recent SARS-CoV-2 pandemic gave rise to management approaches using mobile apps for contact tracing. The corresponding apps track individuals and their interactions, to facilitate alerting users of potential infections well before they become infectious themselves. Naive implementation obviously jeopardizes the privacy of health conditions, location, activities, and social interaction of its users. A number of protocol designs for colocation tracking have already been developed, most of which claim to function in a privacy preserving manner. However, despite claims such as "GDPR compliance", "anonymity", "pseudonymity" or other forms of "privacy", the authors of these designs usually neglect to precisely define what they (aim to) protect. We make a first step towards formally defining the privacy notions of proximity tracing services, especially with regards to the health, (co-)location, and social interaction of their users. We also give a high-level intuition of which protection the most prominent proposals can and cannot achieve. This initial overview indicates that all proposals include some centralized services, and none protects identity and (co-)locations of infected users perfectly from both other users and the service provider.

CR · Feb 12, 2020
Efficient Cloud-based Secret Shuffling via Homomorphic Encryption

Kilian Becher, Thorsten Strufe

When working with joint collections of confidential data from multiple sources, e.g., in cloud-based multi-party computation scenarios, the ownership relation between data providers and their inputs itself is confidential information. Protecting data providers' privacy requires a function for secretly shuffling the data collection. We present the first efficient secure multi-party computation protocol for secret shuffling in scenarios with a central server. Based on a novel approach to random index distribution, our solution enables the randomization of the order of a sequence of encrypted data such that no observer can map between elements of the original sequence and the shuffled sequence with probability better than guessing. It allows for shuffling data encrypted under an additively homomorphic cryptosystem with constant round complexity and linear computational complexity. Being a general-purpose protocol, it is of relevance for a variety of practical use cases.

CR · Oct 30, 2019
Breaking and (Partially) Fixing Provably Secure Onion Routing

Christiane Kuhn, Martin Beck, Thorsten Strufe

After several years of research on onion routing, Camenisch and Lysyanskaya, in an attempt at rigorous analysis, defined an ideal functionality in the universal composability model, together with properties that protocols have to meet to achieve provable security. A whole family of systems based their security proofs on this work. However, analyzing HORNET and Sphinx, two instances from this family, we show that this proof strategy is broken. We discover a previously unknown vulnerability that breaks anonymity completely, and explain a known one. Both should not exist if privacy is proven correctly. In this work, we analyze and fix the proof strategy used for this family of systems. After proving the efficacy of the ideal functionality, we show how the original properties are flawed and suggest improved, effective properties in their place. Finally, we discover another common mistake in the proofs. We demonstrate how to avoid it by showing our improved properties for one protocol, thus partially fixing the family of provably secure onion routing protocols.

PL · Oct 2, 2019
RecordFlux: Formal Message Specification and Generation of Verifiable Binary Parsers

Tobias Reiher, Alexander Senier, Jeronimo Castrillon et al.

Various vulnerabilities have been found in message parsers of protocol implementations in the past. Even highly sensitive software components like TLS libraries are affected regularly. Resulting issues range from denial-of-service attacks to the extraction of sensitive information. The complexity of protocols and imprecise specifications in natural language are the core reasons for subtle bugs in implementations, which are hard to find. The lack of precise specifications impedes formal verification. In this paper, we propose a model and a corresponding domain-specific language to formally specify message formats of existing real-world binary protocols. A unique feature of the model is the capability to define invariants, which specify relations and dependencies between message fields. Furthermore, the model allows defining the relation of messages between different protocol layers and thus ensures correct interpretation of payload data. We present a technique to derive verifiable parsers based on the model, generate efficient code for their implementation, and automatically prove the absence of runtime errors. Examples of parser specifications for Ethernet and TLS demonstrate the applicability of our approach.

CR · Jan 9, 2019
Attack-resistant Spanning Tree Construction in Route-Restricted Overlay Networks

Martin Byrenheid, Stefanie Roos, Thorsten Strufe

Nodes in route-restricted overlays have an immutable set of neighbors, explicitly specified by their users. Popular examples include payment networks such as the Lightning network as well as social overlays such as the Dark Freenet. Routing algorithms are central to such overlays as they enable communication between nodes that are not directly connected. Recent results show that algorithms based on spanning trees are the most promising provably efficient choice. However, all suggested solutions fail to address how distributed spanning tree algorithms can deal with active denial of service attacks by malicious nodes. In this work, we design a novel self-stabilizing spanning tree construction algorithm that utilizes cryptographic signatures and prove that it reduces the set of nodes affected by active attacks. Our simulations substantiate this theoretical result with concrete values based on real-world data sets. In particular, our results indicate that our algorithm reduces the number of affected nodes by up to 74% compared to state-of-the-art attack-resistant spanning tree constructions.

CR · Dec 13, 2018
On Privacy Notions in Anonymous Communication

Christiane Kuhn, Martin Beck, Stefan Schiffner et al.

Many anonymous communication networks (ACNs) with different privacy goals have been developed. However, there are no accepted formal definitions of privacy and ACNs often define their goals and adversary models ad hoc. However, for the understanding and comparison of different flavors of privacy, a common foundation is needed. In this paper, we introduce an analysis framework for ACNs that captures the notions and assumptions known from different analysis frameworks. Therefore, we formalize privacy goals as notions and identify their building blocks. For any pair of notions we prove whether one is strictly stronger, and, if so, which. Hence, we are able to present a complete hierarchy. Further, we show how to add practical assumptions, e.g. regarding the protocol model or user corruption as options to our notions. This way, we capture the notions and assumptions of, to the best of our knowledge, all existing analytical frameworks for ACNs and are able to revise inconsistencies between them. Thus, our new framework builds a common ground and allows for sharper analysis, since new combinations of assumptions are possible and the relations between the notions are known.

CR · Oct 11, 2018
An Enhanced Approach to Cloud-based Privacy-preserving Benchmarking (Long Version)

Kilian Becher, Martin Beck, Thorsten Strufe

Benchmarking is an important measure for companies to investigate their performance and to increase efficiency. As companies usually are reluctant to provide their key performance indicators (KPIs) for public benchmarks, privacy-preserving benchmarking systems are required. In this paper, we present an enhanced privacy-preserving benchmarking protocol that is based on homomorphic encryption. It enables cloud-based KPI comparison including the statistical measures mean, variance, median, maximum, best-in-class, bottom quartile, and top quartile. The theoretical and empirical evaluation of our benchmarking system underlines its practicability. Even under worst-case assumptions regarding connection quality and asymmetric encryption key-security, it fulfils the performance requirements of typical KPI benchmarking systems.

ML · Jun 19, 2017
Infinite Mixture Model of Markov Chains

Jan Reubold, Thorsten Strufe, Ulf Brefeld

We propose a Bayesian nonparametric mixture model for prediction and information extraction tasks with an efficient inference scheme. It models categorical-valued time series that exhibit dynamics from multiple underlying patterns (e.g. user behavior traces). We simplify the idea of capturing these patterns by hierarchical hidden Markov models (HHMMs) and extend the existing approaches by the additional representation of structural information. Our empirical results are based on both synthetic and real-world data. They indicate that the results are easily interpretable, and that the model excels at segmentation and prediction performance: it successfully identifies the generating patterns and can be used for effective prediction of future observations.

CR · Jun 15, 2017
PrettyCat: Adaptive guarantee-controlled software partitioning of security protocols

Alexander Senier, Martin Beck, Thorsten Strufe

One single error can result in a total compromise of all security in today's large, monolithic software. Partitioning of software can help simplify code-review and verification, whereas isolated execution of software-components limits the impact of incorrect implementations. However, existing application partitioning techniques are too expensive, too imprecise, or involve unsafe manual steps. An automatic, yet safe, approach to dissect security protocols into component-based systems is not available. We present a method and toolset to automatically segregate security related software into an indefinite number of partitions, based on the security guarantees required by the deployed cryptographic building blocks. As partitioning imposes communication overhead, we offer a range of sound performance optimizations. Furthermore, by applying our approach to the secure messaging protocol OTR, we demonstrate its applicability and achieve a significant reduction of the trusted computing base. Compared to a monolithic implementation, only 29% of the partitioned protocol requires confidentiality guarantees with a process overhead comparable to common sandboxing techniques.

DC · Jan 19, 2017
Privacy Preserving Stream Analytics: The Marriage of Randomized Response and Approximate Computing

Do Le Quoc, Martin Beck, Pramod Bhatotia et al.

How to preserve users' privacy while supporting high-utility analytics for low-latency stream processing? To answer this question, we describe the design, implementation, and evaluation of PRIVAPPROX, a data analytics system for privacy-preserving stream processing. PRIVAPPROX provides three properties: (i) Privacy: zero-knowledge privacy guarantees for users, a privacy bound tighter than the state-of-the-art differential privacy; (ii) Utility: an interface for data analysts to systematically explore the trade-offs between the output accuracy (with error-estimation) and query execution budget; (iii) Latency: near real-time stream processing based on a scalable "synchronization-free" distributed architecture. The key idea behind our approach is to marry two existing techniques together: namely, sampling (used in the context of approximate computing) and randomized response (used in the context of privacy-preserving analytics). The resulting marriage is complementary: it achieves stronger privacy guarantees and also improves performance, a necessary ingredient for achieving low-latency stream analytics.
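
The combination of sampling and randomized response described above can be sketched in a few lines of Python. This is an illustration of the two generic techniques under stated assumptions, not PRIVAPPROX's implementation: the parameter names `sample_rate`, `p`, and `q`, the function names, and the toy population are all hypothetical. Each user first decides whether to take part at all (sampling), then answers truthfully with probability `p` and otherwise answers "yes" with probability `q`. The analyst inverts the randomization to obtain an unbiased estimate of the true "yes" fraction.

```python
import random


def randomized_response(truth, p, q, rng):
    """Answer truthfully with probability p, else say 'yes' with probability q."""
    if rng.random() < p:
        return truth
    return rng.random() < q


def estimate_yes_fraction(responses, p, q):
    """Invert the randomization: E[observed] = p * true + (1 - p) * q."""
    observed = sum(responses) / len(responses)
    return (observed - (1 - p) * q) / p


def private_stream_estimate(truths, sample_rate, p, q, seed=0):
    """Sample users, randomize their answers, and estimate the 'yes' fraction."""
    rng = random.Random(seed)
    # Sampling: each user takes part only with probability sample_rate.
    sampled = [t for t in truths if rng.random() < sample_rate]
    responses = [randomized_response(t, p, q, rng) for t in sampled]
    return estimate_yes_fraction(responses, p, q), len(sampled)


# Toy population in which 30% of users hold the sensitive attribute.
truths = [True] * 3000 + [False] * 7000
estimate, participants = private_stream_estimate(truths, sample_rate=0.5,
                                                 p=0.7, q=0.5, seed=1)
print(participants, round(estimate, 3))
```

Lowering `sample_rate` or `p` reduces the work done and strengthens deniability for individual users, at the price of a noisier estimate; that is the accuracy-versus-budget trade-off the abstract refers to.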

CR · Jan 22, 2016
VOUTE-Virtual Overlays Using Tree Embeddings

Stefanie Roos, Martin Beck, Thorsten Strufe

Friend-to-friend (F2F) overlays, which restrict direct communication to mutually trusted parties, are a promising substrate for privacy-preserving communication due to their inherent membership-concealment and Sybil-resistance. Yet, existing F2F overlays suffer from low performance, are vulnerable to denial-of-service attacks, or fail to provide anonymity. In particular, greedy embeddings allow highly efficient communication in arbitrary connectivity-restricted overlays but require communicating parties to reveal their identity. In this paper, we present a privacy-preserving routing scheme for greedy embeddings based on anonymous return addresses rather than identifying node coordinates. We prove that the presented algorithms are highly scalable with regard to the complexity of both the routing and the stabilization protocols. Furthermore, we show that the return addresses provide plausible deniability for both sender and receiver. We further enhance the routing's resilience by using multiple embeddings and propose a method for efficient content addressing. Our simulation study on real-world data indicates that our approach is highly efficient and effectively mitigates failures as well as powerful denial-of-service attacks.

SI · Sep 14, 2013
Protecting Public OSN Posts from Unintended Access

Frederik Armknecht, Manuel Hauptmann, Stefanie Roos et al.

The design of secure and usable access schemes to personal data represents a major challenge of online social networks (OSNs). State of the art requires prior interaction to grant access. Sharing with users who are not subscribed or have not previously been accepted as contacts is in any case only possible via public posts, which can easily be abused by automatic harvesting for user profiling, targeted spear-phishing, or spamming. Moreover, users are restricted to the access rules defined by the provider, which may be overly restrictive, cumbersome to define, or insufficiently fine-grained. We suggest a complementary approach that can be easily deployed in addition to existing access control schemes, does not require any interaction, and includes even public, unsubscribed users. It exploits the fact that different social circles of a user share different experiences: arbitrary posts are encrypted such that only users with sufficient knowledge about the owner can decrypt them. Assembling only well-established cryptographic primitives, we prove that the security of our scheme is determined by the entropy of the required knowledge. We consequently analyze the efficiency of an informed dictionary attack and assess the entropy to be on par with common passwords. A fully functional implementation is used for performance evaluations, and available for download on the Web.