Stephan Kleber

4papers

52citations

Novelty51%

AI Score48

Ranked #50,426 of 205,806 authors (top 25%)#1,012 in CR (top 14%)

4 Papers

29.6CRApr 25Code

ARIstoteles -- Dissecting Apple's Baseband Interface

Tobias Kröll, Stephan Kleber, Frank Kargl et al.

Wireless chips and interfaces expose a substantial remote attack surface. As of today, most cellular baseband security research is performed on the Android ecosystem, leaving a huge gap on Apple devices. With iOS jailbreaks, last-generation wireless chips become fairly accessible for performance and security research. Yet, iPhones were never intended to be used as a research platform, and chips and interfaces are undocumented. One protocol to interface with such chips is Apple Remote Invocation (ARI), which interacts with the central phone component CommCenter and multiple user-space daemons, thereby posing a Remote Code Execution (RCE) attack surface. We are the first to reverse-engineer and fuzz-test the ARI interface on iOS. Our Ghidra scripts automatically generate a Wireshark dissector, called ARIstoteles, by parsing closed-source iOS libraries for this undocumented protocol. Moreover, we compare the quality of the dissector to fully-automated approaches based on static trace analysis. Finally, we fuzz the ARI interface based on our reverse-engineering results. The fuzzing results indicate that ARI does not only lack public security research but also has not been well-tested by Apple. By releasing ARIstoteles open-source, we also aim to facilitate similar research in the future.

52.5CRMar 23Code

Evaluating the Reliability and Fidelity of Automated Judgment Systems of Large Language Models

Tom Biskupski, Stephan Kleber

A Large Language Model (LLM) as judge evaluates the quality of victim Machine Learning (ML) models, specifically LLMs, by analyzing their outputs. An LLM as judge is the combination of one model and one specifically engineered judge prompt that contains the criteria for the analysis. The resulting automation of the analysis scales up the complex evaluation of the victim models' free-form text outputs by faster and more consistent judgments compared to human reviewers. Thus, quality and security assessments of LLMs can cover a wide range of the victim models' use cases. Being a comparably new technique, LLMs as judges lack a thorough investigation for their reliability and agreement to human judgment. Our work evaluates the applicability of LLMs as automated quality assessors of victim LLMs. We test the efficacy of 37 differently sized conversational LLMs in combination with 5 different judge prompts, the concept of a second-level judge, and 5 models fine-tuned for the task as assessors. As assessment objective, we curate datasets for eight different categories of judgment tasks and the corresponding ground-truth labels based on human assessments. Our empirical results show a high correlation of LLMs as judges with human assessments, when combined with a suitable prompt, in particular for GPT-4o, several open-source models with $\geqslant$ 32B parameters, and a few smaller models like Qwen2.5 14B.

NIFeb 9, 2020

Message Type Identification of Binary Network Protocols using Continuous Segment Similarity

Stephan Kleber, Rens Wouter van der Heijden, Frank Kargl

Protocol reverse engineering based on traffic traces infers the behavior of unknown network protocols by analyzing observable network messages. To perform correct deduction of message semantics or behavior analysis, accurate message type identification is an essential first step. However, identifying message types is particularly difficult for binary protocols, whose structural features are hidden in their densely packed data representation. We leverage the intrinsic structural features of binary protocols and propose an accurate method for discriminating message types. Our approach uses a similarity measure with continuous value range by comparing feature vectors where vector elements correspond to the fields in a message, rather than discrete byte values. This enables a better recognition of structural patterns, which remain hidden when only exact value matches are considered. We combine Hirschberg alignment with DBSCAN as cluster algorithm to yield a novel inference mechanism. By applying novel autoconfiguration schemes, we do not require manually configured parameters for the analysis of an unknown protocol, as required by earlier approaches. Results of our evaluations show that our approach has considerable advantages in message type identification result quality and also execution performance over previous approaches.

CRAug 3, 2018

An SDN-based Approach For Defending Against Reflective DDoS Attacks

Thomas Lukaseder, Kevin Stölzle, Stephan Kleber et al.

Distributed Reflective Denial of Service (DRDoS) attacks are an immanent threat to Internet services. The potential scale of such attacks became apparent in March 2018 when a memcached-based attack peaked at 1.7 Tbps. Novel services built upon UDP increase the need for automated mitigation mechanisms that react to attacks without prior knowledge of the actual application protocols used. With the flexibility that software-defined networks offer, we developed a new approach for defending against DRDoS attacks; it not only protects against arbitrary DRDoS attacks but is also transparent for the attack target and can be used without assistance of the target host operator. The approach provides a robust mitigation system which is protocol-agnostic and effective in the defense against DRDoS attacks.