83.1LGMay 23Code
Building Better Activation OraclesJan Bauer, Celeste De Schamphelaere, Adam Karvonen et al.
Activation Oracles (AOs) are promising methods for interpreting residual stream activations. However, current AOs face important issues, such as hallucinations and vagueness. Additionally, text-inversion confounds make them hard to evaluate. To this end, we improve the Activation Oracle (AO) training regime in four ways: training on on-policy rollouts, improving the conversational dataset, feeding more layers and an improvement to the injection formula. The capability improvements are marginal, but quality of life improvements are quite substantial. In addition, we open source the first comprehensive evaluation suite for AO quality, which we call AObench. Overall, we hope that our work sets a foundation that helps improve AOs and other models in the paradigm of scalable, end-to-end interpretability.
CRMar 15, 2021
Take a Bite of the Reality Sandwich: Revisiting the Security of Progressive Message Authentication CodesEric Wagner, Jan Bauer, Martin Henze
Message authentication guarantees the integrity of messages exchanged over untrusted channels. However, to achieve this goal, message authentication considerably expands packet sizes, which is especially problematic in constrained wireless environments. To address this issue, progressive message authentication provides initially reduced integrity protection that is often sufficient to process messages upon reception. This reduced security is then successively improved with subsequent messages to uphold the strong guarantees of traditional integrity protection. However, contrary to previous claims, we show in this paper that existing progressive message authentication schemes are highly susceptible to packet loss induced by poor channel conditions or jamming attacks. Thus, we consider it imperative to rethink how authentication tags depend on the successful reception of surrounding packets. To this end, we propose R2-D2, which uses randomized dependencies with parameterized security guarantees to increase the resilience of progressive authentication against packet loss. To deploy our approach to resource-constrained devices, we introduce SP-MAC, which implements R2-D2 using efficient XOR operations. Our evaluation shows that SP-MAC is resilient to sophisticated network-level attacks and operates as resources-conscious and fast as existing, yet insecure, progressive message authentication schemes.
IVNov 18, 2019
CD2 : Combined Distances of Contrast Distributions for the Assessment of Perceptual Quality of Image ProcessingSascha Xu, Jan Bauer, Benjamin Axmann
The quality of visual input is very important for both human and machine perception. Consequently many processing techniques exist that deal with different distortions. Usually image processing is applied freely and lacks redundancy regarding safety. We propose a novel image comparison method called the Combined Distances of Contrast Distributions (CD2) to protect against errors that arise during processing. Based on the distribution of image contrasts a new reduced-reference image quality assessment (IQA) method is introduced. By combining various distance functions excellent performance on IQA benchmarks is achieved with only a small data and computation overhead.