61.8LGMay 27
Interpretability-Guided Layer Selection over Subspace Projection: SAEs as Stethoscopes, Not Scalpels, for Raw Task Vector Model EditingLi Lei, Madalina Ciobanu, Qingqing Mao et al.
LLMs increasingly require surgical model editing to enhance domain-specific capabilities without incurring the computational cost or catastrophic forgetting associated with full fine-tuning. Sparse Autoencoders (SAEs) have emerged as a promising tool in this setting, in principle allowing for feature-level identification of where to intervene. In this work, we rigorously evaluate an SAE-guided editing pipeline for mathematical reasoning on Gemma-3-4B-IT and uncover a fundamental failure mode: the intuitively appealing approach of projecting task vectors onto SAE feature subspaces acts as an information bottleneck that discards approximately 97% of the modification energy, yielding no statistically significant improvements across seven math subjects. We show that this failure stems from a geometric misalignment between activation-space SAE directions and weight-space task vectors. We then propose a shift in perspective: SAE as a Stethoscope, Not a Scalpel, where SAEs are used for layer-level diagnosis rather than intervention-level filtering. By injecting unfiltered raw task vectors only into layers identified by an SAE-derived specificity score, we improve Number Theory accuracy from 29.6% to 39.4% (z=+3.41, p=0.0007) on the Minerva Math benchmark; 5 of 7 math subjects significantly improved and none significantly degraded. Our method is fully deterministic, requires no additional inference cost, and provides a principled framework for interpretability-guided model editing.
CRJan 17, 2018Code
Integrating Remote Attestation with Transport Layer SecurityThomas Knauth, Michael Steiner, Somnath Chakrabarti et al.
Intel(R) Software Guard Extensions (Intel(R) SGX) is a promising technology to securely process information in otherwise untrusted environments. An important aspect of Intel SGX is the ability to perform remote attestation to assess the endpoint's trustworthiness. Ultimately, remote attestation will result in an attested secure channel to provision secrets to the enclave. We seamlessly combine Intel SGX remote attestation with the establishment of a standard Transport Layer Security (TLS) connection. Remote attestation is performed during the connection setup. To achieve this, we neither change the TLS protocol, nor do we modify existing protocol implementations. We have prototype implementations for three widely used open-source TLS libraries: OpenSSL, wolfSSL and mbedTLS. We describe the requirements, design and implementation details to seamlessly bind attested TLS endpoints to Intel SGX enclaves.
SEMay 10, 2019
Hardware/Software Co-monitoringLi Lei, Kai Cong, Zhenkun Yang et al.
Hardware/Software (HW/SW) interfaces, mostly implemented as devices and device drivers, are pervasive in various computer systems. Nowadays HW/SW interfaces typically undergo intensive testing and validation before release, but they are still unreliable and insecure when deployed together with computer systems to end users. Escaped logic bugs, hardware transient failures, and malicious exploits are prevalent in HW/SW interactions, making the entire system vulnerable and unstable. We present HW/SW co-monitoring, a runtime co-verification approach to detecting failures and malicious exploits in device/driver interactions. Our approach utilizes a formal device model (FDM), a transaction-level model derived from the device specification, to shadow the real device execution. Based on the co-execution of the device and FDM, HW/SW co-monitoring carries out two-tier runtime checking: (1) device checking checks if the device behaviors conform to the FDM behaviors; (2) property checking detects invalid driver commands issued to the device by verifying system properties against driver/device interactions. We have applied HW/SW co-monitoring to five widely-used devices and their Linux drivers, discovering 9 real bugs and vulnerabilities while introducing modest runtime overhead. The results demonstrate the major potential of HW/SW co-monitoring in improving system reliability and security.