Deepak Garg

CR
h-index34
25papers
323citations
Novelty56%
AI Score53

25 Papers

76.3CLApr 25Code
Fine-tuning vs. In-context Learning in Large Language Models: A Formal Language Learning Perspective

Bishwamittra Ghosh, Soumi Das, Till Speicher et al.

Large language models (LLMs) operate in two fundamental learning modes - fine-tuning (FT) and in-context learning (ICL) - raising key questions about which mode yields greater language proficiency and whether they differ in their inductive biases. Prior studies comparing FT and ICL have yielded mixed and inconclusive results due to inconsistent experimental setups. To enable a rigorous comparison, we propose a formal language learning task - offering precise language boundaries, controlled string sampling, and no data contamination - and introduce a discriminative test for language proficiency, where an LLM succeeds if it assigns higher generation probability to in-language strings than to out-of-language strings. Empirically, we find that: (a) FT has greater language proficiency than ICL on in-distribution generalization, but both perform equally well on out-of-distribution generalization. (b) Their inductive biases, measured by the correlation in string generation probabilities, are similar when both modes partially learn the language but diverge at higher proficiency levels. (c) Unlike FT, ICL performance differs substantially across models of varying sizes and families and is sensitive to the token vocabulary of the language. Thus, our work demonstrates the promise of formal languages as a controlled testbed for evaluating LLMs, behaviors that are difficult to isolate in natural language datasets. Our source code is available at https://github.com/bishwamittra/formallm.

32.9PLApr 20
Compositional security definitions for higher-order where declassification

Jan Menz, Andrew K. Hirsch, Peixuan Li et al.

To ensure programs do not leak private data, we often want to be able to provide formal guarantees ensuring such data is handled correctly. Often, we cannot keep such data secret entirely; instead programmers specify how private data may be declassified. While security definitions for declassification exist, they mostly do not handle higher-order programs. In fact, in the higher-order setting no compositional security definition exists for intensional information-flow properties such as where declassification, which allows declassification in specific parts of a program. We use logical relations to build a model (and thus security definition) of where declassification. The key insight required for our model is that we must stop enforcing indistinguishability once a \emph{relevant declassification} has occurred. We show that the resulting security definition provides more security than the most related previous definition, which is for the lower-order setting. This paper is an extended version of the paper of the same name published at OOPSLA 2023 ([21]).

91.1LOMay 13
First Steps Towards Probabilistic Iris: Harmonizing Independence, Conditioning, and Dynamic Heap Allocation

Janine Lohse, Tim Rohde, Jimmy Xin et al.

There has recently been exciting progress in the realm of *probabilistic separation logics*. An important subclass of these -- including PSL, Lilac, Bluebell, and pcOL -- are *general-purpose probabilistic logics* (or GPLs, for short), meaning that they provide primitive Hoare-style assertions about probability distributions on the program state, along with powerful modularity principles like *independence* and *conditioning*. However, none of these logics support reasoning about dynamically allocated memory (i.e., pointers into a heap), let alone the more sophisticated resource algebra-based ghost state of modern separation logics like Iris. We argue that this is due to a fundamental obstacle: since the shape of memory (and identity of memory locations) may differ under different random outcomes, it is unclear how pointer ownership can be harmonized with probabilistic independence and conditioning. Furthermore, none of the existing GPLs have been mechanized in a proof assistant. In this paper, we take a substantial first step towards a marriage of GPLs and modern separation logics like Iris, in the form of **Amaryllis**. Amaryllis is the first GPL to support independence and conditional reasoning while also handling dynamic memory allocation. To overcome the aforementioned obstacle, we propose a new *indexed valuation*-style model of probabilistic assertions, whereby ownership of a standard Iris-style resource (e.g., heaps) can be promoted to ownership at the level of distributions in a generic fashion. We then show how to adapt the central Iris notions of *frame-preserving update*, *authoritative resource algebras*, and the *weakest precondition modality* to be sound for probabilistic reasoning and validate dynamic allocation. Finally, we have mechanized all our results in the Rocq proof assistant and developed an Iris-based proof mode for conducting proofs within Amaryllis.

LGJul 20, 2025
Rethinking Memorization Measures and their Implications in Large Language Models

Bishwamittra Ghosh, Soumi Das, Qinyuan Wu et al.

Concerned with privacy threats, memorization in LLMs is often seen as undesirable, specifically for learning. In this paper, we study whether memorization can be avoided when optimally learning a language, and whether the privacy threat posed by memorization is exaggerated or not. To this end, we re-examine existing privacy-focused measures of memorization, namely recollection-based and counterfactual memorization, along with a newly proposed contextual memorization. Relating memorization to local over-fitting during learning, contextual memorization aims to disentangle memorization from the contextual learning ability of LLMs. Informally, a string is contextually memorized if its recollection due to training exceeds the optimal contextual recollection, a learned threshold denoting the best contextual learning without training. Conceptually, contextual recollection avoids the fallacy of recollection-based memorization, where any form of high recollection is a sign of memorization. Theoretically, contextual memorization relates to counterfactual memorization, but imposes stronger conditions. Memorization measures differ in outcomes and information requirements. Experimenting on 18 LLMs from 6 families and multiple formal languages of different entropy, we show that (a) memorization measures disagree on memorization order of varying frequent strings, (b) optimal learning of a language cannot avoid partial memorization of training strings, and (c) improved learning decreases contextual and counterfactual memorization but increases recollection-based memorization. Finally, (d) we revisit existing reports of memorized strings by recollection that neither pose a privacy threat nor are contextually or counterfactually memorized.

PLOct 4, 2021
SecurePtrs: Proving Secure Compilation with Data-Flow Back-Translation and Turn-Taking Simulation

Akram El-Korashy, Roberto Blanco, Jérémy Thibault et al.

Proving secure compilation of partial programs typically requires back-translating an attack against the compiled program to an attack against the source program. To prove back-translation, one can syntactically translate the target attacker to a source one -- i.e., syntax-directed back-translation -- or show that the interaction traces of the target attacker can also be emitted by source attackers -- i.e., trace-directed back-translation. Syntax-directed back-translation is not suitable when the target attacker may use unstructured control flow that the source language cannot directly represent. Trace-directed back-translation works with such syntactic dissimilarity because only the external interactions of the target attacker have to be mimicked in the source, not its internal control flow. Revealing only external interactions is, however, inconvenient when sharing memory via unforgeable pointers, since information about shared pointers stashed in private memory is not present on the trace. This made prior proofs unnecessarily complex, since the generated attacker had to instead stash all reachable pointers. In this work, we introduce more informative *data-flow traces*, combining the best of syntax- and trace-directed back-translation in a simpler technique that handles both syntactic dissimilarity and memory sharing well, and that is proved correct in Coq. Additionally, we develop a novel *turn-taking simulation* relation and use it to prove a recomposition lemma, which is key to reusing compiler correctness in such secure compilation proofs. We are the first to mechanize such a recomposition lemma in the presence of memory sharing. We use these two innovations in a secure compilation proof for a code generation compiler pass between a source language with structured control flow and a target language with unstructured control flow, both with safe pointers and components.

CRApr 30, 2021
Isolation Without Taxation: Near Zero Cost Transitions for SFI

Matthew Kolosick, Shravan Narayan, Evan Johnson et al.

Software sandboxing or software-based fault isolation (SFI) is a lightweight approach to building secure systems out of untrusted components. Mozilla, for example, uses SFI to harden the Firefox browser by sandboxing third-party libraries, and companies like Fastly and Cloudflare use SFI to safely co-locate untrusted tenants on their edge clouds. While there have been significant efforts to optimize and verify SFI enforcement, context switching in SFI systems remains largely unexplored: almost all SFI systems use \emph{heavyweight transitions} that are not only error-prone but incur significant performance overhead from saving, clearing, and restoring registers when context switching. We identify a set of \emph{zero-cost conditions} that characterize when sandboxed code has sufficient structured to guarantee security via lightweight \emph{zero-cost} transitions (simple function calls). We modify the Lucet Wasm compiler and its runtime to use zero-cost transitions, eliminating the undue performance tax on systems that rely on Lucet for sandboxing (e.g., we speed up image and font rendering in Firefox by up to 29.7\% and 10\% respectively). To remove the Lucet compiler and its correct implementation of the Wasm specification from the trusted computing base, we (1) develop a \emph{static binary verifier}, VeriZero, which (in seconds) checks that binaries produced by Lucet satisfy our zero-cost conditions, and (2) prove the soundness of VeriZero by developing a logical relation that captures when a compiled Wasm function is semantically well-behaved with respect to our zero-cost conditions. Finally, we show that our model is useful beyond Wasm by describing a new, purpose-built SFI system, SegmentZero32, that uses x86 segmentation and LLVM with mostly off-the-shelf passes to enforce our zero-cost conditions; our prototype performs on-par with the state-of-the-art Native Client SFI system.

CVApr 16, 2021
ScreenSeg: On-Device Screenshot Layout Analysis

Manoj Goyal, Rachit S Munjal, Sukumar Moharana et al.

We propose a novel end-to-end solution that performs a Hierarchical Layout Analysis of screenshots and document images on resource constrained devices like mobilephones. Our approach segments entities like Grid, Image, Text and Icon blocks occurring in a screenshot. We provide an option for smart editing by auto highlighting these entities for saving or sharing. Further this multi-level layout analysis of screenshots has many use cases including content extraction, keyword-based image search, style transfer, etc. We have addressed the limitations of known baseline approaches, supported a wide variety of semantically complex screenshots, and developed an approach which is highly optimized for on-device deployment. In addition, we present a novel weighted NMS technique for filtering object proposals. We achieve an average precision of about 0.95 with a latency of around 200ms on Samsung Galaxy S10 Device for a screenshot of 1080p resolution. The solution pipeline is already commercialized in Samsung Device applications i.e. Samsung Capture, Smart Crop, My Filter in Camera Application, Bixby Touch.

CRNov 16, 2020
Reconciling Security and Utility in Next-Generation Epidemic Risk Mitigation Systems

Pierfrancesco Ingo, Nichole Boufford, Ming Cheng Jiang et al.

Epidemics like the recent COVID-19 require proactive contact tracing and epidemiological analysis to predict and subsequently contain infection transmissions. The proactive measures require large scale data collection, which simultaneously raise concerns regarding users' privacy. Digital contact tracing systems developed in response to COVID-19 either collected extensive data for effective analytics at the cost of users' privacy or collected minimal data for the sake of user privacy but were ineffective in predicting and mitigating the epidemic risks. We present Silmarillion--in preparation for future epidemics--a system that reconciles user's privacy with rich data collection for higher utility. In Silmarillion, user devices record Bluetooth encounters with beacons installed in strategic locations. The beacons further enrich the encounters with geo-location, location type, and environment conditions at the beacon installation site. This enriched information enables detailed scientific analysis of disease parameters as well as more accurate personalized exposure risk notification. At the same time, Silmarillion provides privacy to all participants and non-participants at the same level as that guaranteed in digital and manual contact tracing. We describe the design of Silmarillion and its communication protocols that ensure user privacy and data security. We also evaluate a prototype of Silmarillion built using low-end IoT boards, showing that the power consumption and user latencies are adequately low for a practical deployment. Finally, we briefly report on a small-scale deployment within a university building as a proof-of-concept.

PLMay 12, 2020
CapablePtrs: Securely Compiling Partial Programs Using the Pointers-as-Capabilities Principle

Akram El-Korashy, Stelios Tsampas, Marco Patrignani et al.

Capability machines such as CHERI provide memory capabilities that can be used by compilers to provide security benefits for compiled code (e.g., memory safety). The existing C to CHERI compiler, for example, achieves memory safety by following a principle called "pointers as capabilities" (PAC). Informally, PAC says that a compiler should represent a source language pointer as a machine code capability. But the security properties of PAC compilers are not yet well understood. We show that memory safety is only one aspect, and that PAC compilers can provide significant additional security guarantees for partial programs: the compiler can provide security guarantees for a compilation unit, even if that compilation unit is later linked to attacker-provided machine code. As such, this paper is the first to study the security of PAC compilers for partial programs formally. We prove for a model of such a compiler that it is fully abstract. The proof uses a novel proof technique (dubbed TrICL, read trickle), which should be of broad interest because it reuses the whole-program compiler correctness relation for full abstraction, thus saving work. We also implement our scheme for C on CHERI, show that we can compile legacy C code with minimal changes, and show that the performance overhead of compiled code is roughly proportional to the number of cross-compilation-unit function calls.

CRAug 30, 2019
Pacer: Comprehensive Network Side-Channel Mitigation in the Cloud

Aastha Mehta, Mohamed Alzayat, Roberta de Viti et al.

Network side channels (NSCs) leak secrets through packet timing and packet sizes. They are of particular concern in public IaaS Clouds, where any tenant may be able to colocate and indirectly observe a victim's traffic shape. We present Pacer, the first system that eliminates NSC leaks in public IaaS Clouds end-to-end. It builds on the principled technique of shaping guest traffic outside the guest to make the traffic shape independent of secrets by design. However, Pacer also addresses important concerns that have not been considered in prior work -- it prevents internal side-channel leaks from affecting reshaped traffic, and it respects network flow control, congestion control and loss recovery signals. Pacer is implemented as a paravirtualizing extension to the host hypervisor, requiring modest changes to the hypervisor and the guest kernel, and only optional, minimal changes to applications. We present Pacer's key abstraction of a cloaked tunnel, describe its design and implementation, prove the security of important design aspects through a formal model, and show through an experimental evaluation that Pacer imposes moderate overheads on bandwidth, client latency, and server throughput, while thwarting attacks based on state-of-the-art CNN classifiers.

PLJul 11, 2019
Trace-Relating Compiler Correctness and Secure Compilation

Carmine Abate, Roberto Blanco, Stefan Ciobaca et al.

Compiler correctness is, in its simplest form, defined as the inclusion of the set of traces of the compiled program into the set of traces of the original program, which is equivalent to the preservation of all trace properties. Here traces collect, for instance, the externally observable events of each execution. This definition requires, however, the set of traces of the source and target languages to be exactly the same, which is not the case when the languages are far apart or when observations are fine-grained. To overcome this issue, we study a generalized compiler correctness definition, which uses source and target traces drawn from potentially different sets and connected by an arbitrary relation. We set out to understand what guarantees this generalized compiler correctness definition gives us when instantiated with a non-trivial relation on traces. When this trace relation is not equality, it is no longer possible to preserve the trace properties of the source program unchanged. Instead, we provide a generic characterization of the target trace property ensured by correctly compiling a program that satisfies a given source property, and dually, of the source trace property one is required to show in order to obtain a certain target property for the compiled code. We show that this view on compiler correctness can naturally account for undefined behavior, resource exhaustion, different source and target values, side-channels, and various abstraction mismatches. Finally, we show that the same generalization also applies to many secure compilation definitions, which characterize the protection of a compiled program against linked adversarial code.

CROct 23, 2018
Finding Safety in Numbers with Secure Allegation Escrows

Venkat Arun, Aniket Kate, Deepak Garg et al.

For fear of retribution, the victim of a crime may be willing to report it only if other victims of the same perpetrator also step forward. Common examples include 1) identifying oneself as the victim of sexual harassment, especially by a person in a position of authority or 2) accusing an influential politician, an authoritarian government, or ones own employer of corruption. To handle such situations, legal literature has proposed the concept of an allegation escrow: a neutral third-party that collects allegations anonymously, matches them against each other, and de-anonymizes allegers only after de-anonymity thresholds (in terms of number of co-allegers), pre-specified by the allegers, are reached. An allegation escrow can be realized as a single trusted third party; however, this party must be trusted to keep the identity of the alleger and content of the allegation private. To address this problem, this paper introduces Secure Allegation Escrows (SAE, pronounced "say"). A SAE is a group of parties with independent interests and motives, acting jointly as an escrow for collecting allegations from individuals, matching the allegations, and de-anonymizing the allegations when designated thresholds are reached. By design, SAEs provide a very strong property: No less than a majority of parties constituting a SAE can de-anonymize or disclose the content of an allegation without a sufficient number of matching allegations (even in collusion with any number of other allegers). Once a sufficient number of matching allegations exist, the join escrow discloses the allegation with the allegers' identities. We describe how SAEs can be constructed using a novel authentication protocol and a novel allegation matching and bucketing algorithm, provide formal proofs of the security of our constructions, and evaluate a prototype implementation, demonstrating feasibility in practice.

PLJul 12, 2018
Journey Beyond Full Abstraction: Exploring Robust Property Preservation for Secure Compilation

Carmine Abate, Roberto Blanco, Deepak Garg et al.

(CROPPED TO FIT IN ARXIV'S SILLY LIMIT. SEE PDF FOR COMPLETE ABSTRACT.) We are the first to thoroughly explore a large space of formal secure compilation criteria based on robust property preservation, i.e., the preservation of properties satisfied against arbitrary adversarial contexts. We study robustly preserving various classes of trace properties such as safety, of hyperproperties such as noninterference, and of relational hyperproperties such as trace equivalence. This leads to many new secure compilation criteria, some of which are easier to practically achieve and prove than full abstraction, and some of which provide strictly stronger security guarantees. For each of the studied criteria we propose an equivalent "property-free" characterization that clarifies which proof techniques apply. For relational properties and hyperproperties, which relate the behaviors of multiple programs, our formal definitions of the property classes themselves are novel. We order our criteria by their relative strength and show several collapses and separation results. Finally, we adapt existing proof techniques to show that even the strongest of our secure compilation criteria, the robust preservation of all relational hyperproperties, is achievable for a simple translation from a statically typed to a dynamically typed language.

CRApr 30, 2018
Types for Information Flow Control: Labeling Granularity and Semantic Models

Vineet Rajani, Deepak Garg

Language-based information flow control (IFC) tracks dependencies within a program using sensitivity labels and prohibits public outputs from depending on secret inputs. In particular, literature has proposed several type systems for tracking these dependencies. On one extreme, there are fine-grained type systems (like Flow Caml) that label all values individually and track dependence at the level of individual values. On the other extreme are coarse-grained type systems (like HLIO) that track dependence coarsely, by associating a single label with an entire computation context and not labeling all values individually. In this paper, we show that, despite their glaring differences, both these styles are, in fact, equally expressive. To do this, we show a semantics- and type-preserving translation from a coarse-grained type system to a fine-grained one and vice-versa. The forward translation isn't surprising, but the backward translation is: It requires a construct to arbitrarily limit the scope of a context label in the coarse-grained type system (e.g., HLIO's "toLabeled" construct). As a separate contribution, we show how to extend work on logical relation models of IFC types to higher-order state. We build such logical relations for both the fine-grained type system and the coarse-grained type system. We use these relations to prove the two type systems and our translations between them sound.

CRJan 21, 2018
ERIM: Secure, Efficient In-process Isolation with Memory Protection Keys (MPK)

Anjo Vahldiek-Oberwagner, Eslam Elnikety, Nuno O. Duarte et al.

Isolating sensitive state and data can increase the security and robustness of many applications. Examples include protecting cryptographic keys against exploits like OpenSSL's Heartbleed bug or protecting a language runtime from native libraries written in unsafe languages. When runtime references across isolation boundaries occur relatively infrequently, then conventional page-based hardware isolation can be used, because the cost of kernel- or hypervisor-mediated domain switching is tolerable. However, some applications, such as the isolation of cryptographic session keys in network-facing services, require very frequent domain switching. In such applications, the overhead of kernel- or hypervisor-mediated domain switching is prohibitive. In this paper, we present ERIM, a novel technique that provides hardware-enforced isolation with low overhead on x86 CPUs, even at high switching rates (ERIM's measured overhead is less than 1% for 100,000 switches per second). The key idea is to combine protection keys (MPKs), a feature recently added to x86 that allows protection domain switches in userspace, with binary inspection to prevent circumvention. We show that ERIM can be applied with little effort to new and existing applications, doesn't require compiler changes, can run on a stock Linux kernel, and has low runtime overhead even at high domain switching rates.

CRJan 14, 2018
Shai: Enforcing Data-Specific Policies with Near-Zero Runtime Overhead

Eslam Elnikety, Deepak Garg, Peter Druschel

Data retrieval systems such as online search engines and online social networks must comply with the privacy policies of personal and selectively shared data items, regulatory policies regarding data retention and censorship, and the provider's own policies regarding data use. Enforcing these policies is difficult and error-prone. Systematic techniques to enforce policies are either limited to type-based policies that apply uniformly to all data of the same type, or incur significant runtime overhead. This paper presents Shai, the first system that systematically enforces data-specific policies with near-zero overhead in the common case. Shai's key idea is to push as many policy checks as possible to an offline, ahead-of-time analysis phase, often relying on predicted values of runtime parameters such as the state of access control lists or connected users' attributes. Runtime interception is used sparingly, only to verify these predictions and to make any remaining policy checks. Our prototype implementation relies on efficient, modern OS primitives for sandboxing and isolation. We present the design of Shai and quantify its overheads on an experimental data indexing and search pipeline based on the popular search engine Apache Lucene.

CROct 19, 2017
Robust Hyperproperty Preservation for Secure Compilation (Extended Abstract)

Deepak Garg, Catalin Hritcu, Marco Patrignani et al.

We map the space of soundness criteria for secure compilation based on the preservation of hyperproperties in arbitrary adversarial contexts, which we call robust hyperproperty preservation. For this, we study the preservation of several classes of hyperproperties and for each class we propose an equivalent "property-free" characterization of secure compilation that is generally better tailored for proofs. Even the strongest of our soundness criteria, the robust preservation of all hyperproperties, seems achievable for simple transformations and provable using context back-translation techniques previously developed for showing fully abstract compilation. While proving the robust preservation of hyperproperties that are not safety requires such powerful context back-translation techniques, for preserving safety hyperproperties robustly, translating each finite trace prefix back to a source context seems to suffice.

CRJun 21, 2017
WebPol: Fine-grained Information Flow Policies for Web Browsers

Abhishek Bichhawat, Vineet Rajani, Jinank Jain et al.

In the standard web browser programming model, third-party scripts included in an application execute with the same privilege as the application's own code. This leaves the application's confidential data vulnerable to theft and leakage by malicious code and inadvertent bugs in the third-party scripts. Security mechanisms in modern browsers (the same-origin policy, cross-origin resource sharing and content security policies) are too coarse to suit this programming model. All these mechanisms (and their extensions) describe whether or not a script can access certain data, whereas the meaningful requirement is to allow untrusted scripts access to confidential data that they need and to prevent the scripts from leaking data on the side. Motivated by this gap, we propose WebPol, a policy mechanism that allows a website developer to include fine-grained policies on confidential application data in the familiar syntax of the JavaScript programming language. The policies can be associated with any webpage element, and specify what aspects of the element can be accessed by which third-party domains. A script can access data that the policy allows it to, but it cannot pass the data (or data derived from it) to other scripts or remote hosts in contravention of the policy. To specify the policies, we expose a small set of new native APIs in JavaScript. Our policies can be enforced using any of the numerous existing proposals for information flow tracking in web browsers. We have integrated our policies into one such proposal that we use to evaluate performance overheads and to test our examples.

CRAug 10, 2015
Equivalence-based Security for Querying Encrypted Databases: Theory and Application to Privacy Policy Audits

Omar Chowdhury, Deepak Garg, Limin Jia et al.

Motivated by the problem of simultaneously preserving confidentiality and usability of data outsourced to third-party clouds, we present two different database encryption schemes that largely hide data but reveal enough information to support a wide-range of relational queries. We provide a security definition for database encryption that captures confidentiality based on a notion of equivalence of databases from the adversary's perspective. As a specific application, we adapt an existing algorithm for finding violations of privacy policies to run on logs encrypted under our schemes and observe low to moderate overheads.

CRJun 12, 2015
Generalizing Permissive-Upgrade in Dynamic Information Flow Analysis

Abhishek Bichhawat, Vineet Rajani, Deepak Garg et al.

Preventing implicit information flows by dynamic program analysis requires coarse approximations that result in false positives, because a dynamic monitor sees only the executed trace of the program. One widely deployed method is the no-sensitive-upgrade check, which terminates a program whenever a variable's taint is upgraded (made more sensitive) due to a control dependence on tainted data. Although sound, this method is restrictive, e.g., it terminates the program even if the upgraded variable is never used subsequently. To counter this, Austin and Flanagan introduced the permissive-upgrade check, which allows a variable upgrade due to control dependence, but marks the variable "partially-leaked". The program is stopped later if it tries to use the partially-leaked variable. Permissive-upgrade handles the dead-variable assignment problem and remains sound. However, Austin and Flanagan develop permissive-upgrade only for a two-point (low-high) security lattice and indicate a generalization to pointwise products of such lattices. In this paper, we develop a non-trivial and non-obvious generalization of permissive-upgrade to arbitrary lattices. The key difficulty lies in finding a suitable notion of partial leaks that is both sound and permissive and in developing a suitable definition of memory equivalence that allows an inductive proof of soundness.

CRMay 5, 2015
Program Actions as Actual Causes: A Building Block for Accountability

Anupam Datta, Deepak Garg, Dilsun Kaynar et al.

Protocols for tasks such as authentication, electronic voting, and secure multiparty computation ensure desirable security properties if agents follow their prescribed programs. However, if some agents deviate from their prescribed programs and a security property is violated, it is important to hold agents accountable by determining which deviations actually caused the violation. Motivated by these applications, we initiate a formal study of program actions as actual causes. Specifically, we define in an interacting program model what it means for a set of program actions to be an actual cause of a violation. We present a sound technique for establishing program actions as actual causes. We demonstrate the value of this formalism in two ways. First, we prove that violations of a specific class of safety properties always have an actual cause. Thus, our definition applies to relevant security properties. Second, we provide a cause analysis of a representative protocol designed to address weaknesses in the current public key certification infrastructure.

CRJan 22, 2015
System M: A Program Logic for Code Sandboxing and Identification

Limin Jia, Shayak Sen, Deepak Garg et al.

Security-sensitive applications that execute untrusted code often check the code's integrity by comparing its syntax to a known good value or sandbox the code to contain its effects. System M is a new program logic for reasoning about such security-sensitive applications. System M extends Hoare Type Theory (HTT) to trace safety properties and, additionally, contains two new reasoning principles. First, its type system internalizes logical equality, facilitating reasoning about applications that check code integrity. Second, a confinement rule assigns an effect type to a computation based solely on knowledge of the computation's sandbox. We prove the soundness of system M relative to a step-indexed trace-based semantic model. We illustrate both new reasoning principles of system M by verifying the main integrity property of the design of Memoir, a previously proposed trusted computing system for ensuring state continuity of isolated security-sensitive applications.

CRJan 17, 2014
Information Flow Control in WebKit's JavaScript Bytecode

Abhishek Bichhawat, Vineet Rajani, Deepak Garg et al.

Websites today routinely combine JavaScript from multiple sources, both trusted and untrusted. Hence, JavaScript security is of paramount importance. A specific interesting problem is information flow control (IFC) for JavaScript. In this paper, we develop, formalize and implement a dynamic IFC mechanism for the JavaScript engine of a production Web browser (specifically, Safari's WebKit engine). Our IFC mechanism works at the level of JavaScript bytecode and hence leverages years of industrial effort on optimizing both the source to bytecode compiler and the bytecode interpreter. We track both explicit and implicit flows and observe only moderate overhead. Working with bytecode results in new challenges including the extensive use of unstructured control flow in bytecode (which complicates lowering of program context taints), unstructured exceptions (which complicate the matter further) and the need to make IFC analysis permissive. We explain how we address these challenges, formally model the JavaScript bytecode semantics and our instrumentation, prove the standard property of termination-insensitive non-interference, and present experimental results on an optimized prototype.

IRSep 28, 2012
Information Retrieval on the web and its evaluation

Deepika Sharma, Deepak Garg

Internet is one of the main sources of information for millions of people. One can find information related to practically all matters on internet. Moreover if we want to retrieve information about some particular topic we may find thousands of Web Pages related to that topic. But our main concern is to find relevant Web Pages from among that collection. So in this paper I have discussed that how information is retrieved from the web and the efforts required for retrieving this information in terms of system and users efforts.