Sreeram Kannan

CR
37papers
3,155citations
Novelty64%
AI Score53

37 Papers

44.5CRJun 2
BigDipper: Sharded Censorship Resistant Data Availability for Leader-Based BFT

Bowen Xue, Samuel Laferriere, Soubhik Deb et al.

Leader-based Byzantine-fault-tolerant (BFT) protocols provide low latency and simple communication structure, but they give the leader short-term control over transaction inclusion. A malicious leader can keep the protocol live while delaying or excluding time-sensitive transactions such as auction bids, oracle updates, liquidations, and bridge messages. Existing responses often build a fixed censorship-resistance, hiding, or ordering mechanism into the protocol path, forcing all transactions to pay for the same protection level. name follows the end-to-end principle: the consensus layer exposes inclusion primitives rather than hardcoding stronger policies. Higher-layer protocols can then choose their own submission strategies and resources, whether through replication, erasure coding, or other mechanisms, to obtain the censorship-resistance, hiding, ordering, or execution guarantees they need. At the core of BigDipper is censorship-resistant data availability, or DA-CR, which certifies available replica-contributed mini-blocks for use by leader-based consensus. A central design goal is that data remains sharded on the consensus critical path: validators do not reconstruct or execute the full payload before voting, but instead check commitments, availability evidence, and the DA-CR inclusion rule. We define DA-CR guarantees for data-tampering resistance, honest mini-block inclusion, and residual leader influence. We then give concrete constructions based on erasure coding and linear commitments, analyze client-tunable transaction submission, and instantiate BigDipper inside HotStuff-2.

CRJan 30
EigenAI: Deterministic Inference, Verifiable Results

David Ribeiro Alves, Vishnu Patankar, Matheus Pereira et al.

EigenAI is a verifiable AI platform built on top of the EigenLayer restaking ecosystem. At a high level, it combines a deterministic large-language model (LLM) inference engine with a cryptoeconomically secured optimistic re-execution protocol so that every inference result can be publicly audited, reproduced, and, if necessary, economically enforced. An untrusted operator runs inference on a fixed GPU architecture, signs and encrypts the request and response, and publishes the encrypted log to EigenDA. During a challenge window, any watcher may request re-execution through EigenVerify; the result is then deterministically recomputed inside a trusted execution environment (TEE) with a threshold-released decryption key, allowing a public challenge with private data. Because inference itself is bit-exact, verification reduces to a byte-equality check, and a single honest replica suffices to detect fraud. We show how this architecture yields sovereign agents -- prediction-market judges, trading bots, and scientific assistants -- that enjoy state-of-the-art performance while inheriting security from Ethereum's validator base.

CRJan 27, 2022Code
Minotaur: Multi-Resource Blockchain Consensus

Matthias Fitzi, Xuechao Wang, Sreeram Kannan et al.

Resource-based consensus is the backbone of permissionless distributed ledger systems. The security of such protocols relies fundamentally on the level of resources actively engaged in the system. The variety of different resources (and related proof protocols, some times referred to as PoX in the literature) raises the fundamental question whether it is possible to utilize many of them in tandem and build multi-resource consensus protocols. The challenge in combining different resources is to achieve fungibility between them, in the sense that security would hold as long as the cumulative adversarial power across all resources is bounded. In this work, we put forth Minotaur, a multi-resource blockchain consensus protocol that combines proof-of-work (PoW) and proof-of-stake (PoS), and we prove it optimally fungible. At the core of our design, Minotaur operates in epochs while continuously sampling the active computational power to provide a fair exchange between the two resources, work and stake. Further, we demonstrate the ability of Minotaur to handle a higher degree of work fluctuation as compared to the Bitcoin blockchain; we also generalize Minotaur to any number of resources. We demonstrate the simplicity of Minotaur via implementing a full stack client in Rust (available open source). We use the client to test the robustness of Minotaur to variable mining power and combined work/stake attacks and demonstrate concrete empirical evidence towards the suitability of Minotaur to serve as the consensus layer of a real-world blockchain.

CROct 14, 2020Code
BFT Protocol Forensics

Peiyao Sheng, Gerui Wang, Kartik Nayak et al.

Byzantine fault-tolerant (BFT) protocols allow a group of replicas to come to a consensus even when some of the replicas are Byzantine faulty. There exist multiple BFT protocols to securely tolerate an optimal number of faults $t$ under different network settings. However, if the number of faults $f$ exceeds $t$ then security could be violated. In this paper we mathematically formalize the study of forensic support of BFT protocols: we aim to identify (with cryptographic integrity) as many of the malicious replicas as possible and in as a distributed manner as possible. Our main result is that forensic support of BFT protocols depends heavily on minor implementation details that do not affect the protocol's security or complexity. Focusing on popular BFT protocols (PBFT, HotStuff, Algorand) we exactly characterize their forensic support, showing that there exist minor variants of each protocol for which the forensic supports vary widely. We show strong forensic support capability of LibraBFT, the consensus protocol of Diem cryptocurrency; our lightweight forensic module implemented on a Diem client is open-sourced and is under active consideration for deployment in Diem. Finally, we show that all secure BFT protocols designed for $2t+1$ replicas communicating over a synchronous network forensic support are inherently nonexistent; this impossibility result holds for all BFT protocols and even if one has access to the states of all replicas (including Byzantine ones).

CRJan 20, 2022
Babylon: Reusing Bitcoin Mining to Enhance Proof-of-Stake Security

Ertem Nusret Tas, David Tse, Fisher Yu et al.

Bitcoin is the most secure blockchain in the world, supported by the immense hash power of its Proof-of-Work miners, but consumes huge amount of energy. Proof-of-Stake chains are energy-efficient, have fast finality and accountability, but face several fundamental security issues: susceptibility to non-slashable long-range safety attacks, non-slashable transaction censorship and stalling attacks and difficulty to bootstrap new PoS chains from low token valuation. We propose Babylon, a blockchain platform which combines the best of both worlds by reusing the immense Bitcoin hash power to enhance the security of PoS chains. Babylon provides a data-available timestamping service, securing PoS chains by allowing them to timestamp data-available block checkpoints, fraud proofs and censored transactions on Babylon. Babylon miners merge mine with Bitcoin and thus the platform has zero additional energy cost. The security of a Babylon-enhanced PoS protocol is formalized by a cryptoeconomic security theorem which shows slashable safety and liveness guarantees.

CRMay 6, 2021
Securing Parallel-chain Protocols under Variable Mining Power

Xuechao Wang, Viswa Virinchi Muppirala, Lei Yang et al.

Several emerging PoW blockchain protocols rely on a "parallel-chain" architecture for scaling, where instead of a single chain, multiple chains are run in parallel and aggregated. A key requirement of practical PoW blockchains is to adapt to mining power variations over time. In this paper, we consider the design of provably secure parallel-chain protocols which can adapt to such mining power variations. The Bitcoin difficulty adjustment rule adjusts the difficulty target of block mining periodically to get a constant mean inter-block time. While superficially simple, the rule has proved itself to be sophisticated and successfully secure, both in practice and in theory. We show that natural adaptations of the Bitcoin adjustment rule to the parallel-chain case open the door to subtle, but catastrophic safety and liveness breaches. We uncover a meta-design principle that allow us to design variable mining difficulty protocols for three popular PoW blockchain proposals (Prism, OHIE, and Fruitchains) inside a common rubric. The principle has three components:(M1) a pivot chain, based on which blocks in all chains choose difficulty, (M2) a monotonicity condition for referencing pivot chain blocks and (M3) translating additional protocol aspects from using levels (depth) to using "difficulty levels". We show that protocols employing a subset of these principles may have catastrophic failures. The security of the designs is also proved using a common rubric - the key technical challenge involves analyzing the interaction between the pivot chain and the other chains, as well as bounding the sudden changes in difficulty target experienced in non-pivot chains. We empirically investigate the responsivity of the new mining difficulty rule via simulations based on historical Bitcoin data, and find that the protocol very effectively controls the forking rate across all the chains.

CROct 30, 2020
ACeD: Scalable Data Availability Oracle

Peiyao Sheng, Bowen Xue, Sreeram Kannan et al.

A popular method in practice offloads computation and storage in blockchains by relying on committing only hashes of off-chain data into the blockchain. This mechanism is acknowledged to be vulnerable to a stalling attack: the blocks corresponding to the committed hashes may be unavailable at any honest node. The straightforward solution of broadcasting all blocks to the entire network sidesteps this data availability attack, but it is not scalable. In this paper, we propose ACeD, a scalable solution to this data availability problem with $O(1)$ communication efficiency, the first to the best of our knowledge. The key innovation is a new protocol that requires each of the $N$ nodes to receive only $O(1/N)$ of the block, such that the data is guaranteed to be available in a distributed manner in the network. Our solution creatively integrates coding-theoretic designs inside of Merkle tree commitments to guarantee efficient and tamper-proof reconstruction; this solution is distinct from Asynchronous Verifiable Information Dispersal (in guaranteeing efficient proofs of malformed coding) and Coded Merkle Tree (which only provides guarantees for random corruption as opposed to our guarantees for worst-case corruption). We implement ACeD with full functionality in 6000 lines of Rust code, integrate the functionality as a smart contract into Ethereum via a high-performance implementation demonstrating up to 10,000 transactions per second in throughput and 6000x reduction in gas cost on the Ethereum testnet Kovan.

CROct 26, 2020
Blockchain CAP Theorem Allows User-Dependent Adaptivity and Finality

Suryanarayana Sankagiri, Xuechao Wang, Sreeram Kannan et al.

Longest-chain blockchain protocols, such as Bitcoin, guarantee liveness even when the number of actively participating users is variable, i.e., they are adaptive. However, they are not safe under network partitions, i.e., they do not guarantee finality. On the other hand, classical blockchain protocols, like PBFT, achieve finality but not adaptivity. Indeed, the CAP theorem in the context of blockchains asserts that no protocol can simultaneously offer both adaptivity and finality. We propose a new blockchain protocol, called the checkpointed longest chain, that offers individual users the choice between finality and adaptivity instead of imposing it at a system-wide level. This protocol's salient feature is that it supports two distinct confirmation rules: one that guarantees adaptivity and the other finality. The more optimistic adaptive rule always confirms blocks that are marked as finalized by the more conservative rule, and may possibly confirm more blocks during variable participation levels. Clients (users) make a local choice between the confirmation rules as per their personal preference, while miners follow a fixed block proposal rule that is consistent with both confirmation rules. The proposed protocol has the additional benefit of intrinsic validity: the finalized blocks always lie on a single blockchain, and therefore miners can attest to the validity of transactions while proposing blocks. Our protocol builds on the notion of a finality gadget, a popular technique for adding finality to longest-chain protocols.

CROct 15, 2020
PoSAT: Proof-of-Work Availability and Unpredictability, without the Work

Soubhik Deb, Sreeram Kannan, David Tse

An important feature of Proof-of-Work (PoW) blockchains is full dynamic availability, allowing miners to go online and offline while requiring only 50% of the online miners to be honest. Existing Proof-of-stake (PoS), Proof-of-Space and related protocols are able to achieve this property only partially, either putting the additional assumption that adversary nodes to be online from the beginning and no new adversary nodes come online afterwards, or use additional trust assumptions for newly joining nodes.We propose a new PoS protocol PoSAT which can provably achieve dynamic availability fully without any additional assumptions. The protocol is based on the longest chain and uses a Verifiable Delay Function for the block proposal lottery to provide an arrow of time. The security analysis of the protocol draws on the recently proposed technique of Nakamoto blocks as well as the theory of branching random walks. An additional feature of PoSAT is the complete unpredictability of who will get to propose a block next, even by the winner itself. This unpredictability is at the same level of PoW protocols, and is stronger than that of existing PoS protocols using Verifiable Random Functions.

ITAug 18, 2020
Deepcode and Modulo-SK are Designed for Different Settings

Hyeji Kim, Yihan Jiang, Sreeram Kannan et al.

We respond to [1] which claimed that "Modulo-SK scheme outperforms Deepcode [2]". We demonstrate that this statement is not true: the two schemes are designed and evaluated for entirely different settings. DeepCode is designed and evaluated for the AWGN channel with (potentially delayed) uncoded output feedback. Modulo-SK is evaluated on the AWGN channel with coded feedback and unit delay. [1] also claimed an implementation of Schalkwijk and Kailath (SK) [3] which was numerically stable for any number of information bits and iterations. However, we observe that while their implementation does marginally improve over ours, it also suffers from a fundamental issue with precision. Finally, we show that Deepcode dominates the optimized performance of SK, over a natural choice of parameterizations when the feedback is noisy.

CRMay 21, 2020
Everything is a Race and Nakamoto Always Wins

Amir Dembo, Sreeram Kannan, Ertem Nusret Tas et al.

Nakamoto invented the longest chain protocol, and claimed its security by analyzing the private double-spend attack, a race between the adversary and the honest nodes to grow a longer chain. But is it the worst attack? We answer the question in the affirmative for three classes of longest chain protocols, designed for different consensus models: 1) Nakamoto's original Proof-of-Work protocol; 2) Ouroboros and SnowWhite Proof-of-Stake protocols; 3) Chia Proof-of-Space protocol. As a consequence, exact characterization of the maximum tolerable adversary power is obtained for each protocol as a function of the average block time normalized by the network delay. The security analysis of these protocols is performed in a unified manner by a novel method of reducing all attacks to a race between the adversary and the honest nodes.

CRMay 19, 2020
Free2Shard: Adaptive-adversary-resistant sharding via Dynamic Self Allocation

Ranvir Rana, Sreeram Kannan, David Tse et al.

Propelled by the growth of large-scale blockchain deployments, much recent progress has been made in designing sharding protocols that achieve throughput scaling linearly in the number of nodes. However, existing protocols are not robust to an adversary adaptively corrupting a fixed fraction of nodes. In this paper, we propose Free2Shard -- a new architecture that achieves near-linear scaling while being secure against a fully adaptive adversary. The focal point of this architecture is a dynamic self-allocation algorithm that lets users allocate themselves to shards in response to adversarial action, without requiring a central or cryptographic proof. This architecture has several attractive features unusual for sharding protocols, including: (a) the ability to handle the regime of large number of shards (relative to the number of nodes); (b) heterogeneous shard demands; (c) requiring only a small minority to follow the self-allocation; (d) asynchronous shard rotation; (e) operation in a purely identity-free proof-of-work setting. The key technical contribution is a deep mathematical connection to the classical work of Blackwell in dynamic game theory.

LGMay 17, 2020
C-MI-GAN : Estimation of Conditional Mutual Information using MinMax formulation

Arnab Kumar Mondal, Arnab Bhattacharya, Sudipto Mukherjee et al.

Estimation of information theoretic quantities such as mutual information and its conditional variant has drawn interest in recent times owing to their multifaceted applications. Newly proposed neural estimators for these quantities have overcome severe drawbacks of classical $k$NN-based estimators in high dimensions. In this work, we focus on conditional mutual information (CMI) estimation by utilizing its formulation as a minmax optimization problem. Such a formulation leads to a joint training procedure similar to that of generative adversarial networks. We find that our proposed estimator provides better estimates than the existing approaches on a variety of simulated data sets comprising linear and non-linear relations between variables. As an application of CMI estimation, we deploy our estimator for conditional independence (CI) testing on real data and obtain better results than state-of-the-art CI testers.

ITNov 8, 2019
Turbo Autoencoder: Deep learning based channel codes for point-to-point communication channels

Yihan Jiang, Hyeji Kim, Himanshu Asnani et al.

Designing codes that combat the noise in a communication medium has remained a significant area of research in information theory as well as wireless communications. Asymptotically optimal channel codes have been developed by mathematicians for communicating under canonical models after over 60 years of research. On the other hand, in many non-canonical channel settings, optimal codes do not exist and the codes designed for canonical models are adapted via heuristics to these channels and are thus not guaranteed to be optimal. In this work, we make significant progress on this problem by designing a fully end-to-end jointly trained neural encoder and decoder, namely, Turbo Autoencoder (TurboAE), with the following contributions: ($a$) under moderate block lengths, TurboAE approaches state-of-the-art performance under canonical channels; ($b$) moreover, TurboAE outperforms the state-of-the-art codes under non-canonical settings in terms of reliability. TurboAE shows that the development of channel coding design can be automated via deep learning, with near-optimal performance.

CROct 5, 2019
Proof-of-Stake Longest Chain Protocols: Security vs Predictability

Vivek Bagaria, Amir Dembo, Sreeram Kannan et al.

The Nakamoto longest chain protocol is remarkably simple and has been proven to provide security against any adversary with less than 50% of the total hashing power. Proof-of-stake (PoS) protocols are an energy efficient alternative; however existing protocols adopting Nakamoto's longest chain design achieve provable security only by allowing long-term predictability (which have serious security implications). In this paper, we prove that a natural longest chain PoS protocol with similar predictability as Nakamoto's PoW protocol can achieve security against any adversary with less than 1/(1+e) fraction of the total stake. Moreover we propose a new family of longest chain PoS protocols that achieve security against a 50% adversary, while only requiring short-term predictability. Our proofs present a new approach to analyzing the formal security of blockchains, based on a notion of adversary-proof convergence.

CROct 2, 2019
Coded Merkle Tree: Solving Data Availability Attacks in Blockchains

Mingchao Yu, Saeid Sahraei, Songze Li et al.

In this paper, we propose coded Merkle tree (CMT), a novel hash accumulator that offers a constant-cost protection against data availability attacks in blockchains, even if the majority of the network nodes are malicious. A CMT is constructed using a family of sparse erasure codes on each layer, and is recovered by iteratively applying a peeling-decoding technique that enables a compact proof for data availability attack on any layer. Our algorithm enables any node to verify the full availability of any data block generated by the system by just downloading a $Θ(1)$ byte block hash commitment and randomly sampling $Θ(\log b)$ bytes, where $b$ is the size of the data block. With the help of only one connected honest node in the system, our method also allows any node to verify any tampering of the coded Merkle tree by just downloading $Θ(\log b)$ bytes. We provide a modular library for CMT in Rust and Python and demonstrate its efficacy inside the Parity Bitcoin client.

LGSep 27, 2019
Improving Federated Learning Personalization via Model Agnostic Meta Learning

Yihan Jiang, Jakub Konečný, Keith Rush et al.

Federated Learning (FL) refers to learning a high quality global model based on decentralized data storage, without ever copying the raw data. A natural scenario arises with data created on mobile phones by the activity of their users. Given the typical data heterogeneity in such situations, it is natural to ask how can the global model be personalized for every such device, individually. In this work, we point out that the setting of Model Agnostic Meta Learning (MAML), where one optimizes for a fast, gradient-based, few-shot adaptation to a heterogeneous distribution of tasks, has a number of similarities with the objective of personalization for FL. We present FL as a natural source of practical applications for MAML algorithms, and make the following observations. 1) The popular FL algorithm, Federated Averaging, can be interpreted as a meta learning algorithm. 2) Careful fine-tuning can yield a global model with higher accuracy, which is at the same time easier to personalize. However, solely optimizing for the global model accuracy yields a weaker personalization result. 3) A model trained using a standard datacenter optimization method is much harder to personalize, compared to one trained using Federated Averaging, supporting the first claim. These results raise new questions for FL, MAML, and broader ML research.

LGJul 28, 2019
Are Odds Really Odd? Bypassing Statistical Detection of Adversarial Examples

Hossein Hosseini, Sreeram Kannan, Radha Poovendran

Deep learning classifiers are known to be vulnerable to adversarial examples. A recent paper presented at ICML 2019 proposed a statistical test detection method based on the observation that logits of noisy adversarial examples are biased toward the true class. The method is evaluated on CIFAR-10 dataset and is shown to achieve 99% true positive rate (TPR) at only 1% false positive rate (FPR). In this paper, we first develop a classifier-based adaptation of the statistical test method and show that it improves the detection performance. We then propose Logit Mimicry Attack method to generate adversarial examples such that their logits mimic those of benign images. We show that our attack bypasses both statistical test and classifier-based methods, reducing their TPR to less than 2:2% and 1:6%, respectively, even at 5% FPR. We finally show that a classifier-based detector that is trained with logits of mimicry adversarial examples can be evaded by an adaptive attacker that specifically targets the detector. Furthermore, even a detector that is iteratively trained to defend against adaptive attacker cannot be made robust, indicating that statistics of logits cannot be used to detect adversarial examples.

ITJun 26, 2019
Coded State Machine -- Scaling State Machine Execution under Byzantine Faults

Songze Li, Saeid Sahraei, Mingchao Yu et al.

We introduce an information-theoretic framework, named Coded State Machine (CSM), to securely and efficiently execute multiple state machines on untrusted network nodes, some of which are Byzantine. The standard method of solving this problem is using State Machine Replication, which achieves high security at the cost of low efficiency. We propose CSM, which achieves the optimal linear scaling in storage efficiency, throughput, and security simultaneously with the size of the network. The storage efficiency is scaled via the design of Lagrange coded states and coded input commands that require the same storage size as their origins. The computational efficiency is scaled using a novel delegation algorithm, called INTERMIX, which is an information-theoretically verifiable matrix-vector multiplication algorithm of independent interest. Using INTERMIX, the network nodes securely delegate their coding operations to a single worker node, and a small group of randomly selected auditor nodes verify its correctness, so that computational efficiency can scale almost linearly with the network size, without compromising on security.

LGJun 6, 2019
Learning in Gated Neural Networks

Ashok Vardhan Makkuva, Sewoong Oh, Sreeram Kannan et al.

Gating is a key feature in modern neural networks including LSTMs, GRUs and sparsely-gated deep neural networks. The backbone of such gated networks is a mixture-of-experts layer, where several experts make regression decisions and gating controls how to weigh the decisions in an input-dependent manner. Despite having such a prominent role in both modern and classical machine learning, very little is understood about parameter recovery of mixture-of-experts since gradient descent and EM algorithms are known to be stuck in local optima in such models. In this paper, we perform a careful analysis of the optimization landscape and show that with appropriately designed loss functions, gradient descent can indeed learn the parameters accurately. A key idea underpinning our results is the design of two {\em distinct} loss functions, one for recovering the expert parameters and another for recovering the gating parameters. We demonstrate the first sample complexity results for parameter recovery in this model for any algorithm and demonstrate significant performance gains over standard loss functions in numerical experiments.

LGJun 5, 2019
CCMI : Classifier based Conditional Mutual Information Estimation

Sudipto Mukherjee, Himanshu Asnani, Sreeram Kannan

Conditional Mutual Information (CMI) is a measure of conditional dependence between random variables X and Y, given another random variable Z. It can be used to quantify conditional dependence among variables in many data-driven inference problems such as graphical models, causal learning, feature selection and time-series analysis. While k-nearest neighbor (kNN) based estimators as well as kernel-based methods have been widely used for CMI estimation, they suffer severely from the curse of dimensionality. In this paper, we leverage advances in classifiers and generative models to design methods for CMI estimation. Specifically, we introduce an estimator for KL-Divergence based on the likelihood ratio by training a classifier to distinguish the observed joint distribution from the product distribution. We then show how to construct several CMI estimators using this basic divergence estimator by drawing ideas from conditional generative models. We demonstrate that the estimates from our proposed approaches do not degrade in performance with increasing dimension and obtain significant improvement over the widely used KSG estimator. Finally, as an application of accurate CMI estimation, we use our best estimator for conditional independence testing and achieve superior performance than the state-of-the-art tester on both simulated and real data-sets.

LGMay 1, 2019
Dropping Pixels for Adversarial Robustness

Hossein Hosseini, Sreeram Kannan, Radha Poovendran

Deep neural networks are vulnerable against adversarial examples. In this paper, we propose to train and test the networks with randomly subsampled images with high drop rates. We show that this approach significantly improves robustness against adversarial examples in all cases of bounded L0, L2 and L_inf perturbations, while reducing the standard accuracy by a small value. We argue that subsampling pixels can be thought to provide a set of robust features for the input image and, thus, improves robustness without performing adversarial training.

SPNov 30, 2018
LEARN Codes: Inventing Low-latency Codes via Recurrent Neural Networks

Yihan Jiang, Hyeji Kim, Himanshu Asnani et al.

Designing channel codes under low-latency constraints is one of the most demanding requirements in 5G standards. However, a sharp characterization of the performance of traditional codes is available only in the large block-length limit. Guided by such asymptotic analysis, code designs require large block lengths as well as latency to achieve the desired error rate. Tail-biting convolutional codes and other recent state-of-the-art short block codes, while promising reduced latency, are neither robust to channel-mismatch nor adaptive to varying channel conditions. When the codes designed for one channel (e.g.,~Additive White Gaussian Noise (AWGN) channel) are used for another (e.g.,~non-AWGN channels), heuristics are necessary to achieve non-trivial performance. In this paper, we first propose an end-to-end learned neural code, obtained by jointly designing a Recurrent Neural Network (RNN) based encoder and decoder. This code outperforms canonical convolutional code under block settings. We then leverage this experience to propose a new class of codes under low-latency constraints, which we call Low-latency Efficient Adaptive Robust Neural (LEARN) codes. These codes outperform state-of-the-art low-latency codes and exhibit robustness and adaptivity properties. LEARN codes show the potential to design new versatile and universal codes for future communications via tools of modern deep learning coupled with communication engineering insights.

CROct 18, 2018
Deconstructing the Blockchain to Approach Physical Limits

Vivek Bagaria, Sreeram Kannan, David Tse et al.

Transaction throughput, confirmation latency and confirmation reliability are fundamental performance measures of any blockchain system in addition to its security. In a decentralized setting, these measures are limited by two underlying physical network attributes: communication capacity and speed-of-light propagation delay. Existing systems operate far away from these physical limits. In this work we introduce Prism, a new proof-of-work blockchain protocol, which can achieve 1) security against up to 50% adversarial hashing power; 2) optimal throughput up to the capacity C of the network; 3) confirmation latency for honest transactions proportional to the propagation delay D, with confirmation error probability exponentially small in CD ; 4) eventual total ordering of all transactions. Our approach to the design of this protocol is based on deconstructing the blockchain into its basic functionalities and systematically scaling up these functionalities to approach their physical limits.

CRSep 27, 2018
PolyShard: Coded Sharding Achieves Linearly Scaling Efficiency and Security Simultaneously

Songze Li, Mingchao Yu, Chien-Sheng Yang et al.

Today's blockchain designs suffer from a trilemma claiming that no blockchain system can simultaneously achieve decentralization, security, and performance scalability. For current blockchain systems, as more nodes join the network, the efficiency of the system (computation, communication, and storage) stays constant at best. A leading idea for enabling blockchains to scale efficiency is the notion of sharding: different subsets of nodes handle different portions of the blockchain, thereby reducing the load for each individual node. However, existing sharding proposals achieve efficiency scaling by compromising on trust - corrupting the nodes in a given shard will lead to the permanent loss of the corresponding portion of data. In this paper, we settle the trilemma by demonstrating a new protocol for coded storage and computation in blockchains. In particular, we propose PolyShard: ``polynomially coded sharding'' scheme that achieves information-theoretic upper bounds on the efficiency of the storage, system throughput, as well as on trust, thus enabling a truly scalable system. We provide simulation results that numerically demonstrate the performance improvement over state of the arts, and the scalability of the PolyShard system. Finally, we discuss potential enhancements, and highlight practical considerations in building such a system.

LGSep 10, 2018
ClusterGAN : Latent Space Clustering in Generative Adversarial Networks

Sudipto Mukherjee, Himanshu Asnani, Eugene Lin et al.

Generative Adversarial networks (GANs) have obtained remarkable success in many unsupervised learning tasks and unarguably, clustering is an important unsupervised learning problem. While one can potentially exploit the latent-space back-projection in GANs to cluster, we demonstrate that the cluster structure is not retained in the GAN latent space. In this paper, we propose ClusterGAN as a new mechanism for clustering using GANs. By sampling latent variables from a mixture of one-hot encoded variables and continuous latent variables, coupled with an inverse network (which projects the data to the latent space) trained jointly with a clustering specific loss, we are able to achieve clustering in the latent space. Our results show a remarkable phenomenon that GANs can preserve latent space interpolation across categories, even though the discriminator is never exposed to such vectors. We compare our results with various clustering baselines and demonstrate superior performance on both synthetic and real datasets.

LGJul 2, 2018
Deepcode: Feedback Codes via Deep Learning

Hyeji Kim, Yihan Jiang, Sreeram Kannan et al.

The design of codes for communicating reliably over a statistically well defined channel is an important endeavor involving deep mathematical research and wide-ranging practical applications. In this work, we present the first family of codes obtained via deep learning, which significantly beats state-of-the-art codes designed over several decades of research. The communication channel under consideration is the Gaussian noise channel with feedback, whose study was initiated by Shannon; feedback is known theoretically to improve reliability of communication, but no practical codes that do so have ever been successfully constructed. We break this logjam by integrating information theoretic insights harmoniously with recurrent-neural-network based encoders and decoders to create novel codes that outperform known codes by 3 orders of magnitude in reliability. We also demonstrate several desirable properties of the codes: (a) generalization to larger block lengths, (b) composability with known codes, (c) adaptation to practical constraints. This result also has broader ramifications for coding theory: even when the channel has a clear mathematical model, deep learning methodologies, when combined with channel-specific information-theoretic insights, can potentially beat state-of-the-art codes constructed over decades of mathematical research.

MLJun 25, 2018
Mimic and Classify : A meta-algorithm for Conditional Independence Testing

Rajat Sen, Karthikeyan Shanmugam, Himanshu Asnani et al.

Given independent samples generated from the joint distribution $p(\mathbf{x},\mathbf{y},\mathbf{z})$, we study the problem of Conditional Independence (CI-Testing), i.e., whether the joint equals the CI distribution $p^{CI}(\mathbf{x},\mathbf{y},\mathbf{z})= p(\mathbf{z}) p(\mathbf{y}|\mathbf{z})p(\mathbf{x}|\mathbf{z})$ or not. We cast this problem under the purview of the proposed, provable meta-algorithm, "Mimic and Classify", which is realized in two-steps: (a) Mimic the CI distribution close enough to recover the support, and (b) Classify to distinguish the joint and the CI distribution. Thus, as long as we have a good generative model and a good classifier, we potentially have a sound CI Tester. With this modular paradigm, CI Testing becomes amiable to be handled by state-of-the-art, both generative and classification methods from the modern advances in Deep Learning, which in general can handle issues related to curse of dimensionality and operation in small sample regime. We show intensive numerical experiments on synthetic and real datasets where new mimic methods such conditional GANs, Regression with Neural Nets, outperform the current best CI Testing performance in the literature. Our theoretical results provide analysis on the estimation of null distribution as well as allow for general measures, i.e., when either some of the random variables are discrete and some are continuous or when one or more of them are discrete-continuous mixtures.

MLMay 23, 2018
Communication Algorithms via Deep Learning

Hyeji Kim, Yihan Jiang, Ranvir Rana et al.

Coding theory is a central discipline underpinning wireline and wireless modems that are the workhorses of the information age. Progress in coding theory is largely driven by individual human ingenuity with sporadic breakthroughs over the past century. In this paper we study whether it is possible to automate the discovery of decoding algorithms via deep learning. We study a family of sequential codes parameterized by recurrent neural network (RNN) architectures. We show that creatively designed and trained RNN architectures can decode well known sequential codes such as the convolutional and turbo codes with close to optimal performance on the additive white Gaussian noise (AWGN) channel, which itself is achieved by breakthrough algorithms of our times (Viterbi and BCJR decoders, representing dynamic programing and forward-backward algorithms). We show strong generalizations, i.e., we train at a specific signal to noise ratio and block length but test at a wide range of these quantities, as well as robustness and adaptivity to deviations from the AWGN setting.

LGFeb 21, 2018
Breaking the gridlock in Mixture-of-Experts: Consistent and Efficient Algorithms

Ashok Vardhan Makkuva, Sewoong Oh, Sreeram Kannan et al.

Mixture-of-Experts (MoE) is a widely popular model for ensemble learning and is a basic building block of highly successful modern neural networks as well as a component in Gated Recurrent Units (GRU) and Attention networks. However, present algorithms for learning MoE including the EM algorithm, and gradient descent are known to get stuck in local optima. From a theoretical viewpoint, finding an efficient and provably consistent algorithm to learn the parameters remains a long standing open problem for more than two decades. In this paper, we introduce the first algorithm that learns the true parameters of a MoE model for a wide class of non-linearities with global consistency guarantees. While existing algorithms jointly or iteratively estimate the expert parameters and the gating paramters in the MoE, we propose a novel algorithm that breaks the deadlock and can directly estimate the expert parameters by sensing its echo in a carefully designed cross-moment tensor between the inputs and the output. Once the experts are known, the recovery of gating parameters still requires an EM algorithm; however, we show that the EM algorithm for this simplified problem, unlike the joint EM algorithm, converges to the true parameters. We empirically validate our algorithm on both the synthetic and real data sets in a variety of settings, and show superior performance to standard baselines.

ITOct 13, 2017
Potential Conditional Mutual Information: Estimators, Properties and Applications

Arman Rahimzamani, Sreeram Kannan

The conditional mutual information I(X;Y|Z) measures the average information that X and Y contain about each other given Z. This is an important primitive in many learning problems including conditional independence testing, graphical model inference, causal strength estimation and time-series problems. In several applications, it is desirable to have a functional purely of the conditional distribution p_{Y|X,Z} rather than of the joint distribution p_{X,Y,Z}. We define the potential conditional mutual information as the conditional mutual information calculated with a modified joint distribution p_{Y|X,Z} q_{X,Z}, where q_{X,Z} is a potential distribution, fixed airport. We develop K nearest neighbor based estimators for this functional, employing importance sampling, and a coupling trick, and prove the finite k consistency of such an estimator. We demonstrate that the estimator has excellent practical performance and show an application in dynamical system inference.

ITSep 19, 2017
Estimating Mutual Information for Discrete-Continuous Mixtures

Weihao Gao, Sreeram Kannan, Sewoong Oh et al.

Estimating mutual information from observed samples is a basic primitive, useful in several machine learning tasks including correlation mining, information bottleneck clustering, learning a Chow-Liu tree, and conditional independence testing in (causal) graphical models. While mutual information is a well-defined quantity in general probability spaces, existing estimators can only handle two special cases of purely discrete or purely continuous pairs of random variables. The main challenge is that these methods first estimate the (differential) entropies of X, Y and the pair (X;Y) and add them up with appropriate signs to get an estimate of the mutual information. These 3H-estimators cannot be applied in general mixture spaces, where entropy is not well-defined. In this paper, we design a novel estimator for mutual information of discrete-continuous mixtures. We prove that the proposed estimator is consistent. We provide numerical experiments suggesting superiority of the proposed estimator compared to other heuristics of adding small continuous noise to all the samples and applying standard estimators tailored for purely continuous variables, and quantizing the samples and applying standard estimators tailored for purely discrete variables. This significantly widens the applicability of mutual information estimation in real-world applications, where some variables are discrete, some continuous, and others are a mixture between continuous and discrete components.

MLSep 12, 2017
Discovering Potential Correlations via Hypercontractivity

Hyeji Kim, Weihao Gao, Sreeram Kannan et al.

Discovering a correlation from one variable to another variable is of fundamental scientific and practical interest. While existing correlation measures are suitable for discovering average correlation, they fail to discover hidden or potential correlations. To bridge this gap, (i) we postulate a set of natural axioms that we expect a measure of potential correlation to satisfy; (ii) we show that the rate of information bottleneck, i.e., the hypercontractivity coefficient, satisfies all the proposed axioms; (iii) we provide a novel estimator to estimate the hypercontractivity coefficient from samples; and (iv) we provide numerical experiments demonstrating that this proposed estimator discovers potential correlations among various indicators of WHO datasets, is robust in discovering gene interactions from gene expression time series data, and is statistically more powerful than the estimators for other correlation measures in binary hypothesis testing of canonical examples of potential correlations.

LGMar 13, 2017
Blocking Transferability of Adversarial Examples in Black-Box Learning Systems

Hossein Hosseini, Yize Chen, Sreeram Kannan et al.

Advances in Machine Learning (ML) have led to its adoption as an integral component in many applications, including banking, medical diagnosis, and driverless cars. To further broaden the use of ML models, cloud-based services offered by Microsoft, Amazon, Google, and others have developed ML-as-a-service tools as black-box systems. However, ML classifiers are vulnerable to adversarial examples: inputs that are maliciously modified can cause the classifier to provide adversary-desired outputs. Moreover, it is known that adversarial examples generated on one classifier are likely to cause another classifier to make the same mistake, even if the classifiers have different architectures or are trained on disjoint datasets. This property, which is known as transferability, opens up the possibility of attacking black-box systems by generating adversarial examples on a substitute classifier and transferring the examples to the target classifier. Therefore, the key to protect black-box learning systems against the adversarial examples is to block their transferability. To this end, we propose a training method that, as the input is more perturbed, the classifier smoothly outputs lower confidence on the original label and instead predicts that the input is "invalid". In essence, we augment the output class set with a NULL label and train the classifier to reject the adversarial examples by classifying them as NULL. In experiments, we apply a wide range of attacks based on adversarial examples on the black-box systems. We show that a classifier trained with the proposed method effectively resists against the adversarial examples, while maintaining the accuracy on clean data.

LGFeb 27, 2017
Deceiving Google's Perspective API Built for Detecting Toxic Comments

Hossein Hosseini, Sreeram Kannan, Baosen Zhang et al.

Social media platforms provide an environment where people can freely engage in discussions. Unfortunately, they also enable several problems, such as online harassment. Recently, Google and Jigsaw started a project called Perspective, which uses machine learning to automatically detect toxic language. A demonstration website has been also launched, which allows anyone to type a phrase in the interface and instantaneously see the toxicity score [1]. In this paper, we propose an attack on the Perspective toxic detection system based on the adversarial examples. We show that an adversary can subtly modify a highly toxic phrase in a way that the system assigns significantly lower toxicity score to it. We apply the attack on the sample phrases provided in the Perspective website and show that we can consistently reduce the toxicity scores to the level of the non-toxic phrases. The existence of such adversarial examples is very harmful for toxic detection systems and seriously undermines their usability.

LGAug 27, 2016
Learning Temporal Dependence from Time-Series Data with Latent Variables

Hossein Hosseini, Sreeram Kannan, Baosen Zhang et al.

We consider the setting where a collection of time series, modeled as random processes, evolve in a causal manner, and one is interested in learning the graph governing the relationships of these processes. A special case of wide interest and applicability is the setting where the noise is Gaussian and relationships are Markov and linear. We study this setting with two additional features: firstly, each random process has a hidden (latent) state, which we use to model the internal memory possessed by the variables (similar to hidden Markov models). Secondly, each variable can depend on its latent memory state through a random lag (rather than a fixed lag), thus modeling memory recall with differing lags at distinct times. Under this setting, we develop an estimator and prove that under a genericity assumption, the parameters of the model can be learned consistently. We also propose a practical adaption of this estimator, which demonstrates significant performance gains in both synthetic and real-world datasets.

ITFeb 10, 2016
Conditional Dependence via Shannon Capacity: Axioms, Estimators and Applications

Weihao Gao, Sreeram Kannan, Sewoong Oh et al.

We conduct an axiomatic study of the problem of estimating the strength of a known causal relationship between a pair of variables. We propose that an estimate of causal strength should be based on the conditional distribution of the effect given the cause (and not on the driving distribution of the cause), and study dependence measures on conditional distributions. Shannon capacity, appropriately regularized, emerges as a natural measure under these axioms. We examine the problem of calculating Shannon capacity from the observed samples and propose a novel fixed-$k$ nearest neighbor estimator, and demonstrate its consistency. Finally, we demonstrate an application to single-cell flow-cytometry, where the proposed estimators significantly reduce sample complexity.