Mingchao Yu

7papers

399citations

Novelty66%

AI Score47

Ranked #57,893 of 201,326 authors (top 29%)#1,231 in CR (top 17%)

7 Papers

39.6CRMay 18

Bitcoin Staking

Xinshu Dong, Orfeas Stefanos Thyfronitis Litos, Ertem Nusret Tas et al.

The idea of security sharing goes back to Nakamoto's introduction of merge mining, a technique that enables Bitcoin miners to reuse their hash power to bootstrap and secure other Proof-of-Work (PoW) blockchains. However, with the rise of Proof-of-Stake (PoS) chains, there is a need for new methods of Bitcoin security sharing. We introduce Bitcoin staking, a protocol that allows Bitcoin holders to trustlessly use their idle asset to secure a PoS chain. The key challenge is to enable automatic slashing of bitcoins on the Bitcoin chain upon safety violations on the PoS chain. We achieve this using double-authentication-preventing signatures, finality gadgets and bi-directional timestamping between Bitcoin and the PoS chain. Our design is entirely modular and can be integrated with any PoS chain. A version of this protocol was deployed to secure the Babylon mainnet in April 2025 and currently has over 58,000 bitcoins staked (about 4 billion USD at current prices) while paying only 0.05% APR reward to the stakers. This is 2 orders of magnitude cheaper security cost than in PoS chains secured by their native token.

DCOct 22, 2019

Train Where the Data is: A Case for Bandwidth Efficient Coded Training

Zhifeng Lin, Krishna Giri Narra, Mingchao Yu et al.

Training a machine learning model is both compute and data-intensive. Most of the model training is performed on high performance compute nodes and the training data is stored near these nodes for faster training. But there is a growing interest in enabling training near the data. For instance, mobile devices are rich sources of training data. It may not be feasible to consolidate the data from mobile devices into a cloud service, due to bandwidth and data privacy reasons. Training at mobile devices is however fraught with challenges. First mobile devices may join or leave the distributed setting, either voluntarily or due to environmental uncertainties, such as lack of power. Tolerating uncertainties is critical to the success of distributed mobile training. One proactive approach to tolerate computational uncertainty is to store data in a coded format and perform training on coded data. Encoding data is a challenging task since erasure codes require multiple devices to exchange their data to create a coded data partition, which places a significant bandwidth constraint. Furthermore, coded computing traditionally relied on a central node to encode and distribute data to all the worker nodes, which is not practical in a distributed mobile setting. In this paper, we tackle the uncertainty in distributed mobile training using a bandwidth-efficient encoding strategy. We use a Random Linear Network coding (RLNC) which reduces the need to exchange data partitions across all participating mobile devices, while at the same time preserving the property of coded computing to tolerate uncertainties. We implement gradient descent for logistic regression and SVM to evaluate the effectiveness of our mobile training framework. We demonstrate a 50% reduction in total required communication bandwidth compared to MDS coded computation, one of the popular erasure codes.

CROct 2, 2019

Coded Merkle Tree: Solving Data Availability Attacks in Blockchains

Mingchao Yu, Saeid Sahraei, Songze Li et al.

In this paper, we propose coded Merkle tree (CMT), a novel hash accumulator that offers a constant-cost protection against data availability attacks in blockchains, even if the majority of the network nodes are malicious. A CMT is constructed using a family of sparse erasure codes on each layer, and is recovered by iteratively applying a peeling-decoding technique that enables a compact proof for data availability attack on any layer. Our algorithm enables any node to verify the full availability of any data block generated by the system by just downloading a $Θ(1)$ byte block hash commitment and randomly sampling $Θ(\log b)$ bytes, where $b$ is the size of the data block. With the help of only one connected honest node in the system, our method also allows any node to verify any tampering of the coded Merkle tree by just downloading $Θ(\log b)$ bytes. We provide a modular library for CMT in Rust and Python and demonstrate its efficacy inside the Parity Bitcoin client.

ITJun 26, 2019

Coded State Machine -- Scaling State Machine Execution under Byzantine Faults

Songze Li, Saeid Sahraei, Mingchao Yu et al.

We introduce an information-theoretic framework, named Coded State Machine (CSM), to securely and efficiently execute multiple state machines on untrusted network nodes, some of which are Byzantine. The standard method of solving this problem is using State Machine Replication, which achieves high security at the cost of low efficiency. We propose CSM, which achieves the optimal linear scaling in storage efficiency, throughput, and security simultaneously with the size of the network. The storage efficiency is scaled via the design of Lagrange coded states and coded input commands that require the same storage size as their origins. The computational efficiency is scaled using a novel delegation algorithm, called INTERMIX, which is an information-theoretically verifiable matrix-vector multiplication algorithm of independent interest. Using INTERMIX, the network nodes securely delegate their coding operations to a single worker node, and a small group of randomly selected auditor nodes verify its correctness, so that computational efficiency can scale almost linearly with the network size, without compromising on security.

LGNov 8, 2018

Pipe-SGD: A Decentralized Pipelined SGD Framework for Distributed Deep Net Training

Youjie Li, Mingchao Yu, Songze Li et al.

Distributed training of deep nets is an important technique to address some of the present day computing challenges like memory consumption and computational demands. Classical distributed approaches, synchronous or asynchronous, are based on the parameter server architecture, i.e., worker nodes compute gradients which are communicated to the parameter server while updated parameters are returned. Recently, distributed training with AllReduce operations gained popularity as well. While many of those operations seem appealing, little is reported about wall-clock training time improvements. In this paper, we carefully analyze the AllReduce based setup, propose timing models which include network latency, bandwidth, cluster size and compute time, and demonstrate that a pipelined training with a width of two combines the best of both synchronous and asynchronous training. Specifically, for a setup consisting of a four-node GPU cluster we show wall-clock time training improvements of up to 5.4x compared to conventional approaches.

LGNov 8, 2018

GradiVeQ: Vector Quantization for Bandwidth-Efficient Gradient Aggregation in Distributed CNN Training

Mingchao Yu, Zhifeng Lin, Krishna Narra et al.

Data parallelism can boost the training speed of convolutional neural networks (CNN), but could suffer from significant communication costs caused by gradient aggregation. To alleviate this problem, several scalar quantization techniques have been developed to compress the gradients. But these techniques could perform poorly when used together with decentralized aggregation protocols like ring all-reduce (RAR), mainly due to their inability to directly aggregate compressed gradients. In this paper, we empirically demonstrate the strong linear correlations between CNN gradients, and propose a gradient vector quantization technique, named GradiVeQ, to exploit these correlations through principal component analysis (PCA) for substantial gradient dimension reduction. GradiVeQ enables direct aggregation of compressed gradients, hence allows us to build a distributed learning system that parallelizes GradiVeQ gradient compression and RAR communications. Extensive experiments on popular CNNs demonstrate that applying GradiVeQ slashes the wall-clock gradient aggregation time of the original RAR by more than 5X without noticeable accuracy loss, and reduces the end-to-end training time by almost 50%. The results also show that GradiVeQ is compatible with scalar quantization techniques such as QSGD (Quantized SGD), and achieves a much higher speed-up gain under the same compression ratio.

CRSep 27, 2018

PolyShard: Coded Sharding Achieves Linearly Scaling Efficiency and Security Simultaneously

Songze Li, Mingchao Yu, Chien-Sheng Yang et al.

Today's blockchain designs suffer from a trilemma claiming that no blockchain system can simultaneously achieve decentralization, security, and performance scalability. For current blockchain systems, as more nodes join the network, the efficiency of the system (computation, communication, and storage) stays constant at best. A leading idea for enabling blockchains to scale efficiency is the notion of sharding: different subsets of nodes handle different portions of the blockchain, thereby reducing the load for each individual node. However, existing sharding proposals achieve efficiency scaling by compromising on trust - corrupting the nodes in a given shard will lead to the permanent loss of the corresponding portion of data. In this paper, we settle the trilemma by demonstrating a new protocol for coded storage and computation in blockchains. In particular, we propose PolyShard: ``polynomially coded sharding'' scheme that achieves information-theoretic upper bounds on the efficiency of the storage, system throughput, as well as on trust, thus enabling a truly scalable system. We provide simulation results that numerically demonstrate the performance improvement over state of the arts, and the scalability of the PolyShard system. Finally, we discuss potential enhancements, and highlight practical considerations in building such a system.