LG Dec 6, 2022
Straggler-Resilient Differentially-Private Decentralized Learning
Yauhen Yakimenka, Chung-Wei Weng, Hsuan-Yin Lin et al.
We consider the straggler problem in decentralized learning over a logical ring while preserving user data privacy. Specifically, we extend the recently proposed framework of differential privacy (DP) amplification by decentralization by Cyffers and Bellet to include overall training latency, comprising both computation and communication latency. Analytical results on both the convergence speed and the DP level are derived for both a skipping scheme (which ignores the stragglers after a timeout) and a baseline scheme that waits for each node to finish before the training continues. A trade-off between overall training latency, accuracy, and privacy, parameterized by the timeout of the skipping scheme, is identified and empirically validated for logistic regression on a real-world dataset and for image classification using the MNIST and CIFAR-10 datasets.
QUANT-PH May 23
On Constructing and Decoding Quantum Triorthogonal Codes
Alessio Baldelli, Olai Å. Mostad, Hsuan-Yin Lin et al.
A triorthogonal code is a binary quantum Calderbank-Shor-Steane (CSS) code defined by a triorthogonal matrix. Triorthogonal codes are a key ingredient in magic-state distillation, since they allow for transversal $\mathsf{T}$ gates, a non-Clifford logical operation useful for achieving universal fault-tolerant quantum computation. Their construction is challenging because it must satisfy simultaneous pairwise and triple-wise overlap constraints, as well as row-weight requirements. In this work, we study the construction and decoding of triorthogonal codes with prescribed dual-distance properties. We derive an existence criterion for even-weight triorthogonal generator matrices with a target dual minimum distance. The criterion combines triorthogonality constraints with MacWilliams identities via Krawtchouk-polynomial conditions on the dual weight distribution, yielding an integer linear programming formulation for the construction problem. We find new nontrivial triorthogonal codes that are not necessarily generated by classical triply-even codes. The decoding performance of high-distance triorthogonal codes obtained via the doubling construction is then evaluated over the dephasing channel. We compare bounded-distance decoding, belief propagation plus ordered-statistics post-processing, and a GRAND-based decoder adapted to the quantum setting, which turns out to be a promising option.
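The pairwise and triple-wise overlap constraints mentioned in the abstract can be checked directly on a candidate binary matrix. The following is a minimal sketch, not the authors' construction method: it tests only the standard triorthogonality condition (every pair and every triple of distinct rows overlaps in an even number of positions) and does not check row weights or dual distance. The function name `is_triorthogonal` and the toy matrices are assumptions made for illustration.

```python
from itertools import combinations


def is_triorthogonal(rows):
    """Return True if every pair and every triple of distinct rows
    has an even number of positions where all of them are 1."""
    for a, b in combinations(rows, 2):
        if sum(x & y for x, y in zip(a, b)) % 2:
            return False
    for a, b, c in combinations(rows, 3):
        if sum(x & y & z for x, y, z in zip(a, b, c)) % 2:
            return False
    return True


def coordinate_rows(num_rows, length):
    """Toy matrices (hypothetical helper): row i marks the positions j
    whose binary expansion has bit i set."""
    return [[(j >> i) & 1 for j in range(length)] for i in range(num_rows)]


# Length 16: pairs overlap in 4 positions, the triple in 2, so all are even.
print(is_triorthogonal(coordinate_rows(3, 16)))
# Length 8: pairs overlap in 2 positions, but the triple overlaps in 1.
print(is_triorthogonal(coordinate_rows(3, 8)))
```

The length-8 example shows why the triple-wise condition is a genuinely separate constraint: all pairwise overlaps are even, yet the matrix fails the test.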
SI Nov 2, 2025
Communication-Constrained Private Decentralized Online Personalized Mean Estimation
Yauhen Yakimenka, Hsuan-Yin Lin, Eirik Rosnes et al.
We consider the problem of communication-constrained collaborative personalized mean estimation under a privacy constraint in an environment of several agents continuously receiving data according to arbitrary unknown agent-specific distributions. A consensus-based algorithm is studied under the framework of differential privacy in order to protect each agent's data. We give a theoretical convergence analysis of the proposed consensus-based algorithm for any bounded unknown distributions on the agents' data. Under an oracle decision rule and some restrictions on the privacy level and the agents' connectivity, the analysis shows that collaboration provides faster convergence than a fully local approach where agents do not share data. This illustrates the benefit of private collaboration in an online setting under a communication restriction on the agents. The theoretical faster-than-local convergence guarantee is backed up by several numerical results.
LG Nov 11, 2024
Differentially-Private Collaborative Online Personalized Mean Estimation
Yauhen Yakimenka, Chung-Wei Weng, Hsuan-Yin Lin et al.
We consider the problem of collaborative personalized mean estimation under a privacy constraint in an environment of several agents continuously receiving data according to arbitrary unknown agent-specific distributions. In particular, we provide a method based on hypothesis testing coupled with differential privacy and data variance estimation. Two privacy mechanisms and two data variance estimation schemes are proposed, and we provide a theoretical convergence analysis of the proposed algorithm for any bounded unknown distributions on the agents' data, showing that collaboration provides faster convergence than a fully local approach where agents do not share data. Moreover, we provide analytical performance curves for the case with an oracle class estimator, i.e., when the class structure of the agents is known, where agents receiving data from distributions with the same mean are considered to be in the same class. The theoretical faster-than-local convergence guarantee is backed up by extensive numerical results showing that, for a considered scenario, the proposed approach indeed converges much faster than a fully local approach and performs comparably to the ideal case where all data is public. This illustrates the benefit of private collaboration in an online setting.
IT Apr 1
SynDe: Syndrome-guided Decoding of Raw Nanopore Reads
Anisha Banerjee, Roman Sokolovskii, Thomas Heinis et al.
Nanopore sequencing technology remains highly error-prone, making efficient error correction essential in DNA-based data storage. Prior work addressed high error rates using convolutional codes with their decoder coupled with the basecaller, but such approaches only accommodate a limited number of code classes and incur significant decoding complexity. To overcome these limitations, we propose two algorithms: PrimerSeeker, which efficiently detects primer sequences in raw nanopore sequencing reads, and SynDe, a decoder that operates on the same raw reads and supports any linear error correction code with a low-complexity graphical representation. PrimerSeeker provides primer location estimates close to those of existing approaches while being better suited for real-time primer detection during sequencing. SynDe performs well with convolutional codes augmented with periodic markers, often approaching or exceeding the performance of existing algorithms with a lower time complexity. Remarkably, the confidence scores produced by SynDe reliably identify which of its outputs should be discarded.
IT Feb 28, 2022
Computational Code-Based Privacy in Coded Federated Learning
Marvin Xhemrishi, Alexandre Graell i Amat, Eirik Rosnes et al.
We propose a privacy-preserving federated learning (FL) scheme that is resilient against straggling devices. An adaptive scenario is suggested where the slower devices share their data with the faster ones and do not participate in the learning process. The proposed scheme employs code-based cryptography to ensure computational privacy of the private data, i.e., no device with bounded computational power can obtain information about the other devices' data in feasible time. For a scenario with 25 devices, the proposed scheme achieves speed-ups of 4.7 and 4 for 92 and 128 bits of security, respectively, for an accuracy of 95% on the MNIST dataset compared with conventional mini-batch FL.
LG Dec 16, 2021
CodedPaddedFL and CodedSecAgg: Straggler Mitigation and Secure Aggregation in Federated Learning
Reent Schlegel, Siddhartha Kumar, Eirik Rosnes et al.
We present two novel federated learning (FL) schemes that mitigate the effect of straggling devices by introducing redundancy on the devices' data across the network. Compared to other schemes in the literature, which deal with stragglers or device dropouts by ignoring their contribution, the proposed schemes do not suffer from the client drift problem. The first scheme, CodedPaddedFL, mitigates the effect of stragglers while retaining the privacy level of conventional FL. It combines one-time padding for user data privacy with gradient codes to yield straggler resiliency. The second scheme, CodedSecAgg, provides straggler resiliency and robustness against model inversion attacks and is based on Shamir's secret sharing. We apply CodedPaddedFL and CodedSecAgg to a classification problem. For a scenario with 120 devices, CodedPaddedFL achieves a speed-up factor of 18 for an accuracy of 95% on the MNIST dataset compared to conventional FL. Furthermore, it yields similar performance in terms of latency compared to a recently proposed scheme by Prakash et al. without the shortcoming of additional leakage of private data. CodedSecAgg outperforms the state-of-the-art secure aggregation scheme LightSecAgg by a speed-up factor of 6.6-18.7 for the MNIST dataset for an accuracy of 95%.
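Since CodedSecAgg is described as being based on Shamir's secret sharing, a small generic sketch of that primitive may help. This is textbook Shamir sharing over a prime field, not the CodedSecAgg protocol itself; the field size `PRIME`, the function names, and the toy secrets are assumptions chosen for illustration. The last lines show the additive property that makes the primitive useful for aggregation: adding shares point-wise yields shares of the sum.

```python
import secrets

PRIME = 2**61 - 1  # assumed field size (a Mersenne prime)


def share(secret, n, t):
    """Split `secret` into n shares such that any t of them reconstruct it."""
    coeffs = [secret % PRIME] + [secrets.randbelow(PRIME) for _ in range(t - 1)]
    shares = []
    for x in range(1, n + 1):
        y = 0
        for c in reversed(coeffs):  # Horner evaluation of the polynomial at x
            y = (y * x + c) % PRIME
        shares.append((x, y))
    return shares


def reconstruct(shares):
    """Lagrange interpolation at x = 0 from at least t distinct shares."""
    secret = 0
    for i, (xi, yi) in enumerate(shares):
        num, den = 1, 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                num = num * (-xj) % PRIME
                den = den * (xi - xj) % PRIME
        secret = (secret + yi * num * pow(den, -1, PRIME)) % PRIME
    return secret


# Two devices share their (toy) values among 5 parties with threshold 3.
s1 = share(10, n=5, t=3)
s2 = share(32, n=5, t=3)
summed = [(x, (y1 + y2) % PRIME) for (x, y1), (_, y2) in zip(s1, s2)]
print(reconstruct(summed[:3]))  # any 3 summed shares give the sum of the secrets
```

The modular inverse via `pow(den, -1, PRIME)` requires Python 3.8 or later.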
LG Sep 30, 2021
Coding for Straggler Mitigation in Federated Learning
Siddhartha Kumar, Reent Schlegel, Eirik Rosnes et al.
We present a novel coded federated learning (FL) scheme for linear regression that mitigates the effect of straggling devices while retaining the privacy level of conventional FL. The proposed scheme combines one-time padding to preserve privacy and gradient codes to yield resiliency against stragglers and consists of two phases. In the first phase, the devices share a one-time padded version of their local data with a subset of other devices. In the second phase, the devices and the central server collaboratively and iteratively train a global linear model using gradient codes on the one-time padded local data. To apply one-time padding to real data, our scheme exploits a fixed-point arithmetic representation of the data. Unlike the coded FL scheme recently introduced by Prakash et al., the proposed scheme maintains the same level of privacy as conventional FL while achieving a similar training time. Compared to conventional FL, we show that the proposed scheme achieves a training speed-up factor of 6.6 and 9.2 on the MNIST and Fashion-MNIST datasets for an accuracy of 95% and 85%, respectively.
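The idea of one-time padding real-valued data through a fixed-point representation can be sketched generically as follows. This is only an illustration of the principle, not the scheme from the paper: the number of fractional bits `FRAC_BITS`, the ring size `MODULUS`, and all function names are assumptions. Real values are mapped to integers modulo a power of two, a uniformly random key is added modulo the same value, and subtracting the key recovers the data.

```python
import secrets

FRAC_BITS = 16     # assumed number of fractional bits
MODULUS = 1 << 32  # assumed ring size for the fixed-point integers


def to_fixed(x):
    """Map a real value to a fixed-point integer in [0, MODULUS)."""
    return round(x * (1 << FRAC_BITS)) % MODULUS


def from_fixed(v):
    """Map a ring element back to a real value (two's-complement sign)."""
    if v >= MODULUS // 2:
        v -= MODULUS
    return v / (1 << FRAC_BITS)


def pad(values):
    """One-time pad each value with a fresh uniformly random key."""
    keys = [secrets.randbelow(MODULUS) for _ in values]
    padded = [(to_fixed(x) + k) % MODULUS for x, k in zip(values, keys)]
    return padded, keys


def unpad(padded, keys):
    """Remove the keys and convert back to real values."""
    return [from_fixed((p - k) % MODULUS) for p, k in zip(padded, keys)]


data = [0.5, -1.25, 3.0]
padded, keys = pad(data)
print(unpad(padded, keys))
```

Because each key is uniform over the ring and used once, a padded value on its own is uniformly distributed regardless of the data. Values that are not exactly representable with `FRAC_BITS` fractional bits are rounded on conversion.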
LG Dec 7, 2020
Generative Adversarial User Privacy in Lossy Single-Server Information Retrieval
Chung-Wei Weng, Yauhen Yakimenka, Hsuan-Yin Lin et al.
We propose to extend the concept of private information retrieval by allowing for distortion in the retrieval process and relaxing the perfect privacy requirement at the same time. In particular, we study the trade-off between download rate, distortion, and user privacy leakage, and show that in the limit of large file sizes this trade-off can be captured via a novel information-theoretical formulation for datasets with a known distribution. Moreover, for scenarios where the statistics of the dataset are unknown, we propose a new deep learning framework by leveraging a generative adversarial network approach, which allows the user to learn efficient schemes from the data itself. We evaluate the performance of the scheme on a synthetic Gaussian dataset as well as on the MNIST, CIFAR-10, and LSUN datasets. For the MNIST, CIFAR-10, and LSUN datasets, the data-driven approach significantly outperforms a nonlearning-based scheme which combines source coding with the download of multiple files.
IT May 14, 2019
Coded Distributed Tracking
Albin Severinson, Eirik Rosnes, Alexandre Graell i Amat
We consider the problem of tracking the state of a process that evolves over time in a distributed setting, with multiple observers each observing parts of the state, which is a fundamental information processing problem with a wide range of applications. We propose a cloud-assisted scheme where the tracking is performed over the cloud. In particular, to provide timely and accurate updates, and alleviate the straggler problem of cloud computing, we propose a coded distributed computing approach where coded observations are distributed over multiple workers. The proposed scheme is based on a coded version of the Kalman filter that operates on data encoded with an erasure correcting code, such that the state can be estimated from partial updates computed by a subset of the workers. We apply the proposed scheme to the problem of tracking multiple vehicles. We show that replication achieves significantly higher accuracy than the corresponding uncoded scheme. The use of maximum distance separable (MDS) codes further improves accuracy for larger update intervals. In both cases, the proposed scheme approaches the accuracy of an ideal centralized scheme when the update interval is large enough. Finally, we observe a trade-off between age-of-information and estimation accuracy for MDS codes.
IT Oct 8, 2018
A Droplet Approach Based on Raptor Codes for Distributed Computing With Straggling Servers
Albin Severinson, Alexandre Graell i Amat, Eirik Rosnes et al.
We propose a coded distributed computing scheme based on Raptor codes to address the straggler problem. In particular, we consider a scheme where each server computes intermediate values, referred to as droplets, that are either stored locally or sent over the network. Once enough droplets are collected, the computation can be completed. Compared to previous schemes in the literature, our proposed scheme achieves lower computational delay when the decoding time is taken into account.
IT Dec 21, 2017
Block-Diagonal and LT Codes for Distributed Computing With Straggling Servers
Albin Severinson, Alexandre Graell i Amat, Eirik Rosnes
We propose two coded schemes for the distributed computing problem of multiplying a matrix by a set of vectors. The first scheme is based on partitioning the matrix into submatrices and applying maximum distance separable (MDS) codes to each submatrix. For this scheme, we prove that up to a given number of partitions the communication load and the computational delay (not including the encoding and decoding delay) are identical to those of the scheme recently proposed by Li et al., based on a single, long MDS code. However, due to the use of shorter MDS codes, our scheme yields a significantly lower overall computational delay when the delay incurred by encoding and decoding is also considered. We further propose a second coded scheme based on Luby Transform (LT) codes under inactivation decoding. Interestingly, LT codes may reduce the delay over the partitioned scheme at the expense of an increased communication load. We also consider distributed computing under a deadline and show numerically that the proposed schemes outperform other schemes in the literature, with the LT code-based scheme yielding the best performance for the scenarios considered.
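The basic mechanism behind MDS-coded matrix-vector multiplication with stragglers can be illustrated with the smallest possible example. The sketch below is not the block-diagonal scheme of the paper: it uses a single (3, 2) MDS code (two data blocks plus one sum block) on a matrix with an even number of rows, and all function names are assumptions. Three workers each multiply one block by the vector, and the results of any two workers are enough to recover the full product.

```python
def matvec(M, x):
    """Plain matrix-vector product on nested lists."""
    return [sum(a * b for a, b in zip(row, x)) for row in M]


def encode(A):
    """Split A (even number of rows) into A1, A2 and add the parity block A1 + A2."""
    half = len(A) // 2
    A1, A2 = A[:half], A[half:]
    parity = [[a + b for a, b in zip(r1, r2)] for r1, r2 in zip(A1, A2)]
    return [A1, A2, parity]


def decode(results):
    """Recover A @ x from the partial products of any two of the three workers.

    `results` maps a worker index (0, 1, or 2) to that worker's output.
    """
    if 0 in results and 1 in results:
        y1, y2 = results[0], results[1]
    elif 0 in results:
        y1 = results[0]
        y2 = [p - a for p, a in zip(results[2], y1)]  # (A1 + A2)x - A1 x
    else:
        y2 = results[1]
        y1 = [p - b for p, b in zip(results[2], y2)]  # (A1 + A2)x - A2 x
    return y1 + y2


A = [[1, 2], [3, 4], [5, 6], [7, 8]]
x = [1, 1]
blocks = encode(A)
partial = {i: matvec(B, x) for i, B in enumerate(blocks)}

# Worker 1 straggles: decode from workers 0 and 2 only.
print(decode({0: partial[0], 2: partial[2]}))
print(matvec(A, x))
```

The decoding step here is a single subtraction; for longer MDS codes it becomes more expensive, which is the encoding and decoding delay the abstract takes into account.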