Mikael Skoglund

LG
h-index39
67papers
929citations
Novelty45%
AI Score55

67 Papers

LGDec 27, 2022
Limitations of Information-Theoretic Generalization Bounds for Gradient Descent Methods in Stochastic Convex Optimization

Mahdi Haghifam, Borja Rodríguez-Gálvez, Ragnar Thobaben et al. · utoronto

To date, no "information-theoretic" frameworks for reasoning about generalization error have been shown to establish minimax rates for gradient descent in the setting of stochastic convex optimization. In this work, we consider the prospect of establishing such rates via several existing information-theoretic frameworks: input-output mutual information bounds, conditional mutual information bounds and variants, PAC-Bayes bounds, and recent conditional variants thereof. We prove that none of these bounds are able to establish minimax rates. We then consider a common tactic employed in studying gradient methods, whereby the final iterate is corrupted by Gaussian noise, producing a noisy "surrogate" algorithm. We prove that minimax rates cannot be established via the analysis of such surrogates. Our results suggest that new ideas are required to analyze gradient descent using information-theoretic techniques.

LGJun 16, 2023
Towards Quantum Federated Learning

Chao Ren, Rudai Yan, Huihui Zhu et al.

Quantum Federated Learning (QFL) is an emerging interdisciplinary field that merges the principles of Quantum Computing (QC) and Federated Learning (FL), with the goal of leveraging quantum technologies to enhance privacy, security, and efficiency in the learning process. Currently, there is no comprehensive survey for this interdisciplinary field. This review offers a thorough, holistic examination of QFL. We aim to provide a comprehensive understanding of the principles, techniques, and emerging applications of QFL. We discuss the current state of research in this rapidly evolving field, identify challenges and opportunities associated with integrating these technologies, and outline future directions and open research questions. We propose a unique taxonomy of QFL techniques, categorized according to their characteristics and the quantum techniques employed. As the field of QFL continues to progress, we can anticipate further breakthroughs and applications across various industries, driving innovation and addressing challenges related to data privacy, security, and resource optimization. This review serves as a first-of-its-kind comprehensive guide for researchers and practitioners interested in understanding and advancing the field of QFL.

SYMay 1, 2016
Uncertain Wiretap Channels and Secure Estimation

Moritz Wiese, Karl Henrik Johansson, Tobias J. Oechtering et al.

Uncertain wiretap channels are introduced. Their zero-error secrecy capacity is defined. If the sensor-estimator channel is perfect, it is also calculated. Further properties are discussed. The problem of estimating a dynamical system with nonstochastic disturbances is studied where the sensor is connected to the estimator and an eavesdropper via an uncertain wiretap channel. The estimator should obtain a uniformly bounded estimation error whereas the eavesdropper's error should tend to infinity. It is proved that the system can be estimated securely if the zero-error capacity of the sensor-estimator channel is strictly larger than the logarithm of the system's unstable pole and the zero-error secrecy capacity of the uncertain wiretap channel is positive.

63.5ITMay 25
Secure Spatial Signal Design for ISAC in a Cell-Free MIMO Network

Steven Rivetti, Emil Bjornson, Mikael Skoglund

In this paper, we study a cell-free multiple-input multiple-output network equipped with integrated sensing and communication (ISAC) access points (APs). The distributed APs are used to jointly serve the communication needs of user equipments (UEs) while sensing a target, assumed to be an eavesdropper (Eve). To increase the system's robustness towards said Eve, we develop an ISAC waveform model that includes artificial noise (AN) aimed at degrading the Eve channel quality. The central processing unit receives the observations from each AP and calculates the optimal precoding and AN covariance matrices by solving a semi-definite relaxation of a constrained Cramer-Rao bound (CRB) minimization problem. Simulation results highlight an underlying trade-off between sensing and communication performances: in particular, the UEs signal-to-noise and interference ratio and the maximum Eve's signal to noise ratio are directly proportional to the CRB. Furthermore, the optimal AN covariance matrix is rank-1 and has a peak in the eve's direction, leading to a surprising inverse-proportionality between the UEs-Eve distance and optimal-CRB magnitude.

SYNov 8, 2011
Projection-Based and Look Ahead Strategies for Atom Selection

Saikat Chatterjee, Dennis Sundman, Mikko Vehkaperä et al.

In this paper, we improve iterative greedy search algorithms in which atoms are selected serially over iterations, i.e., one-by-one over iterations. For serial atom selection, we devise two new schemes to select an atom from a set of potential atoms in each iteration. The two new schemes lead to two new algorithms. For both the algorithms, in each iteration, the set of potential atoms is found using a standard matched filter. In case of the first scheme, we propose an orthogonal projection strategy that selects an atom from the set of potential atoms. Then, for the second scheme, we propose a look ahead strategy such that the selection of an atom in the current iteration has an effect on the future iterations. The use of look ahead strategy requires a higher computational resource. To achieve a trade-off between performance and complexity, we use the two new schemes in cascade and develop a third new algorithm. Through experimental evaluations, we compare the proposed algorithms with existing greedy search and convex relaxation algorithms.

MLJun 21, 2023
More PAC-Bayes bounds: From bounded losses, to losses with general tail behaviors, to anytime validity

Borja Rodríguez-Gálvez, Ragnar Thobaben, Mikael Skoglund

In this paper, we present new high-probability PAC-Bayes bounds for different types of losses. Firstly, for losses with a bounded range, we recover a strengthened version of Catoni's bound that holds uniformly for all parameter values. This leads to new fast-rate and mixed-rate bounds that are interpretable and tighter than previous bounds in the literature. In particular, the fast-rate bound is equivalent to the Seeger--Langford bound. Secondly, for losses with more general tail behaviors, we introduce two new parameter-free bounds: a PAC-Bayes Chernoff analogue when the loss' cumulative generating function is bounded, and a bound when the loss' second moment is bounded. These two bounds are obtained using a new technique based on a discretization of the space of possible events for the ``in probability'' parameter optimization problem. This technique is both simpler and more general than previous approaches optimizing over a grid on the parameters' space. Finally, using a simple technique that is applicable to any existing bound, we extend all previous results to anytime-valid bounds.

LGFeb 13
Quantization-Aware Collaborative Inference for Large Embodied AI Models

Zhonghao Lyu, Ming Xiao, Mikael Skoglund et al.

Large artificial intelligence models (LAIMs) are increasingly regarded as a core intelligence engine for embodied AI applications. However, the massive parameter scale and computational demands of LAIMs pose significant challenges for resource-limited embodied agents. To address this issue, we investigate quantization-aware collaborative inference (co-inference) for embodied AI systems. First, we develop a tractable approximation for quantization-induced inference distortion. Based on this approximation, we derive lower and upper bounds on the quantization rate-inference distortion function, characterizing its dependence on LAIM statistics, including the quantization bit-width. Next, we formulate a joint quantization bit-width and computation frequency design problem under delay and energy constraints, aiming to minimize the distortion upper bound while ensuring tightness through the corresponding lower bound. Extensive evaluations validate the proposed distortion approximation, the derived rate-distortion bounds, and the effectiveness of the proposed joint design. Particularly, simulations and real-world testbed experiments demonstrate the effectiveness of the proposed joint design in balancing inference quality, latency, and energy consumption in edge embodied AI systems.

MLApr 26, 2023
Thompson Sampling Regret Bounds for Contextual Bandits with sub-Gaussian rewards

Amaury Gouverneur, Borja Rodríguez-Gálvez, Tobias J. Oechtering et al.

In this work, we study the performance of the Thompson Sampling algorithm for Contextual Bandit problems based on the framework introduced by Neu et al. and their concept of lifted information ratio. First, we prove a comprehensive bound on the Thompson Sampling expected cumulative regret that depends on the mutual information of the environment parameters and the history. Then, we introduce new bounds on the lifted information ratio that hold for sub-Gaussian rewards, thus generalizing the results from Neu et al. which analysis requires binary rewards. Finally, we provide explicit regret bounds for the special cases of unstructured bounded contextual bandits, structured bounded contextual bandits with Laplace likelihood, structured Bernoulli bandits, and bounded linear contextual bandits.

LGJul 18, 2022
An Information-Theoretic Analysis of Bayesian Reinforcement Learning

Amaury Gouverneur, Borja Rodríguez-Gálvez, Tobias J. Oechtering et al.

Building on the framework introduced by Xu and Raginksy [1] for supervised learning problems, we study the best achievable performance for model-based Bayesian reinforcement learning problems. With this purpose, we define minimum Bayesian regret (MBR) as the difference between the maximum expected cumulative reward obtainable either by learning from the collected data or by knowing the environment and its dynamics. We specialize this definition to reinforcement learning problems modeled as Markov decision processes (MDPs) whose kernel parameters are unknown to the agent and whose uncertainty is expressed by a prior distribution. One method for deriving upper bounds on the MBR is presented and specific bounds based on the relative entropy and the Wasserstein distance are given. We then focus on two particular cases of MDPs, the multi-armed bandit problem (MAB) and the online optimization with partial feedback problem. For the latter problem, we show that our bounds can recover from below the current information-theoretic bounds by Russo and Van Roy [2].

22.9LGApr 14
Instantiating Bayesian CVaR lower bounds in Interactive Decision Making Problems

Raghav Bongole, Tobias J. Oechtering, Mikael Skoglund

Recent work established a generalized-Fano framework for lower bounding prior-predictive (Bayesian) CVaR in interactive statistical decision making. In this paper, we show how to instantiate that framework in concrete interactive problems and derive explicit Bayesian CVaR lower bounds from its abstract corollaries. Our approach compares a hard model with a reference model using squared Hellinger distance, and combines a lower bound on a reference hinge term with a bound on the distinguishability of the two models. We apply this approach to canonical examples, including Gaussian bandits, and obtain explicit bounds that make the dependence on key problem parameters transparent. These results show how the generalized-Fano Bayesian CVaR framework can be used as a practical lower-bound tool for interactive learning and risk-sensitive decision making.

33.8ITMar 13
Information Density Bounds for Privacy

Sara Saeidian, Leonhard Grosse, Parastoo Sadeghi et al.

This paper explores the implications of guaranteeing privacy by imposing a lower bound on the information density between the private and the public data. We introduce a novel and operationally meaningful privacy measure called pointwise maximal cost (PMC) and demonstrate that imposing an upper bound on PMC is equivalent to enforcing a lower bound on the information density. PMC quantifies the information leakage about a secret to adversaries who aim to minimize non-negative cost functions after observing the outcome of a privacy mechanism. When restricted to finite alphabets, PMC can equivalently be defined as the information leakage to adversaries aiming to minimize the probability of incorrectly guessing randomized functions of the secret. We study the properties of PMC and apply it to standard privacy mechanisms to demonstrate its practical relevance. Through a detailed examination, we connect PMC with other privacy measures that impose upper or lower bounds on the information density. These are pointwise maximal leakage (PML), local differential privacy (LDP), and (asymmetric) local information privacy. In particular, we show that a mechanism satisfies LDP if and only if it has both bounded PMC and bounded PML. Overall, our work fills a conceptual and operational gap in the taxonomy of privacy measures, bridges existing disconnects between different frameworks, and offers insights for selecting a suitable notion of privacy in a given application.

LGJul 10, 2024
A Coding-Theoretic Analysis of Hyperspherical Prototypical Learning Geometry

Martin Lindström, Borja Rodríguez-Gálvez, Ragnar Thobaben et al.

Hyperspherical Prototypical Learning (HPL) is a supervised approach to representation learning that designs class prototypes on the unit hypersphere. The prototypes bias the representations to class separation in a scale invariant and known geometry. Previous approaches to HPL have either of the following shortcomings: (i) they follow an unprincipled optimisation procedure; or (ii) they are theoretically sound, but are constrained to only one possible latent dimension. In this paper, we address both shortcomings. To address (i), we present a principled optimisation procedure whose solution we show is optimal. To address (ii), we construct well-separated prototypes in a wide range of dimensions using linear block codes. Additionally, we give a full characterisation of the optimal prototype placement in terms of achievable and converse bounds, showing that our proposed methods are near-optimal.

64.8DCMar 17
Byzantine-Robust and Communication-Efficient Distributed Training: Compressive and Cyclic Gradient Coding

Chengxi Li, Youssef Allouah, Rachid Guerraoui et al.

In this paper, we study the problem of distributed training (DT) under Byzantine attacks with communication constraints. While prior work has developed various robust aggregation rules at the server to enhance robustness to Byzantine attacks, the existing methods suffer from a critical limitation in that the solution error does not diminish when the local gradients sent by different devices vary considerably, as a result of data heterogeneity among the subsets held by different devices. To overcome this limitation, we propose a novel DT method, cyclic gradient coding-based DT (LAD). In LAD, the server allocates the entire training dataset to the devices before training begins. In each iteration, it assigns computational tasks redundantly to the devices using cyclic gradient coding. Each honest device then computes local gradients on a fixed number of data subsets and encodes the local gradients before transmitting to the server. The server aggregates the coded vectors from the honest devices and the potentially incorrect messages from Byzantine devices using a robust aggregation rule. Leveraging the redundancy of computation across devices, the convergence performance of LAD is analytically characterized, demonstrating improved robustness against Byzantine attacks and significantly lower solution error. Furthermore, we extend LAD to a communication-efficient variant, compressive and cyclic gradient coding-based DT (Com-LAD), which further reduces communication overhead under constrained settings. Numerical results validate the effectiveness of the proposed methods in enhancing both Byzantine resilience and communication efficiency.

58.7DCMar 17
Biased Compression in Gradient Coding for Distributed Learning

Chengxi Li, Ming Xiao, Mikael Skoglund

Communication bottlenecks and the presence of stragglers pose significant challenges in distributed learning (DL). To deal with these challenges, recent advances leverage unbiased compression functions and gradient coding. However, the significant benefits of biased compression remain largely unexplored. To close this gap, we propose Compressed Gradient Coding with Error Feedback (COCO-EF), a novel DL method that combines gradient coding with biased compression to mitigate straggler effects and reduce communication costs. In each iteration, non-straggler devices encode local gradients from redundantly allocated training data, incorporate prior compression errors, and compress the results using biased compression functions before transmission. The server aggregates these compressed messages from the non-stragglers to approximate the global gradient for model updates. We provide rigorous theoretical convergence guarantees for COCO-EF and validate its superior learning performance over baseline methods through empirical evaluations. As far as we know, we are among the first to rigorously demonstrate that biased compression has substantial benefits in DL, when gradient coding is employed to cope with stragglers.

MLAug 20, 2024
An Information-Theoretic Approach to Generalization Theory

Borja Rodríguez-Gálvez, Ragnar Thobaben, Mikael Skoglund

We investigate the in-distribution generalization of machine learning algorithms. We depart from traditional complexity-based approaches by analyzing information-theoretic bounds that quantify the dependence between a learning algorithm and the training data. We consider two categories of generalization guarantees: 1) Guarantees in expectation: These bounds measure performance in the average case. Here, the dependence between the algorithm and the data is often captured by information measures. While these measures offer an intuitive interpretation, they overlook the geometry of the algorithm's hypothesis class. Here, we introduce bounds using the Wasserstein distance to incorporate geometry, and a structured, systematic method to derive bounds capturing the dependence between the algorithm and an individual datum, and between the algorithm and subsets of the training data. 2) PAC-Bayesian guarantees: These bounds measure the performance level with high probability. Here, the dependence between the algorithm and the data is often measured by the relative entropy. We establish connections between the Seeger--Langford and Catoni's bounds, revealing that the former is optimized by the Gibbs posterior. We introduce novel, tighter bounds for various types of loss functions. To achieve this, we introduce a new technique to optimize parameters in probabilistic statements. To study the limitations of these approaches, we present a counter-example where most of the information-theoretic bounds fail while traditional approaches do not. Finally, we explore the relationship between privacy and generalization. We show that algorithms with a bounded maximal leakage generalize. For discrete data, we derive new bounds for differentially private algorithms that guarantee generalization even with a constant privacy parameter, which is in contrast to previous bounds in the literature.

LGJan 12
Land-then-transport: A Flow Matching-Based Generative Decoder for Wireless Image Transmission

Jingwen Fu, Ming Xiao, Mikael Skoglund et al.

Due to strict rate and reliability demands, wireless image transmission remains difficult for both classical layered designs and joint source-channel coding (JSCC), especially under low latency. Diffusion-based generative decoders can deliver strong perceptual quality by leveraging learned image priors, but iterative stochastic denoising leads to high decoding delay. To enable low-latency decoding, we propose a flow-matching (FM) generative decoder under a new land-then-transport (LTT) paradigm that tightly integrates the physical wireless channel into a continuous-time probability flow. For AWGN channels, we build a Gaussian smoothing path whose noise schedule indexes effective noise levels, and derive a closed-form teacher velocity field along this path. A neural-network student vector field is trained by conditional flow matching, yielding a deterministic, channel-aware ODE decoder with complexity linear in the number of ODE steps. At inference, it only needs an estimate of the effective noise variance to set the ODE starting time. We further show that Rayleigh fading and MIMO channels can be mapped, via linear MMSE equalization and singular-value-domain processing, to AWGN-equivalent channels with calibrated starting times. Therefore, the same probability path and trained velocity field can be reused for Rayleigh and MIMO without retraining. Experiments on MNIST, Fashion-MNIST, and DIV2K over AWGN, Rayleigh, and MIMO demonstrate consistent gains over JPEG2000+LDPC, DeepJSCC, and diffusion-based baselines, while achieving good perceptual quality with only a few ODE steps. Overall, LTT provides a deterministic, physically interpretable, and computation-efficient framework for generative wireless image decoding across diverse channels.

24.6ITMay 10
Sparse Discrete Laplace and Gaussian Mechanisms under Local Differential Privacy

Amirreza Zamani, Sajad Daei, Parastoo Sadeghi et al.

We study sparse locally private channels of the form $M(y\mid x)\propto w(x,y) 1\{y\in S(x)\},$ where the admissible output set $S(x)$ is allowed to depend on the private input $x$ and is assumed to be small. Here, we consider the sparse discrete-Laplace family with kernel $w(x,y)=e^{-λd(x,y)}$ and the sparse Gaussian family with kernel $w(x,y)=e^{-d(x,y)^2/(2σ^2)}$. For both families we give exact characterizations of pure and approximate local differential privacy. For pure $\varepsilon$-local differential privacy, we show that input-dependent sparse supports are obtained when all supports coincide. For $(\varepsilon,δ)$-local differential privacy, we derive exact formulas for the privacy defect in terms of support leakage and excess privacy loss on the overlap region. We then specialize the analysis to radius-truncated sparse discrete-Laplace and radius-truncated sparse Gaussian mechanisms and obtain explicit privacy-sparsity tradeoffs in terms of the support size $s$. In particular, we show that nontrivial approximate local privacy requires a minimum support size, whereas larger supports reduce support leakage but increase distortion. For the Gaussian family, the overlap term exhibits an additional quadratic dependence on the support radius, which implies a sharper tradeoff between privacy and sparsity. These results identify the support cardinality as the intrinsic complexity parameter of the mechanism and yield an optimal design principle: choose the smallest support size that satisfies the target privacy constraint.

26.1GTMay 8
Zero-determinant Strategy for Moving Target Defense: Existence, Performance, and Computation

Zhaoyang Cheng, Guanpu Chen, Yiguang Hong et al.

Moving Target Defense (MTD) is commonly formulated as a repeated security game to mitigate persistent threats. Although the strong Stackelberg equilibrium (SSE) characterizes the defender's optimal strategy in the leader-follower framework, computing the SSE often incurs high computational complexity, which significantly limits its practical deployment in MTD problems with multiple targets. This paper proposes adopting a zero-determinant (ZD) strategy for constructing an MTD strategy that achieves both high defensive performance and substantially low computational complexity. We first derive a necessary and sufficient condition for the existence of ZD strategies and investigate the performance of ZD strategies, which shows their upper-bound performance matches that of the SSE strategy. We then formulate two programs to find the optimal ZD strategy parameters under different conditions. Moreover, we design an algorithm to compute the proposed ZD strategies, along with the computational complexity analysis in comparison with the traditional SSE computation. Finally, we conduct experiments on two practical applications to verify our results.

38.4LGMay 5
A Hierarchical Sampling Framework for bounding the Generalization Error of Federated Learning

Dario Filatrella, Ragnar Thobaben, Mikael Skoglund

We study expected generalization bounds for the Hierarchical Federated Learning (HFL) setup using Wasserstein distance. We introduce a generalized framework in which data is sampled hierarchically, and we model it with a multi-layered tree structure that induces dependencies among the clients' datasets. We derive generalization bounds in terms of Wasserstein distance under the Lipschitz assumption on the loss function, by applying a supersample construction that allows us to measure the sensitivity of the algorithm to the change of a single node in the sampling tree. By leveraging the FL structure, we recover and strictly imply existing state-of-the-art conditional mutual information (CMI) bounds in the case of bounded losses. We also show that our bound can be applied together with Differential Privacy assumptions, to recover generalization bounds based on algorithmic privacy. To assess the tightness of our bounds, we study the Gaussian Location Model (GLM) and show that we recover the actual asymptotic rate of the generalization error.

94.0SYMay 4
SkillCom: Decomposing LLM-based Semantic Communication into Task and Channel Aware Skills

Jingwen Fu, Ming Xiao, Mikael Skoglund

Large language models (LLMs) are increasingly used as semantic encoders and decoders in semantic communication. However, current LLM based systems mostly remain monolithic: a single prompted model, or a tightly coupled transmitter/receiver pair, must jointly perform semantic encoding, channel adaptation, and semantic decoding. Such coupling makes intermediate decisions difficult to control, diagnose, or replace, and may cause channel corruption to propagate through a compressed source representation. To address the limitations, we propose \textbf{SkillCom}, a modular framework that decomposes LLM-based semantic communication into four explicit skills: semantic abstraction skill, channel-adaptive transmission skill, receiver-side repair skill, and task execution skill. These skills are interconnected through typed semantic-unit interfaces. Thus, transmission operates on structured unit-level representations rather than on one monolithic text block. This design localizes channel impairment, enables targeted repair from successfully received units, and supports stage-wise ablation and single-skill replacement under matched communication constraints. Experiments on multi-hop question answering and dialogue state tracking show that SkillCom consistently outperforms the monolithic LLM baseline, remains more robust under varying channel conditions, and exhibits task-dependent preferences over skill realizations. The results suggest that explicit skill decomposition provides a more robust and diagnosable foundation for LLM-based semantic communication than monolithic methods.

37.7ITMay 3
Channel-coded Over-the-Air Computation

Shudi Weng, Ming Xiao, Mikael Skoglund

This letter studies channel coding for over-the-air computation (AirComp). AirComp enables efficient wireless data aggregation, where computation accuracy is the key performance metric. However, this accuracy is sensitive to channel impairments. As a promising solution, the role of channel coding in AirComp has been largely unexplored, creating a critical gap in achieving reliable AirComp systems. To address this, we propose a novel channel coding scheme tailored for AirComp that preserves the aggregation structure while mitigating channel distortions. We show that the computation error decreases with the coding rate and can asymptotically approach zero. Both theoretical and simulation results demonstrate that the proposed scheme significantly enhances computation performance.

67.0ITApr 30
Perfectly Private Over-the-Air Computation

Shudi Weng, Ming Xiao, Mikael Skoglund

This paper studies a key research question: how to achieve perfect privacy in over-the-air computation (AirComp)? The problem is particularly intriguing due to a dilemma. Real-field operations can ensure invertibility but generally introduce statistical dependence, resulting in inevitable privacy leakage. In contrast, modulo operations can decorrelate the output from the original message, but suffer from the ill-posed invertibility when applied over non-prime groups (e.g., the real field). This raises a subtle yet fundamental question: Does perfect privacy intrinsically conflict with AirComp? We show that the answer is no. By carefully leveraging the interplay between real-field and modulo operations, perfect privacy and accurate computation can, in fact, be achieved simultaneously, enabling perfectly private aggregation.

SPMar 22, 2024
Adaptive Coded Federated Learning: Privacy Preservation and Straggler Mitigation

Chengxi Li, Ming Xiao, Mikael Skoglund

In this article, we address the problem of federated learning in the presence of stragglers. For this problem, a coded federated learning framework has been proposed, where the central server aggregates gradients received from the non-stragglers and gradient computed from a privacy-preservation global coded dataset to mitigate the negative impact of the stragglers. However, when aggregating these gradients, fixed weights are consistently applied across iterations, neglecting the generation process of the global coded dataset and the dynamic nature of the trained model over iterations. This oversight may result in diminished learning performance. To overcome this drawback, we propose a new method named adaptive coded federated learning (ACFL). In ACFL, before the training, each device uploads a coded local dataset with additive noise to the central server to generate a global coded dataset under privacy preservation requirements. During each iteration of the training, the central server aggregates the gradients received from the non-stragglers and the gradient computed from the global coded dataset, where an adaptive policy for varying the aggregation weights is designed. Under this policy, we optimize the performance in terms of privacy and learning, where the learning performance is analyzed through convergence analysis and the privacy performance is characterized via mutual information differential privacy. Finally, we perform simulations to demonstrate the superiority of ACFL compared with the non-adaptive methods.

LGMay 14, 2025
The Larger the Merrier? Efficient Large AI Model Inference in Wireless Edge Networks

Zhonghao Lyu, Ming Xiao, Jie Xu et al.

The growing demand for large artificial intelligence model (LAIM) services is driving a paradigm shift from traditional cloud-based inference to edge-based inference for low-latency, privacy-preserving applications. In particular, edge-device co-inference, which partitions LAIMs between edge devices and servers, has emerged as a promising strategy for resource-efficient LAIM execution in wireless networks. In this paper, we investigate a pruning-aware LAIM co-inference scheme, where a pre-trained LAIM is pruned and partitioned into on-device and on-server sub-models for deployment. For analysis, we first prove that the LAIM output distortion is upper bounded by its parameter distortion. Then, we derive a lower bound on parameter distortion via rate-distortion theory, analytically capturing the relationship between pruning ratio and co-inference performance. Next, based on the analytical results, we formulate an LAIM co-inference distortion bound minimization problem by jointly optimizing the pruning ratio, transmit power, and computation frequency under system latency, energy, and available resource constraints. Moreover, we propose an efficient algorithm to tackle the considered highly non-convex problem. Finally, extensive simulations demonstrate the effectiveness of the proposed design. In particular, model parameter distortion is shown to provide a reliable bound on output distortion. Also, the proposed joint pruning ratio and resource management design achieves superior performance in balancing trade-offs among inference performance, system latency, and energy consumption compared with benchmark schemes, such as fully on-device and on-server inference. Moreover, the split point is shown to play a critical role in system performance optimization under heterogeneous and resource-limited edge environments.

LGFeb 6, 2024
Gradient Coding in Decentralized Learning for Evading Stragglers

Chengxi Li, Mikael Skoglund

In this paper, we consider a decentralized learning problem in the presence of stragglers. Although gradient coding techniques have been developed for distributed learning to evade stragglers, where the devices send encoded gradients with redundant training data, it is difficult to apply those techniques directly to decentralized learning scenarios. To deal with this problem, we propose a new gossip-based decentralized learning method with gradient coding (GOCO). In the proposed method, to avoid the negative impact of stragglers, the parameter vectors are updated locally using encoded gradients based on the framework of stochastic gradient coding and then averaged in a gossip-based manner. We analyze the convergence performance of GOCO for strongly convex loss functions. And we also provide simulation results to demonstrate the superiority of the proposed method in terms of learning performance compared with the baseline methods.

LGOct 21, 2024
Information-Theoretic Minimax Regret Bounds for Reinforcement Learning based on Duality

Raghav Bongole, Amaury Gouverneur, Borja Rodríguez-Gálvez et al.

We study agents acting in an unknown environment where the agent's goal is to find a robust policy. We consider robust policies as policies that achieve high cumulative rewards for all possible environments. To this end, we consider agents minimizing the maximum regret over different environment parameters, leading to the study of minimax regret. This research focuses on deriving information-theoretic bounds for minimax regret in Markov Decision Processes (MDPs) with a finite time horizon. Building on concepts from supervised learning, such as minimum excess risk (MER) and minimax excess risk, we use recent bounds on the Bayesian regret to derive minimax regret bounds. Specifically, we establish minimax theorems and use bounds on the Bayesian regret to perform minimax regret analysis using these minimax theorems. Our contributions include defining a suitable minimax regret in the context of MDPs, finding information-theoretic bounds for it, and applying these bounds in various scenarios.

MLFeb 4, 2025
An Information-Theoretic Analysis of Thompson Sampling with Infinite Action Spaces

Amaury Gouverneur, Borja Rodriguez Gálvez, Tobias Oechtering et al.

This paper studies the Bayesian regret of the Thompson Sampling algorithm for bandit problems, building on the information-theoretic framework introduced by Russo and Van Roy (2015). Specifically, it extends the rate-distortion analysis of Dong and Van Roy (2018), which provides near-optimal bounds for linear bandits. A limitation of these results is the assumption of a finite action space. We address this by extending the analysis to settings with infinite and continuous action spaces. Additionally, we specialize our results to bandit problems with expected rewards that are Lipschitz continuous with respect to the action space, deriving a regret bound that explicitly accounts for the complexity of the action space.

MLDec 3, 2024
An Information-Theoretic Analysis of Thompson Sampling for Logistic Bandits

Amaury Gouverneur, Borja Rodríguez-Gálvez, Tobias J. Oechtering et al.

We study the performance of the Thompson Sampling algorithm for logistic bandit problems. In this setting, an agent receives binary rewards with probabilities determined by a logistic function, $\exp(β\langle a, θ\rangle)/(1+\exp(β\langle a, θ\rangle))$, with slope parameter $β>0$, and where both the action $a\in \mathcal{A}$ and parameter $θ\in \mathcal{O}$ lie within the $d$-dimensional unit ball. Adopting the information-theoretic framework introduced by Russo and Van Roy (2016), we analyze the information ratio, a statistic that quantifies the trade-off between the immediate regret incurred and the information gained about the optimal action. We improve upon previous results by establishing that the information ratio is bounded by $\tfrac{9}{2}dα^{-2}$, where $α$ is a minimax measure of the alignment between the action space $\mathcal{A}$ and the parameter space $\mathcal{O}$, and is independent of $β$. Using this result, we derive a bound of order $O(d/α\sqrt{T \log(βT/d)})$ on the Bayesian expected regret of Thompson Sampling incurred after $T$ time steps. To our knowledge, this is the first regret bound for logistic bandits that depends only logarithmically on $β$ while being independent of the number of actions. In particular, when the action space contains the parameter space, the bound on the expected regret is of order $\tilde{O}(d \sqrt{T})$.

LGJan 25
Coding-Enforced Resilient and Secure Aggregation for Hierarchical Federated Learning

Shudi Weng, Ming Xiao, Mikael Skoglund

Hierarchical federated learning (HFL) has emerged as an effective paradigm to enhance link quality between clients and the server. However, ensuring model accuracy while preserving privacy under unreliable communication remains a key challenge in HFL, as the coordination among privacy noise can be randomly disrupted. To address this limitation, we propose a robust hierarchical secure aggregation scheme, termed H-SecCoGC, which integrates coding strategies to enforce structured aggregation. The proposed scheme not only ensures accurate global model construction under varying levels of privacy, but also avoids the partial participation issue, thereby significantly improving robustness, privacy preservation, and learning efficiency. Both theoretical analyses and experimental results demonstrate the superiority of our scheme under unreliable communication across arbitrarily strong privacy guarantees

ITOct 7, 2025
Risk level dependent Minimax Quantile lower bounds for Interactive Statistical Decision Making

Raghav Bongole, Amirreza Zamani, Tobias J. Oechtering et al.

Minimax risk and regret focus on expectation, missing rare failures critical in safety-critical bandits and reinforcement learning. Minimax quantiles capture these tails. Three strands of prior work motivate this study: minimax-quantile bounds restricted to non-interactive estimation; unified interactive analyses that focus on expected risk rather than risk level specific quantile bounds; and high-probability bandit bounds that still lack a quantile-specific toolkit for general interactive protocols. To close this gap, within the interactive statistical decision making framework, we develop high-probability Fano and Le Cam tools and derive risk level explicit minimax-quantile bounds, including a quantile-to-expectation conversion and a tight link between strict and lower minimax quantiles. Instantiating these results for the two-armed Gaussian bandit immediately recovers optimal-rate bounds.

LGMay 17, 2025
Coded Robust Aggregation for Distributed Learning under Byzantine Attacks

Chengxi Li, Ming Xiao, Mikael Skoglund

In this paper, we investigate the problem of distributed learning (DL) in the presence of Byzantine attacks. For this problem, various robust bounded aggregation (RBA) rules have been proposed at the central server to mitigate the impact of Byzantine attacks. However, current DL methods apply RBA rules for the local gradients from the honest devices and the disruptive information from Byzantine devices, and the learning performance degrades significantly when the local gradients of different devices vary considerably from each other. To overcome this limitation, we propose a new DL method to cope with Byzantine attacks based on coded robust aggregation (CRA-DL). Before training begins, the training data are allocated to the devices redundantly. During training, in each iteration, the honest devices transmit coded gradients to the server computed from the allocated training data, and the server then aggregates the information received from both honest and Byzantine devices using RBA rules. In this way, the global gradient can be approximately recovered at the server to update the global model. Compared with current DL methods applying RBA rules, the improvement of CRA-DL is attributed to the fact that the coded gradients sent by the honest devices are closer to each other. This closeness enhances the robustness of the aggregation against Byzantine attacks, since Byzantine messages tend to be significantly different from those of honest devices in this case. We theoretically analyze the convergence performance of CRA-DL. Finally, we present numerical results to verify the superiority of the proposed method over existing baselines, showing its enhanced learning performance under Byzantine attacks.

MLFeb 17, 2025
Refined PAC-Bayes Bounds for Offline Bandits

Amaury Gouverneur, Tobias J. Oechtering, Mikael Skoglund

In this paper, we present refined probabilistic bounds on empirical reward estimates for off-policy learning in bandit problems. We build on the PAC-Bayesian bounds from Seldin et al. (2010) and improve on their results using a new parameter optimization approach introduced by Rodríguez et al. (2024). This technique is based on a discretization of the space of possible events to optimize the "in probability" parameter. We provide two parameter-free PAC-Bayes bounds, one based on Hoeffding-Azuma's inequality and the other based on Bernstein's inequality. We prove that our bounds are almost optimal as they recover the same rate as would be obtained by setting the "in probability" parameter after the realization of the data.

ITJan 19, 2025
Reinforcement Learning Based Goodput Maximization with Quantized Feedback in URLLC

Hasan Basri Celebi, Mikael Skoglund

This paper presents a comprehensive system model for goodput maximization with quantized feedback in Ultra-Reliable Low-Latency Communication (URLLC), focusing on dynamic channel conditions and feedback schemes. The study investigates a communication system, where the receiver provides quantized channel state information to the transmitter. The system adapts its feedback scheme based on reinforcement learning, aiming to maximize goodput while accommodating varying channel statistics. We introduce a novel Rician-$K$ factor estimation technique to enable the communication system to optimize the feedback scheme. This dynamic approach increases the overall performance, making it well-suited for practical URLLC applications where channel statistics vary over time.

MLMar 25, 2024
A note on generalization bounds for losses with finite moments

Borja Rodríguez-Gálvez, Omar Rivasplata, Ragnar Thobaben et al.

This paper studies the truncation method from Alquier [1] to derive high-probability PAC-Bayes bounds for unbounded losses with heavy tails. Assuming that the $p$-th moment is bounded, the resulting bounds interpolate between a slow rate $1 / \sqrt{n}$ when $p=2$, and a fast rate $1 / n$ when $p \to \infty$ and the loss is essentially bounded. Moreover, the paper derives a high-probability PAC-Bayes bound for losses with a bounded variance. This bound has an exponentially better dependence on the confidence parameter and the dependency measure than previous bounds in the literature. Finally, the paper extends all results to guarantees in expectation and single-draw PAC-Bayes. In order to so, it obtains analogues of the PAC-Bayes fast rate bound for bounded losses from [2] in these settings.

LGMar 19, 2024
Distributed Learning based on 1-Bit Gradient Coding in the Presence of Stragglers

Chengxi Li, Mikael Skoglund

This paper considers the problem of distributed learning (DL) in the presence of stragglers. For this problem, DL methods based on gradient coding have been widely investigated, which redundantly distribute the training data to the workers to guarantee convergence when some workers are stragglers. However, these methods require the workers to transmit real-valued vectors during the process of learning, which induces very high communication burden. To overcome this drawback, we propose a novel DL method based on 1-bit gradient coding (1-bit GCDL), where 1-bit data encoded from the locally computed gradients are transmitted by the workers to reduce the communication overhead. We theoretically provide the convergence guarantees of the proposed method for both the convex loss functions and nonconvex loss functions. It is shown empirically that 1-bit GC-DL outperforms the baseline methods, which attains better learning performance under the same communication overhead.

MLMar 5, 2024
Chained Information-Theoretic bounds and Tight Regret Rate for Linear Bandit Problems

Amaury Gouverneur, Borja Rodríguez-Gálvez, Tobias J. Oechtering et al.

This paper studies the Bayesian regret of a variant of the Thompson-Sampling algorithm for bandit problems. It builds upon the information-theoretic framework of [Russo and Van Roy, 2015] and, more specifically, on the rate-distortion analysis from [Dong and Van Roy, 2020], where they proved a bound with regret rate of $O(d\sqrt{T \log(T)})$ for the $d$-dimensional linear bandit setting. We focus on bandit problems with a metric action space and, using a chaining argument, we establish new bounds that depend on the metric entropy of the action space for a variant of Thompson-Sampling. Under suitable continuity assumption of the rewards, our bound offers a tight rate of $O(d\sqrt{T})$ for $d$-dimensional linear bandit problems.

LGFeb 7, 2022
Asynchronous Parallel Incremental Block-Coordinate Descent for Decentralized Machine Learning

Hao Chen, Yu Ye, Ming Xiao et al.

Machine learning (ML) is a key technique for big-data-driven modelling and analysis of massive Internet of Things (IoT) based intelligent and ubiquitous computing. For fast-increasing applications and data amounts, distributed learning is a promising emerging paradigm since it is often impractical or inefficient to share/aggregate data to a centralized location from distinct ones. This paper studies the problem of training an ML model over decentralized systems, where data are distributed over many user devices and the learning algorithm run on-device, with the aim of relaxing the burden at a central entity/server. Although gossip-based approaches have been used for this purpose in different use cases, they suffer from high communication costs, especially when the number of devices is large. To mitigate this, incremental-based methods are proposed. We first introduce incremental block-coordinate descent (I-BCD) for the decentralized ML, which can reduce communication costs at the expense of running time. To accelerate the convergence speed, an asynchronous parallel incremental BCD (API-BCD) method is proposed, where multiple devices/agents are active in an asynchronous fashion. We derive convergence properties for the proposed methods. Simulation results also show that our API-BCD method outperforms state of the art in terms of running time and communication costs.

LGOct 22, 2021
Federated Learning over Wireless IoT Networks with Optimized Communication and Resources

Hao Chen, Shaocheng Huang, Deyou Zhang et al.

To leverage massive distributed data and computation resources, machine learning in the network edge is considered to be a promising technique especially for large-scale model training. Federated learning (FL), as a paradigm of collaborative learning techniques, has obtained increasing research attention with the benefits of communication efficiency and improved data privacy. Due to the lossy communication channels and limited communication resources (e.g., bandwidth and power), it is of interest to investigate fast responding and accurate FL schemes over wireless systems. Hence, we investigate the problem of jointly optimized communication efficiency and resources for FL over wireless Internet of things (IoT) networks. To reduce complexity, we divide the overall optimization problem into two sub-problems, i.e., the client scheduling problem and the resource allocation problem. To reduce the communication costs for FL in wireless IoT networks, a new client scheduling policy is proposed by reusing stale local model parameters. To maximize successful information exchange over networks, a Lagrange multiplier method is first leveraged by decoupling variables including power variables, bandwidth variables and transmission indicators. Then a linear-search based power and bandwidth allocation method is developed. Given appropriate hyper-parameters, we show that the proposed communication-efficient federated learning (CEFL) framework converges at a strong linear rate. Through extensive experiments, it is revealed that the proposed CEFL framework substantially boosts both the communication efficiency and learning performance of both training loss and test accuracy for FL over wireless IoT networks compared to a basic FL approach with uniform resource allocation.

ITSep 17, 2021
Generalized Talagrand Inequality for Sinkhorn Distance using Entropy Power Inequality

Shuchan Wang, Photios A. Stavrou, Mikael Skoglund

In this paper, we study the connection between entropic optimal transport and entropy power inequality (EPI). First, we prove an HWI-type inequality making use of the infinitesimal displacement convexity of optimal transport map. Second, we derive two Talagrand-type inequalities using the saturation of EPI that corresponds to a numerical term in our expression. We evaluate for a wide variety of distributions this term whereas for Gaussian and i.i.d. Cauchy distributions this term is found in explicit form. We show that our results extend previous results of Gaussian Talagrand inequality for Sinkhorn distance to the strongly log-concave case.

LGJun 30, 2021
Adaptive Stochastic ADMM for Decentralized Reinforcement Learning in Edge Industrial IoT

Wanlu Lei, Yu Ye, Ming Xiao et al.

Edge computing provides a promising paradigm to support the implementation of Industrial Internet of Things (IIoT) by offloading tasks to nearby edge nodes. Meanwhile, the increasing network size makes it impractical for centralized data processing due to limited bandwidth, and consequently a decentralized learning scheme is preferable. Reinforcement learning (RL) has been widely investigated and shown to be a promising solution for decision-making and optimal control processes. For RL in a decentralized setup, edge nodes (agents) connected through a communication network aim to work collaboratively to find a policy to optimize the global reward as the sum of local rewards. However, communication costs, scalability and adaptation in complex environments with heterogeneous agents may significantly limit the performance of decentralized RL. Alternating direction method of multipliers (ADMM) has a structure that allows for decentralized implementation, and has shown faster convergence than gradient descent based methods. Therefore, we propose an adaptive stochastic incremental ADMM (asI-ADMM) algorithm and apply the asI-ADMM to decentralized RL with edge-computing-empowered IIoT networks. We provide convergence properties for proposed algorithms by designing a Lyapunov function and prove that the asI-ADMM has $O(\frac{1}{k}) +O(\frac{1}{M})$ convergence rate where $k$ and $ M$ are the number of iterations and batch samples, respectively. Then, we test our algorithm with two supervised learning problems. For performance evaluation, we simulate two applications in decentralized RL settings with homogeneous and heterogeneous agents. The experiment results show that our proposed algorithms outperform the state of the art in terms of communication costs and scalability, and can well adapt to complex IoT environments.

ITMar 5, 2021
A Learning-Based Approach to Address Complexity-Reliability Tradeoff in OS Decoders

Baptiste Cavarec, Hasan Basri Celebi, Mats Bengtsson et al.

In this paper, we study the tradeoffs between complexity and reliability for decoding large linear block codes. We show that using artificial neural networks to predict the required order of an ordered statistics based decoder helps in reducing the average complexity and hence the latency of the decoder. We numerically validate the approach through Monte Carlo simulations.

MLJan 22, 2021
Tighter expected generalization error bounds via Wasserstein distance

Borja Rodríguez-Gálvez, Germán Bassi, Ragnar Thobaben et al.

This work presents several expected generalization error bounds based on the Wasserstein distance. More specifically, it introduces full-dataset, single-letter, and random-subset bounds, and their analogues in the randomized subsample setting from Steinke and Zakynthinou [1]. Moreover, when the loss function is bounded and the geometry of the space is ignored by the choice of the metric in the Wasserstein distance, these bounds recover from below (and thus, are tighter than) current bounds based on the relative entropy. In particular, they generate new, non-vacuous bounds based on the relative entropy. Therefore, these results can be seen as a bridge between works that account for the geometry of the hypothesis space and those based on the relative entropy, which is agnostic to such geometry. Furthermore, it is shown how to produce various new bounds based on different information measures (e.g., the lautum information or several $f$-divergences) based on these bounds and how to derive similar bounds with respect to the backward channel using the presented proof techniques.

LGOct 22, 2020
A ReLU Dense Layer to Improve the Performance of Neural Networks

Alireza M. Javid, Sandipan Das, Mikael Skoglund et al.

We propose ReDense as a simple and low complexity way to improve the performance of trained neural networks. We use a combination of random weights and rectified linear unit (ReLU) activation function to add a ReLU dense (ReDense) layer to the trained neural network such that it can achieve a lower training loss. The lossless flow property (LFP) of ReLU is the key to achieve the lower training loss while keeping the generalization error small. ReDense does not suffer from vanishing gradient problem in the training due to having a shallow structure. We experimentally show that ReDense can improve the training and testing performance of various neural network architectures with different optimization loss and activation functions. Finally, we test ReDense on some of the state-of-the-art architectures and show the performance improvement on benchmark datasets.

ITOct 21, 2020
On Random Subset Generalization Error Bounds and the Stochastic Gradient Langevin Dynamics Algorithm

Borja Rodríguez-Gálvez, Germán Bassi, Ragnar Thobaben et al.

In this work, we unify several expected generalization error bounds based on random subsets using the framework developed by Hellström and Durisi [1]. First, we recover the bounds based on the individual sample mutual information from Bu et al. [2] and on a random subset of the dataset from Negrea et al. [3]. Then, we introduce their new, analogous bounds in the randomized subsample setting from Steinke and Zakynthinou [4], and we identify some limitations of the framework. Finally, we extend the bounds from Haghifam et al. [5] for Langevin dynamics to stochastic gradient Langevin dynamics and we refine them for loss functions with potentially large gradient norms.

DCOct 2, 2020
Coded Stochastic ADMM for Decentralized Consensus Optimization with Edge Computing

Hao Chen, Yu Ye, Ming Xiao et al.

Big data, including applications with high security requirements, are often collected and stored on multiple heterogeneous devices, such as mobile devices, drones and vehicles. Due to the limitations of communication costs and security requirements, it is of paramount importance to extract information in a decentralized manner instead of aggregating data to a fusion center. To train large-scale machine learning models, edge/fog computing is often leveraged as an alternative to centralized learning. We consider the problem of learning model parameters in a multi-agent system with data locally processed via distributed edge nodes. A class of mini-batch stochastic alternating direction method of multipliers (ADMM) algorithms is explored to develop the distributed learning model. To address two main critical challenges in distributed networks, i.e., communication bottleneck and straggler nodes (nodes with slow responses), error-control-coding based stochastic incremental ADMM is investigated. Given an appropriate mini-batch size, we show that the mini-batch stochastic ADMM based method converges in a rate of $O(\frac{1}{\sqrt{k}})$, where $k$ denotes the number of iterations. Through numerical experiments, it is revealed that the proposed algorithm is communication-efficient, rapidly responding and robust in the presence of straggler nodes compared with state of the art algorithms.

LGSep 29, 2020
A Low Complexity Decentralized Neural Net with Centralized Equivalence using Layer-wise Learning

Xinyue Liang, Alireza M. Javid, Mikael Skoglund et al.

We design a low complexity decentralized learning algorithm to train a recently proposed large neural network in distributed processing nodes (workers). We assume the communication network between the workers is synchronized and can be modeled as a doubly-stochastic mixing matrix without having any master node. In our setup, the training data is distributed among the workers but is not shared in the training process due to privacy and security concerns. Using alternating-direction-method-of-multipliers (ADMM) along with a layerwise convex optimization approach, we propose a decentralized learning algorithm which enjoys low computational complexity and communication cost among the workers. We show that it is possible to achieve equivalent learning performance as if the data is available in a single place. Finally, we experimentally illustrate the time complexity and convergence behavior of the algorithm.

ITJun 12, 2020
Neural Estimators for Conditional Mutual Information Using Nearest Neighbors Sampling

Sina Molavipour, Germán Bassi, Mikael Skoglund

The estimation of mutual information (MI) or conditional mutual information (CMI) from a set of samples is a long-standing problem. A recent line of work in this area has leveraged the approximation power of artificial neural networks and has shown improvements over conventional methods. One important challenge in this new approach is the need to obtain, given the original dataset, a different set where the samples are distributed according to a specific product density function. This is particularly challenging when estimating CMI. In this paper, we introduce a new technique, based on k nearest neighbors (k-NN), to perform the resampling and derive high-confidence concentration bounds for the sample average. Then the technique is employed to train a neural network classifier and the CMI is estimated accordingly. We propose three estimators using this technique and prove their consistency, make a comparison between them and similar approaches in the literature, and experimentally show improvements in estimating the CMI in terms of accuracy and variance of the estimators.

MLJun 11, 2020
A Variational Approach to Privacy and Fairness

Borja Rodríguez-Gálvez, Ragnar Thobaben, Mikael Skoglund

In this article, we propose a new variational approach to learn private and/or fair representations. This approach is based on the Lagrangians of a new formulation of the privacy and fairness optimization problems that we propose. In this formulation, we aim to generate representations of the data that keep a prescribed level of the relevant information that is not shared by the private or sensitive data, while minimizing the remaining information they keep. The proposed approach (i) exhibits the similarities of the privacy and fairness problems, (ii) allows us to control the trade-off between utility and privacy or fairness through the Lagrange multiplier parameter, and (iii) can be comfortably incorporated to common representation learning algorithms such as the VAE, the $β$-VAE, the VIB, or the nonlinear IB.

ITMay 12, 2020
Upper Bounds on the Generalization Error of Private Algorithms for Discrete Data

Borja Rodríguez-Gálvez, Germán Bassi, Mikael Skoglund

In this work, we study the generalization capability of algorithms from an information-theoretic perspective. It has been shown that the expected generalization error of an algorithm is bounded from above by a function of the relative entropy between the conditional probability distribution of the algorithm's output hypothesis, given the dataset with which it was trained, and its marginal probability distribution. We build upon this fact and introduce a mathematical formulation to obtain upper bounds on this relative entropy. Assuming that the data is discrete, we then develop a strategy using this formulation, based on the method of types and typicality, to find explicit upper bounds on the generalization error of stable algorithms, i.e., algorithms that produce similar output hypotheses given similar input datasets. In particular, we show the bounds obtained with this strategy for the case of $ε$-DP and $μ$-GDP algorithms.

SPApr 10, 2020
Asynchronous Decentralized Learning of a Neural Network

Xinyue Liang, Alireza M. Javid, Mikael Skoglund et al.

In this work, we exploit an asynchronous computing framework namely ARock to learn a deep neural network called self-size estimating feedforward neural network (SSFN) in a decentralized scenario. Using this algorithm namely asynchronous decentralized SSFN (dSSFN), we provide the centralized equivalent solution under certain technical assumptions. Asynchronous dSSFN relaxes the communication bottleneck by allowing one node activation and one side communication, which reduces the communication overhead significantly, consequently increasing the learning speed. We compare asynchronous dSSFN with traditional synchronous dSSFN in the experimental results, which shows the competitive performance of asynchronous dSSFN, especially when the communication network is sparse.