Pranay Sharma

LG
h-index56
21papers
652citations
Novelty56%
AI Score46

21 Papers

LGApr 11, 2023Code
Model Sparsity Can Simplify Machine Unlearning

Jinghan Jia, Jiancheng Liu, Parikshit Ram et al.

In response to recent data regulation requirements, machine unlearning (MU) has emerged as a critical process to remove the influence of specific examples from a given model. Although exact unlearning can be achieved through complete model retraining using the remaining dataset, the associated computational costs have driven the development of efficient, approximate unlearning techniques. Moving beyond data-centric MU approaches, our study introduces a novel model-based perspective: model sparsification via weight pruning, which is capable of reducing the gap between exact unlearning and approximate unlearning. We show in both theory and practice that model sparsity can boost the multi-criteria unlearning performance of an approximate unlearner, closing the approximation gap, while continuing to be efficient. This leads to a new MU paradigm, termed prune first, then unlearn, which infuses a sparse model prior into the unlearning process. Building on this insight, we also develop a sparsity-aware unlearning method that utilizes sparsity regularization to enhance the training process of approximate unlearning. Extensive experiments show that our proposals consistently benefit MU in various unlearning scenarios. A notable highlight is the 77% unlearning efficacy gain of fine-tuning (one of the simplest unlearning methods) when using sparsity-aware unlearning. Furthermore, we demonstrate the practical impact of our proposed MU methods in addressing other machine learning challenges, such as defending against backdoor attacks and enhancing transfer learning. Codes are available at https://github.com/OPTML-Group/Unlearn-Sparse.

LGOct 28, 2023
High-probability Convergence Bounds for Nonlinear Stochastic Gradient Descent Under Heavy-tailed Noise

Aleksandar Armacki, Pranay Sharma, Gauri Joshi et al. · cmu

We study high-probability convergence guarantees of learning on streaming data in the presence of heavy-tailed noise. In the proposed scenario, the model is updated in an online fashion, as new information is observed, without storing any additional data. To combat the heavy-tailed noise, we consider a general framework of nonlinear stochastic gradient descent (SGD), providing several strong results. First, for non-convex costs and component-wise nonlinearities, we establish a convergence rate arbitrarily close to $\mathcal{O}\left(t^{-\frac{1}{4}}\right)$, whose exponent is independent of noise and problem parameters. Second, for strongly convex costs and component-wise nonlinearities, we establish a rate arbitrarily close to $\mathcal{O}\left(t^{-\frac{1}{2}}\right)$ for the weighted average of iterates, with exponent again independent of noise and problem parameters. Finally, for strongly convex costs and a broader class of nonlinearities, we establish convergence of the last iterate, with a rate $\mathcal{O}\left(t^{-ζ} \right)$, where $ζ\in (0,1)$ depends on problem parameters, noise and nonlinearity. As we show analytically and numerically, $ζ$ can be used to inform the preferred choice of nonlinearity for given problem settings. Compared to state-of-the-art, who only consider clipping, require bounded noise moments of order $η\in (1,2]$, and establish convergence rates whose exponents go to zero as $η\rightarrow 1$, we provide high-probability guarantees for a much broader class of nonlinearities and symmetric density noise, with convergence rates whose exponents are bounded away from zero, even when the noise has finite first moment only. Moreover, in the case of strongly convex functions, we demonstrate analytically and numerically that clipping is not always the optimal nonlinearity, further underlining the value of our general framework.

OCMar 9, 2022
Federated Minimax Optimization: Improved Convergence Analyses and Algorithms

Pranay Sharma, Rohan Panda, Gauri Joshi et al.

In this paper, we consider nonconvex minimax optimization, which is gaining prominence in many modern machine learning applications such as GANs. Large-scale edge-based collection of training data in these applications calls for communication-efficient distributed optimization algorithms, such as those used in federated learning, to process the data. In this paper, we analyze Local stochastic gradient descent ascent (SGDA), the local-update version of the SGDA algorithm. SGDA is the core algorithm used in minimax optimization, but it is not well-understood in a distributed setting. We prove that Local SGDA has \textit{order-optimal} sample complexity for several classes of nonconvex-concave and nonconvex-nonconcave minimax problems, and also enjoys \textit{linear speedup} with respect to the number of clients. We provide a novel and tighter analysis, which improves the convergence and communication guarantees in the existing literature. For nonconvex-PL and nonconvex-one-point-concave functions, we improve the existing complexity results for centralized minimax problems. Furthermore, we propose a momentum-based local-update algorithm, which has the same convergence guarantees, but outperforms Local SGDA as demonstrated in our experiments.

LGFeb 6, 2023
On the Convergence of Federated Averaging with Cyclic Client Participation

Yae Jee Cho, Pranay Sharma, Gauri Joshi et al.

Federated Averaging (FedAvg) and its variants are the most popular optimization algorithms in federated learning (FL). Previous convergence analyses of FedAvg either assume full client participation or partial client participation where the clients can be uniformly sampled. However, in practical cross-device FL systems, only a subset of clients that satisfy local criteria such as battery status, network connectivity, and maximum participation frequency requirements (to ensure privacy) are available for training at a given time. As a result, client availability follows a natural cyclic pattern. We provide (to our knowledge) the first theoretical framework to analyze the convergence of FedAvg with cyclic client participation with several different client optimizers such as GD, SGD, and shuffled SGD. Our analysis discovers that cyclic client participation can achieve a faster asymptotic convergence rate than vanilla FedAvg with uniform client participation under suitable conditions, providing valuable insights into the design of client sampling protocols.

LGJul 28, 2022
FedVARP: Tackling the Variance Due to Partial Client Participation in Federated Learning

Divyansh Jhunjhunwala, Pranay Sharma, Aushim Nagarkatti et al.

Data-heterogeneous federated learning (FL) systems suffer from two significant sources of convergence error: 1) client drift error caused by performing multiple local optimization steps at clients, and 2) partial client participation error caused by the fact that only a small subset of the edge clients participate in every training round. We find that among these, only the former has received significant attention in the literature. To remedy this, we propose FedVARP, a novel variance reduction algorithm applied at the server that eliminates error due to partial client participation. To do so, the server simply maintains in memory the most recent update for each client and uses these as surrogate updates for the non-participating clients in every round. Further, to alleviate the memory requirement at the server, we propose a novel clustering-based variance reduction algorithm ClusterFedVARP. Unlike previously proposed methods, both FedVARP and ClusterFedVARP do not require additional computation at clients or communication of additional optimization parameters. Through extensive experiments, we show that FedVARP outperforms state-of-the-art methods, and ClusterFedVARP achieves performance comparable to FedVARP with much less memory requirements.

LGMar 4, 2023
What Is Missing in IRM Training and Evaluation? Challenges and Solutions

Yihua Zhang, Pranay Sharma, Parikshit Ram et al.

Invariant risk minimization (IRM) has received increasing attention as a way to acquire environment-agnostic data representations and predictions, and as a principled solution for preventing spurious correlations from being learned and for improving models' out-of-distribution generalization. Yet, recent works have found that the optimality of the originally-proposed IRM optimization (IRM) may be compromised in practice or could be impossible to achieve in some scenarios. Therefore, a series of advanced IRM algorithms have been developed that show practical improvement over IRM. In this work, we revisit these recent IRM advancements, and identify and resolve three practical limitations in IRM training and evaluation. First, we find that the effect of batch size during training has been chronically overlooked in previous studies, leaving room for further improvement. We propose small-batch training and highlight the improvements over a set of large-batch optimization techniques. Second, we find that improper selection of evaluation environments could give a false sense of invariance for IRM. To alleviate this effect, we leverage diversified test-time environments to precisely characterize the invariance of IRM when applied in practice. Third, we revisit (Ahuja et al. (2020))'s proposal to convert IRM into an ensemble game and identify a limitation when a single invariant predictor is desired instead of an ensemble of individual predictors. We propose a new IRM variant to address this limitation based on a novel viewpoint of ensemble IRM games as consensus-constrained bi-level optimization. Lastly, we conduct extensive experiments (covering 7 existing IRM variants and 7 datasets) to justify the practical significance of revisiting IRM training and evaluation in a principled manner.

LGJun 21, 2022
Federated Stochastic Approximation under Markov Noise and Heterogeneity: Applications in Reinforcement Learning

Sajad Khodadadian, Pranay Sharma, Gauri Joshi et al.

Since reinforcement learning algorithms are notoriously data-intensive, the task of sampling observations from the environment is usually split across multiple agents. However, transferring these observations from the agents to a central location can be prohibitively expensive in terms of communication cost, and it can also compromise the privacy of each agent's local behavior policy. Federated reinforcement learning is a framework in which $N$ agents collaboratively learn a global model, without sharing their individual data and policies. This global model is the unique fixed point of the average of $N$ local operators, corresponding to the $N$ agents. Each agent maintains a local copy of the global model and updates it using locally sampled data. In this paper, we show that by careful collaboration of the agents in solving this joint fixed point problem, we can find the global model $N$ times faster, also known as linear speedup. We first propose a general framework for federated stochastic approximation with Markovian noise and heterogeneity, showing linear speedup in convergence. We then apply this framework to federated reinforcement learning algorithms, examining the convergence of federated on-policy TD, off-policy TD, and $Q$-learning.

LGFeb 8, 2023
Federated Minimax Optimization with Client Heterogeneity

Pranay Sharma, Rohan Panda, Gauri Joshi

Minimax optimization has seen a surge in interest with the advent of modern applications such as GANs, and it is inherently more challenging than simple minimization. The difficulty is exacerbated by the training data residing at multiple edge devices or \textit{clients}, especially when these clients can have heterogeneous datasets and local computation capabilities. We propose a general federated minimax optimization framework that subsumes such settings and several existing methods like Local SGDA. We show that naive aggregation of heterogeneous local progress results in optimizing a mismatched objective function -- a phenomenon previously observed in standard federated minimization. To fix this problem, we propose normalizing the client updates by the number of local steps undertaken between successive communication rounds. We analyze the convergence of the proposed algorithm for classes of nonconvex-concave and nonconvex-nonconcave functions and characterize the impact of heterogeneous client data, partial client participation, and heterogeneous local computations. Our analysis works under more general assumptions on the intra-client noise and inter-client heterogeneity than so far considered in the literature. For all the function classes considered, we significantly improve the existing computation and communication complexity results. Experimental results support our theoretical claims.

LGJun 2, 2023
Federated Multi-Sequence Stochastic Approximation with Local Hypergradient Estimation

Davoud Ataee Tarzanagh, Mingchen Li, Pranay Sharma et al.

Stochastic approximation with multiple coupled sequences (MSA) has found broad applications in machine learning as it encompasses a rich class of problems including bilevel optimization (BLO), multi-level compositional optimization (MCO), and reinforcement learning (specifically, actor-critic methods). However, designing provably-efficient federated algorithms for MSA has been an elusive question even for the special case of double sequence approximation (DSA). Towards this goal, we develop FedMSA which is the first federated algorithm for MSA, and establish its near-optimal communication complexity. As core novelties, (i) FedMSA enables the provable estimation of hypergradients in BLO and MCO via local client updates, which has been a notable bottleneck in prior theory, and (ii) our convergence guarantees are sensitive to the heterogeneity-level of the problem. We also incorporate momentum and variance reduction techniques to achieve further acceleration leading to near-optimal rates. Finally, we provide experiments that support our theory and demonstrate the empirical benefits of FedMSA. As an example, FedMSA enables order-of-magnitude savings in communication rounds compared to prior federated BLO schemes.

DCOct 29, 2023
Correlation Aware Sparsified Mean Estimation Using Random Projection

Shuli Jiang, Pranay Sharma, Gauri Joshi

We study the problem of communication-efficient distributed vector mean estimation, a commonly used subroutine in distributed optimization and Federated Learning (FL). Rand-$k$ sparsification is a commonly used technique to reduce communication cost, where each client sends $k < d$ of its coordinates to the server. However, Rand-$k$ is agnostic to any correlations, that might exist between clients in practical scenarios. The recently proposed Rand-$k$-Spatial estimator leverages the cross-client correlation information at the server to improve Rand-$k$'s performance. Yet, the performance of Rand-$k$-Spatial is suboptimal. We propose the Rand-Proj-Spatial estimator with a more flexible encoding-decoding procedure, which generalizes the encoding of Rand-$k$ by projecting the client vectors to a random $k$-dimensional subspace. We utilize Subsampled Randomized Hadamard Transform (SRHT) as the projection matrix and show that Rand-Proj-Spatial with SRHT outperforms Rand-$k$-Spatial, using the correlation information more efficiently. Furthermore, we propose an approach to incorporate varying degrees of correlation and suggest a practical variant of Rand-Proj-Spatial when the correlation information is not available to the server. Experiments on real-world distributed optimization tasks showcase the superior performance of Rand-Proj-Spatial compared to Rand-$k$-Spatial and other more sophisticated sparsification techniques.

LGApr 21
FedSEA: Achieving Benefit of Parallelization in Federated Online Learning

Harekrushna Sahu, Pratik Jawanpuria, Pranay Sharma

Online federated learning (OFL) has emerged as a popular framework for decentralized decision-making over continuous data streams without compromising client privacy. However, the adversary model assumed in standard OFL typically precludes any potential benefits of parallelization. Further, it fails to adequately capture the different sources of statistical variation in OFL problems. In this paper, we extend the OFL paradigm by integrating a stochastically extended adversary (SEA). Under this framework, the loss function remains fixed across clients over time. However, the adversary dynamically and independently selects the data distribution for each client at each time. We propose the \algoOFL{} algorithm to solve this problem, which utilizes online stochastic gradient descent at the clients, along with periodic global aggregation via the server. We establish bounds on the global network regret over a time horizon \(T\) for two classes of functions: (1) for smooth and convex losses, we prove an \(\mathcal{O}(\sqrt{T})\) bound, and (2) for smooth and strongly convex losses, we prove an \(\mathcal{O}(\log T)\) bound. Through careful analysis, we quantify the individual impact of both spatial (across clients) and temporal (over time) data heterogeneity on the regret bounds. Consequently, we identify a regime of mild temporal variation (relative to stochastic gradient variance), where the network regret improves with parallelization. Hence, in the SEA setting, our results improve the existing pessimistic worst-case results in online federated learning.

LGOct 17, 2024
Nonlinear Stochastic Gradient Descent and Heavy-tailed Noise: A Unified Framework and High-probability Guarantees

Aleksandar Armacki, Shuhua Yu, Pranay Sharma et al.

We study high-probability convergence in online learning, in the presence of heavy-tailed noise. To combat the heavy tails, a general framework of nonlinear SGD methods is considered, subsuming several popular nonlinearities like sign, quantization, component-wise and joint clipping. In our work the nonlinearity is treated in a black-box manner, allowing us to establish unified guarantees for a broad range of nonlinear methods. For symmetric noise and non-convex costs we establish convergence of gradient norm-squared, at a rate $\widetilde{\mathcal{O}}(t^{-1/4})$, while for the last iterate of strongly convex costs we establish convergence to the population optima, at a rate $\mathcal{O}(t^{-ζ})$, where $ζ\in (0,1)$ depends on noise and problem parameters. Further, if the noise is a (biased) mixture of symmetric and non-symmetric components, we show convergence to a neighbourhood of stationarity, whose size depends on the mixture coefficient, nonlinearity and noise. Compared to state-of-the-art, who only consider clipping and require unbiased noise with bounded $p$-th moments, $p \in (1,2]$, we provide guarantees for a broad class of nonlinearities, without any assumptions on noise moments. While the rate exponents in state-of-the-art depend on noise moments and vanish as $p \rightarrow 1$, our exponents are constant and strictly better whenever $p < 6/5$ for non-convex and $p < 8/7$ for strongly convex costs. Experiments validate our theory, showing that clipping is not always the optimal nonlinearity, further underlining the value of a general framework.

LGApr 23, 2025
Natural Policy Gradient for Average Reward Non-Stationary RL

Neharika Jali, Eshika Pathak, Pranay Sharma et al.

We consider the problem of non-stationary reinforcement learning (RL) in the infinite-horizon average-reward setting. We model it by a Markov Decision Process with time-varying rewards and transition probabilities, with a variation budget of $Δ_T$. Existing non-stationary RL algorithms focus on model-based and model-free value-based methods. Policy-based methods despite their flexibility in practice are not theoretically well understood in non-stationary RL. We propose and analyze the first model-free policy-based algorithm, Non-Stationary Natural Actor-Critic (NS-NAC), a policy gradient method with a restart based exploration for change and a novel interpretation of learning rates as adapting factors. Further, we present a bandit-over-RL based parameter-free algorithm BORL-NS-NAC that does not require prior knowledge of the variation budget $Δ_T$. We present a dynamic regret of $\tilde{\mathscr O}(|S|^{1/2}|A|^{1/2}Δ_T^{1/6}T^{5/6})$ for both algorithms, where $T$ is the time horizon, and $|S|$, $|A|$ are the sizes of the state and action spaces. The regret analysis leverages a novel adaptation of the Lyapunov function analysis of NAC to dynamic environments and characterizes the effects of simultaneous updates in policy, value function estimate and changes in the environment.

LGFeb 11, 2025
Initialization Matters: Unraveling the Impact of Pre-Training on Federated Learning

Divyansh Jhunjhunwala, Pranay Sharma, Zheng Xu et al.

Initializing with pre-trained models when learning on downstream tasks is becoming standard practice in machine learning. Several recent works explore the benefits of pre-trained initialization in a federated learning (FL) setting, where the downstream training is performed at the edge clients with heterogeneous data distribution. These works show that starting from a pre-trained model can substantially reduce the adverse impact of data heterogeneity on the test performance of a model trained in a federated setting, with no changes to the standard FedAvg training algorithm. In this work, we provide a deeper theoretical understanding of this phenomenon. To do so, we study the class of two-layer convolutional neural networks (CNNs) and provide bounds on the training error convergence and test error of such a network trained with FedAvg. We introduce the notion of aligned and misaligned filters at initialization and show that the data heterogeneity only affects learning on misaligned filters. Starting with a pre-trained model typically results in fewer misaligned filters at initialization, thus producing a lower test error even when the model is trained in a federated setting with data heterogeneity. Experiments in synthetic settings and practical FL training on CNNs verify our theoretical findings.

LGOct 21, 2024
Federated Communication-Efficient Multi-Objective Optimization

Baris Askin, Pranay Sharma, Gauri Joshi et al.

We study a federated version of multi-objective optimization (MOO), where a single model is trained to optimize multiple objective functions. MOO has been extensively studied in the centralized setting but is less explored in federated or distributed settings. We propose FedCMOO, a novel communication-efficient federated multi-objective optimization (FMOO) algorithm that improves the error convergence performance of the model compared to existing approaches. Unlike prior works, the communication cost of FedCMOO does not scale with the number of objectives, as each client sends a single aggregated gradient to the central server. We provide a convergence analysis of the proposed method for smooth and non-convex objective functions under milder assumptions than in prior work. In addition, we introduce a variant of FedCMOO that allows users to specify a preference over the objectives in terms of a desired ratio of the final objective values. Through extensive experiments, we demonstrate the superiority of our proposed method over baseline approaches.

LGFeb 5, 2025
The Cost of Shuffling in Private Gradient Based Optimization

Shuli Jiang, Pranay Sharma, Zhiwei Steven Wu et al.

We consider the problem of differentially private (DP) convex empirical risk minimization (ERM). While the standard DP-SGD algorithm is theoretically well-established, practical implementations often rely on shuffled gradient methods that traverse the training data sequentially rather than sampling with replacement in each iteration. Despite their widespread use, the theoretical privacy-accuracy trade-offs of private shuffled gradient methods (\textit{DP-ShuffleG}) remain poorly understood, leading to a gap between theory and practice. In this work, we leverage privacy amplification by iteration (PABI) and a novel application of Stein's lemma to provide the first empirical excess risk bound of \textit{DP-ShuffleG}. Our result shows that data shuffling results in worse empirical excess risk for \textit{DP-ShuffleG} compared to DP-SGD. To address this limitation, we propose \textit{Interleaved-ShuffleG}, a hybrid approach that integrates public data samples in private optimization. By alternating optimization steps that use private and public samples, \textit{Interleaved-ShuffleG} effectively reduces empirical excess risk. Our analysis introduces a new optimization framework with surrogate objectives, adaptive noise injection, and a dissimilarity metric, which can be of independent interest. Our experiments on diverse datasets and tasks demonstrate the superiority of \textit{Interleaved-ShuffleG} over several baselines.

LGJun 1, 2024
FedAST: Federated Asynchronous Simultaneous Training

Baris Askin, Pranay Sharma, Carlee Joe-Wong et al.

Federated Learning (FL) enables edge devices or clients to collaboratively train machine learning (ML) models without sharing their private data. Much of the existing work in FL focuses on efficiently learning a model for a single task. In this paper, we study simultaneous training of multiple FL models using a common set of clients. The few existing simultaneous training methods employ synchronous aggregation of client updates, which can cause significant delays because large models and/or slow clients can bottleneck the aggregation. On the other hand, a naive asynchronous aggregation is adversely affected by stale client updates. We propose FedAST, a buffered asynchronous federated simultaneous training algorithm that overcomes bottlenecks from slow models and adaptively allocates client resources across heterogeneous tasks. We provide theoretical convergence guarantees for FedAST for smooth non-convex objective functions. Extensive experiments over multiple real-world datasets demonstrate that our proposed method outperforms existing simultaneous FL approaches, achieving up to 46.0% reduction in time to train multiple tasks to completion.

LGJun 19, 2021
STEM: A Stochastic Two-Sided Momentum Algorithm Achieving Near-Optimal Sample and Communication Complexities for Federated Learning

Prashant Khanduri, Pranay Sharma, Haibo Yang et al.

Federated Learning (FL) refers to the paradigm where multiple worker nodes (WNs) build a joint model by using local data. Despite extensive research, for a generic non-convex FL problem, it is not clear, how to choose the WNs' and the server's update directions, the minibatch sizes, and the local update frequency, so that the WNs use the minimum number of samples and communication rounds to achieve the desired solution. This work addresses the above question and considers a class of stochastic algorithms where the WNs perform a few local updates before communication. We show that when both the WN's and the server's directions are chosen based on a stochastic momentum estimator, the algorithm requires $\tilde{\mathcal{O}}(ε^{-3/2})$ samples and $\tilde{\mathcal{O}}(ε^{-1})$ communication rounds to compute an $ε$-stationary solution. To the best of our knowledge, this is the first FL algorithm that achieves such {\it near-optimal} sample and communication complexities simultaneously. Further, we show that there is a trade-off curve between local update frequencies and local minibatch sizes, on which the above sample and communication complexities can be maintained. Finally, we show that for the classical FedAvg (a.k.a. Local SGD, which is a momentum-less special case of the STEM), a similar trade-off curve exists, albeit with worse sample and communication complexities. Our insights on this trade-off provides guidelines for choosing the four important design elements for FL algorithms, the update frequency, directions, and minibatch sizes to achieve the best performance.

MLDec 21, 2020
Zeroth-Order Hybrid Gradient Descent: Towards A Principled Black-Box Optimization Framework

Pranay Sharma, Kaidi Xu, Sijia Liu et al.

In this work, we focus on the study of stochastic zeroth-order (ZO) optimization which does not require first-order gradient information and uses only function evaluations. The problem of ZO optimization has emerged in many recent machine learning applications, where the gradient of the objective function is either unavailable or difficult to compute. In such cases, we can approximate the full gradients or stochastic gradients through function value based gradient estimates. Here, we propose a novel hybrid gradient estimator (HGE), which takes advantage of the query-efficiency of random gradient estimates as well as the variance-reduction of coordinate-wise gradient estimates. We show that with a graceful design in coordinate importance sampling, the proposed HGE-based ZO optimization method is efficient both in terms of iteration complexity as well as function query cost. We provide a thorough theoretical analysis of the convergence of our proposed method for non-convex, convex, and strongly-convex optimization. We show that the convergence rate that we derive generalizes the results for some prominent existing methods in the nonconvex case, and matches the optimal result in the convex case. We also corroborate the theory with a real-world black-box attack generation application to demonstrate the empirical advantage of our method over state-of-the-art ZO optimization approaches.

OCDec 12, 2019
Parallel Restarted SPIDER -- Communication Efficient Distributed Nonconvex Optimization with Optimal Computation Complexity

Pranay Sharma, Swatantra Kafle, Prashant Khanduri et al.

In this paper, we propose a distributed algorithm for stochastic smooth, non-convex optimization. We assume a worker-server architecture where $N$ nodes, each having $n$ (potentially infinite) number of samples, collaborate with the help of a central server to perform the optimization task. The global objective is to minimize the average of local cost functions available at individual nodes. The proposed approach is a non-trivial extension of the popular parallel-restarted SGD algorithm, incorporating the optimal variance-reduction based SPIDER gradient estimator into it. We prove convergence of our algorithm to a first-order stationary solution. The proposed approach achieves the best known communication complexity $O(ε^{-1})$ along with the optimal computation complexity. For finite-sum problems (finite $n$), we achieve the optimal computation (IFO) complexity $O(\sqrt{Nn}ε^{-1})$. For online problems ($n$ unknown or infinite), we achieve the optimal IFO complexity $O(ε^{-3/2})$. In both the cases, we maintain the linear speedup achieved by existing methods. This is a massive improvement over the $O(ε^{-2})$ IFO complexity of the existing approaches. Additionally, our algorithm is general enough to allow non-identical distributions of data across workers, as in the recently proposed federated learning paradigm.

MLJun 25, 2018
Why Interpretability in Machine Learning? An Answer Using Distributed Detection and Data Fusion Theory

Kush R. Varshney, Prashant Khanduri, Pranay Sharma et al.

As artificial intelligence is increasingly affecting all parts of society and life, there is growing recognition that human interpretability of machine learning models is important. It is often argued that accuracy or other similar generalization performance metrics must be sacrificed in order to gain interpretability. Such arguments, however, fail to acknowledge that the overall decision-making system is composed of two entities: the learned model and a human who fuses together model outputs with his or her own information. As such, the relevant performance criteria should be for the entire system, not just for the machine learning component. In this work, we characterize the performance of such two-node tandem data fusion systems using the theory of distributed detection. In doing so, we work in the population setting and model interpretable learned models as multi-level quantizers. We prove that under our abstraction, the overall system of a human with an interpretable classifier outperforms one with a black box classifier.