LGMay 23, 2022
Semi-Decentralized Federated Learning with Collaborative RelayingMichal Yemini, Rajarshi Saha, Emre Ozfatura et al.
We present a semi-decentralized federated learning algorithm wherein clients collaborate by relaying their neighbors' local updates to a central parameter server (PS). At every communication round to the PS, each client computes a local consensus of the updates from its neighboring clients and eventually transmits a weighted average of its own update and those of its neighbors to the PS. We appropriately optimize these averaging weights to ensure that the global update at the PS is unbiased and to reduce the variance of the global update at the PS, consequently improving the rate of convergence. Numerical simulations substantiate our theoretical claims and demonstrate settings with intermittent connectivity between the clients and the PS, where our proposed algorithm shows an improved convergence rate and accuracy in comparison with the federated averaging algorithm.
ITJun 19, 2022
All you need is feedback: Communication with block attention feedback codesEmre Ozfatura, Yulin Shao, Alberto Perotti et al.
Deep learning based channel code designs have recently gained interest as an alternative to conventional coding algorithms, particularly for channels for which existing codes do not provide effective solutions. Communication over a feedback channel is one such problem, for which promising results have recently been obtained by employing various deep learning architectures. In this paper, we introduce a novel learning-aided code design for feedback channels, called generalized block attention feedback (GBAF) codes, which i) employs a modular architecture that can be implemented using different neural network architectures; ii) provides order-of-magnitude improvements in the probability of error compared to existing designs; and iii) can transmit at desired code rates.
ITMay 30, 2022
AttentionCode: Ultra-Reliable Feedback Codes for Short-Packet CommunicationsYulin Shao, Emre Ozfatura, Alberto Perotti et al.
Ultra-reliable short-packet communication is a major challenge in future wireless networks with critical applications. To achieve ultra-reliable communications beyond 99.999%, this paper envisions a new interaction-based communication paradigm that exploits feedback from the receiver. We present AttentionCode, a new class of feedback codes leveraging deep learning (DL) technologies. The underpinnings of AttentionCode are three architectural innovations: AttentionNet, input restructuring, and adaptation to fading channels, accompanied by several training methods, including large-batch training, distributed learning, look-ahead optimizer, training-test signal-to-noise ratio (SNR) mismatch, and curriculum learning. The training methods can potentially be generalized to other wireless communication applications with machine learning. Numerical experiments verify that AttentionCode establishes a new state of the art among all DL-based feedback codes in both additive white Gaussian noise (AWGN) channels and fading channels. In AWGN channels with noiseless feedback, for example, AttentionCode achieves a block error rate (BLER) of $10^{-7}$ when the forward channel SNR is 0 dB for a block size of 50 bits, demonstrating the potential of AttentionCode to provide ultra-reliable short-packet communications.
ITNov 3, 2022
Feedback is Good, Active Feedback is Better: Block Attention Active Feedback CodesEmre Ozfatura, Yulin Shao, Amin Ghazanfari et al.
Deep neural network (DNN)-assisted channel coding designs, such as low-complexity neural decoders for existing codes, or end-to-end neural-network-based auto-encoder designs are gaining interest recently due to their improved performance and flexibility; particularly for communication scenarios in which high-performing structured code designs do not exist. Communication in the presence of feedback is one such communication scenario, and practical code design for feedback channels has remained an open challenge in coding theory for many decades. Recently, DNN-based designs have shown impressive results in exploiting feedback. In particular, generalized block attention feedback (GBAF) codes, which utilizes the popular transformer architecture, achieved significant improvement in terms of the block error rate (BLER) performance. However, previous works have focused mainly on passive feedback, where the transmitter observes a noisy version of the signal at the receiver. In this work, we show that GBAF codes can also be used for channels with active feedback. We implement a pair of transformer architectures, at the transmitter and the receiver, which interact with each other sequentially, and achieve a new state-of-the-art BLER performance, especially in the low SNR regime.
NIMar 20, 2022
Federated Spatial Reuse Optimization in Next-Generation Decentralized IEEE 802.11 WLANsFrancesc Wilhelmi, Jernej Hribar, Selim F. Yilmaz et al.
As wireless standards evolve, more complex functionalities are introduced to address the increasing requirements in terms of throughput, latency, security, and efficiency. To unleash the potential of such new features, artificial intelligence (AI) and machine learning (ML) are currently being exploited for deriving models and protocols from data, rather than by hand-programming. In this paper, we explore the feasibility of applying ML in next-generation wireless local area networks (WLANs). More specifically, we focus on the IEEE 802.11ax spatial reuse (SR) problem and predict its performance through federated learning (FL) models. The set of FL solutions overviewed in this work is part of the 2021 International Telecommunication Union (ITU) AI for 5G Challenge.
LGAug 21, 2022
Byzantines can also Learn from History: Fall of Centered Clipping in Federated LearningKerem Ozfatura, Emre Ozfatura, Alptekin Kupcu et al.
The increasing popularity of the federated learning (FL) framework due to its success in a wide range of collaborative learning tasks also induces certain security concerns. Among many vulnerabilities, the risk of Byzantine attacks is of particular concern, which refers to the possibility of malicious clients participating in the learning process. Hence, a crucial objective in FL is to neutralize the potential impact of Byzantine attacks and to ensure that the final model is trustable. It has been observed that the higher the variance among the clients' models/updates, the more space there is for Byzantine attacks to be hidden. As a consequence, by utilizing momentum, and thus, reducing the variance, it is possible to weaken the strength of known Byzantine attacks. The centered clipping (CC) framework has further shown that the momentum term from the previous iteration, besides reducing the variance, can be used as a reference point to neutralize Byzantine attacks better. In this work, we first expose vulnerabilities of the CC framework, and introduce a novel attack strategy that can circumvent the defences of CC and other robust aggregators and reduce their test accuracy up to %33 on best-case scenarios in image classification tasks. Then, we propose a new robust and fast defence mechanism that is effective against the proposed and other existing Byzantine attacks.
LGApr 9, 2024
Aggressive or Imperceptible, or Both: Network Pruning Assisted Hybrid Byzantines in Federated LearningEmre Ozfatura, Kerem Ozfatura, Alptekin Kupcu et al.
Federated learning (FL) has been introduced to enable a large number of clients, possibly mobile devices, to collaborate on generating a generalized machine learning model thanks to utilizing a larger number of local samples without sharing to offer certain privacy to collaborating clients. However, due to the participation of a large number of clients, it is often difficult to profile and verify each client, which leads to a security threat that malicious participants may hamper the accuracy of the trained model by conveying poisoned models during the training. Hence, the aggregation framework at the parameter server also needs to minimize the detrimental effects of these malicious clients. A plethora of attack and defence strategies have been analyzed in the literature. However, often the Byzantine problem is analyzed solely from the outlier detection perspective, being oblivious to the topology of neural networks (NNs). In the scope of this work, we argue that by extracting certain side information specific to the NN topology, one can design stronger attacks. Hence, inspired by the sparse neural networks, we introduce a hybrid sparse Byzantine attack that is composed of two parts: one exhibiting a sparse nature and attacking only certain NN locations with higher sensitivity, and the other being more silent but accumulating over time, where each ideally targets a different type of defence mechanism, and together they form a strong but imperceptible attack. Finally, we show through extensive simulations that the proposed hybrid Byzantine attack is effective against 8 different defence methods.
DCFeb 24, 2022
Robust Federated Learning with Connectivity Failures: A Semi-Decentralized Framework with Collaborative RelayingMichal Yemini, Rajarshi Saha, Emre Ozfatura et al.
Intermittent connectivity of clients to the parameter server (PS) is a major bottleneck in federated edge learning frameworks. The lack of constant connectivity induces a large generalization gap, especially when the local data distribution amongst clients exhibits heterogeneity. To overcome intermittent communication outages between clients and the central PS, we introduce the concept of collaborative relaying wherein the participating clients relay their neighbors' local updates to the PS in order to boost the participation of clients with poor connectivity to the PS. We propose a semi-decentralized federated learning framework in which at every communication round, each client initially computes a local consensus of a subset of its neighboring clients' updates, and eventually transmits to the PS a weighted average of its own update and those of its neighbors'. We appropriately optimize these local consensus weights to ensure that the global update at the PS is unbiased with minimal variance - consequently improving the convergence rate. Numerical evaluations on the CIFAR-10 dataset demonstrate that our collaborative relaying approach outperforms federated averaging-based benchmarks for learning over intermittently-connected networks such as when the clients communicate over millimeter wave channels with intermittent blockages.
LGDec 7, 2021
Collaborative Learning over Wireless Networks: An Introductory OverviewEmre Ozfatura, Deniz Gunduz, H. Vincent Poor
In this chapter, we will mainly focus on collaborative training across wireless devices. Training a ML model is equivalent to solving an optimization problem, and many distributed optimization algorithms have been developed over the last decades. These distributed ML algorithms provide data locality; that is, a joint model can be trained collaboratively while the data available at each participating device remains local. This addresses, to some extend, the privacy concern. They also provide computational scalability as they allow exploiting computational resources distributed across many edge devices. However, in practice, this does not directly lead to a linear gain in the overall learning speed with the number of devices. This is partly due to the communication bottleneck limiting the overall computation speed. Additionally, wireless devices are highly heterogeneous in their computational capabilities, and both their computation speed and communication rate can be highly time-varying due to physical factors. Therefore, distributed learning algorithms, particularly those to be implemented at the wireless network edge, must be carefully designed taking into account the impact of time-varying communication network as well as the heterogeneous and stochastic computation capabilities of devices.
LGJun 18, 2021
Less is More: Feature Selection for Adversarial Robustness with Compressive Counter-Adversarial AttacksEmre Ozfatura, Muhammad Zaid Hameed, Kerem Ozfatura et al.
A common observation regarding adversarial attacks is that they mostly give rise to false activation at the penultimate layer to fool the classifier. Assuming that these activation values correspond to certain features of the input, the objective becomes choosing the features that are most useful for classification. Hence, we propose a novel approach to identify the important features by employing counter-adversarial attacks, which highlights the consistency at the penultimate layer with respect to perturbations on input samples. First, we empirically show that there exist a subset of features, classification based in which bridge the gap between the clean and robust accuracy. Second, we propose a simple yet efficient mechanism to identify those features by searching the neighborhood of input sample. We then select features by observing the consistency of the activation values at the penultimate layer.
ITMar 1, 2021
Gradient Coding with Dynamic Clustering for Straggler-Tolerant Distributed LearningBaturalp Buyukates, Emre Ozfatura, Sennur Ulukus et al.
Distributed implementations are crucial in speeding up large scale machine learning applications. Distributed gradient descent (GD) is widely employed to parallelize the learning task by distributing the dataset across multiple workers. A significant performance bottleneck for the per-iteration completion time in distributed synchronous GD is $straggling$ workers. Coded distributed computation techniques have been introduced recently to mitigate stragglers and to speed up GD iterations by assigning redundant computations to workers. In this paper, we consider gradient coding (GC), and propose a novel dynamic GC scheme, which assigns redundant data to workers to acquire the flexibility to dynamically choose from among a set of possible codes depending on the past straggling behavior. In particular, we consider GC with clustering, and regulate the number of stragglers in each cluster by dynamically forming the clusters at each iteration; hence, the proposed scheme is called $GC$ $with$ $dynamic$ $clustering$ (GC-DC). Under a time-correlated straggling behavior, GC-DC gains from adapting to the straggling behavior over time such that, at each iteration, GC-DC aims at distributing the stragglers across clusters as uniformly as possible based on the past straggler behavior. For both homogeneous and heterogeneous worker models, we numerically show that GC-DC provides significant improvements in the average per-iteration completion time without an increase in the communication load compared to the original GC scheme.
LGJan 21, 2021
Time-Correlated Sparsification for Communication-Efficient Federated LearningEmre Ozfatura, Kerem Ozfatura, Deniz Gunduz
Federated learning (FL) enables multiple clients to collaboratively train a shared model without disclosing their local datasets. This is achieved by exchanging local model updates with the help of a parameter server (PS). However, due to the increasing size of the trained models, the communication load due to the iterative exchanges between the clients and the PS often becomes a bottleneck in the performance. Sparse communication is often employed to reduce the communication load, where only a small subset of the model updates are communicated from the clients to the PS. In this paper, we introduce a novel time-correlated sparsification (TCS) scheme, which builds upon the notion that sparse communication framework can be considered as identifying the most significant elements of the underlying model. Hence, TCS seeks a certain correlation between the sparse representations used at consecutive iterations in FL, so that the overhead due to encoding and transmission of the sparse representation can be significantly reduced without compromising the test accuracy. Through extensive simulations on the CIFAR-10 dataset, we show that TCS can achieve centralized training accuracy with 100 times sparsification, and up to 2000 times reduction in the communication load when employed together with quantization.
LGDec 16, 2020
FedADC: Accelerated Federated Learning with Drift ControlKerem Ozfatura, Emre Ozfatura, Deniz Gunduz
Federated learning (FL) has become de facto framework for collaborative learning among edge devices with privacy concern. The core of the FL strategy is the use of stochastic gradient descent (SGD) in a distributed manner. Large scale implementation of FL brings new challenges, such as the incorporation of acceleration techniques designed for SGD into the distributed setting, and mitigation of the drift problem due to non-homogeneous distribution of local datasets. These two problems have been separately studied in the literature; whereas, in this paper, we show that it is possible to address both problems using a single strategy without any major alteration to the FL framework, or introducing additional computation and communication load. To achieve this goal, we propose FedADC, which is an accelerated FL algorithm with drift control. We empirically illustrate the advantages of FedADC.
LGNov 12, 2020
Distributed Sparse SGD with Majority VotingKerem Ozfatura, Emre Ozfatura, Deniz Gunduz
Distributed learning, particularly variants of distributed stochastic gradient descent (DSGD), are widely employed to speed up training by leveraging computational resources of several workers. However, in practise, communication delay becomes a bottleneck due to the significant amount of information that needs to be exchanged between the workers and the parameter server. One of the most efficient strategies to mitigate the communication bottleneck is top-K sparsification. However, top-K sparsification requires additional communication load to represent the sparsity pattern, and the mismatch between the sparsity patterns of the workers prevents exploitation of efficient communication protocols. To address these issues, we introduce a novel majority voting based sparse communication strategy, in which the workers first seek a consensus on the structure of the sparse representation. This strategy provides a significant reduction in the communication load and allows using the same sparsity level in both communication directions. Through extensive simulations on the CIFAR-10 dataset, we show that it is possible to achieve up to x4000 compression without any loss in the test accuracy.
ITNov 3, 2020
Gradient Coding with Dynamic Clustering for Straggler MitigationBaturalp Buyukates, Emre Ozfatura, Sennur Ulukus et al.
In distributed synchronous gradient descent (GD) the main performance bottleneck for the per-iteration completion time is the slowest \textit{straggling} workers. To speed up GD iterations in the presence of stragglers, coded distributed computation techniques are implemented by assigning redundant computations to workers. In this paper, we propose a novel gradient coding (GC) scheme that utilizes dynamic clustering, denoted by GC-DC, to speed up the gradient calculation. Under time-correlated straggling behavior, GC-DC aims at regulating the number of straggling workers in each cluster based on the straggler behavior in the previous iteration. We numerically show that GC-DC provides significant improvements in the average completion time (of each iteration) with no increase in the communication load compared to the original GC scheme.
SPSep 28, 2020
Communicate to Learn at the EdgeDeniz Gunduz, David Burth Kurka, Mikolaj Jankowski et al.
Bringing the success of modern machine learning (ML) techniques to mobile devices can enable many new services and businesses, but also poses significant technical and research challenges. Two factors that are critical for the success of ML algorithms are massive amounts of data and processing power, both of which are plentiful, yet highly distributed at the network edge. Moreover, edge devices are connected through bandwidth- and power-limited wireless links that suffer from noise, time-variations, and interference. Information and coding theory have laid the foundations of reliable and efficient communications in the presence of channel imperfections, whose application in modern wireless networks have been a tremendous success. However, there is a clear disconnect between the current coding and communication schemes, and the ML algorithms deployed at the network edge. In this paper, we challenge the current approach that treats these problems separately, and argue for a joint communication and learning paradigm for both the training and inference stages of edge learning.
ITJul 4, 2020
Coded Distributed Computing with Partial RecoveryEmre Ozfatura, Sennur Ulukus, Deniz Gunduz
Coded computation techniques provide robustness against straggling workers in distributed computing. However, most of the existing schemes require exact provisioning of the straggling behaviour and ignore the computations carried out by straggling workers. Moreover, these schemes are typically designed to recover the desired computation results accurately, while in many machine learning and iterative optimization algorithms, faster approximate solutions are known to result in an improvement in the overall convergence time. In this paper, we first introduce a novel coded matrix-vector multiplication scheme, called coded computation with partial recovery (CCPR), which benefits from the advantages of both coded and uncoded computation schemes, and reduces both the computation time and the decoding complexity by allowing a trade-off between the accuracy and the speed of computation. We then extend this approach to distributed implementation of more general computation tasks by proposing a coded communication scheme with partial recovery, where the results of subtasks computed by the workers are coded before being communicated. Numerical simulations on a large linear regression task confirm the benefits of the proposed distributed computation scheme with partial recovery in terms of the trade-off between the computation accuracy and latency.
ITJun 2, 2020
Age-Based Coded Computation for Bias Reduction in Distributed LearningEmre Ozfatura, Baturalp Buyukates, Deniz Gunduz et al.
Coded computation can be used to speed up distributed learning in the presence of straggling workers. Partial recovery of the gradient vector can further reduce the computation time at each iteration; however, this can result in biased estimators, which may slow down convergence, or even cause divergence. Estimator bias will be particularly prevalent when the straggling behavior is correlated over time, which results in the gradient estimators being dominated by a few fast servers. To mitigate biased estimators, we design a $timely$ dynamic encoding framework for partial recovery that includes an ordering operator that changes the codewords and computation orders at workers over time. To regulate the recovery frequencies, we adopt an $age$ metric in the design of the dynamic encoding scheme. We show through numerical results that the proposed dynamic encoding strategy increases the timeliness of the recovered computations, which as a result, reduces the bias in model updates, and accelerates the convergence compared to the conventional static partial recovery schemes.
ITApr 10, 2020
Straggler-aware Distributed Learning: Communication Computation Latency Trade-offEmre Ozfatura, Sennur Ulukus, Deniz Gunduz
When gradient descent (GD) is scaled to many parallel workers for large scale machine learning problems, its per-iteration computation time is limited by the straggling workers. Straggling workers can be tolerated by assigning redundant computations and coding across data and computations, but in most existing schemes, each non-straggling worker transmits one message per iteration to the parameter server (PS) after completing all its computations. Imposing such a limitation results in two main drawbacks; over-computation due to inaccurate prediction of the straggling behaviour, and under-utilization due to treating workers as straggler/non-straggler and discarding partial computations carried out by stragglers. In this paper, to overcome these drawbacks, we consider multi-message communication (MMC) by allowing multiple computations to be conveyed from each worker per iteration, and design straggler avoidance techniques accordingly. Then, we analyze how the proposed designs can be employed efficiently to seek a balance between the computation and communication latency to minimize the overall latency. Furthermore, through extensive simulations, both model-based and real implementation on Amazon EC2 servers, we identify the advantages and disadvantages of these designs in different settings, and demonstrate that MMC can help improve upon existing straggler avoidance schemes.
SPMar 6, 2020
Decentralized SGD with Over-the-Air ComputationEmre Ozfatura, Stefano Rini, Deniz Gunduz
We study the performance of decentralized stochastic gradient descent (DSGD) in a wireless network, where the nodes collaboratively optimize an objective function using their local datasets. Unlike the conventional setting, where the nodes communicate over error-free orthogonal communication links, we assume that transmissions are prone to additive noise and interference.We first consider a point-to-point (P2P) transmission strategy, termed the OAC-P2P scheme, in which the node pairs are scheduled in an orthogonal fashion to minimize interference. Since in the DSGD framework, each node requires a linear combination of the neighboring models at the consensus step, we then propose the OAC-MAC scheme, which utilizes the signal superposition property of the wireless medium to achieve over-the-air computation (OAC). For both schemes, we cast the scheduling problem as a graph coloring problem. We numerically evaluate the performance of these two schemes for the MNIST image classification task under various network conditions. We show that the OAC-MAC scheme attains better convergence performance with a fewer communication rounds.
LGSep 5, 2019
Hierarchical Federated Learning Across Heterogeneous Cellular NetworksMehdi Salehi Heydar Abad, Emre Ozfatura, Deniz Gunduz et al.
We study collaborative machine learning (ML) across wireless devices, each with its own local dataset. Offloading these datasets to a cloud or an edge server to implement powerful ML solutions is often not feasible due to latency, bandwidth and privacy constraints. Instead, we consider federated edge learning (FEEL), where the devices share local updates on the model parameters rather than their datasets. We consider a heterogeneous cellular network (HCN), where small cell base stations (SBSs) orchestrate FL among the mobile users (MUs) within their cells, and periodically exchange model updates with the macro base station (MBS) for global consensus. We employ gradient sparsification and periodic averaging to increase the communication efficiency of this hierarchical federated learning (FL) framework. We then show using CIFAR-10 dataset that the proposed hierarchical learning solution can significantly reduce the communication latency without sacrificing the model accuracy.
ITMar 5, 2019
Gradient Coding with Clustering and Multi-message CommunicationEmre Ozfatura, Deniz Gunduz, Sennur Ulukus
Gradient descent (GD) methods are commonly employed in machine learning problems to optimize the parameters of the model in an iterative fashion. For problems with massive datasets, computations are distributed to many parallel computing servers (i.e., workers) to speed up GD iterations. While distributed computing can increase the computation speed significantly, the per-iteration completion time is limited by the slowest straggling workers. Coded distributed computing can mitigate straggling workers by introducing redundant computations; however, existing coded computing schemes are mainly designed against persistent stragglers, and partial computations at straggling workers are discarded, leading to wasted computational capacity. In this paper, we propose a novel gradient coding (GC) scheme which allows multiple coded computations to be conveyed from each worker to the master per iteration. We numerically show that the proposed GC with multi-message communication (MMC) together with clustering provides significant improvements in the average completion time (of each iteration), with minimal or no increase in the communication load.
LGNov 22, 2018
Distributed Gradient Descent with Coded Partial Gradient ComputationsEmre Ozfatura, Sennur Ulukus, Deniz Gunduz
Coded computation techniques provide robustness against straggling servers in distributed computing, with the following limitations: First, they increase decoding complexity. Second, they ignore computations carried out by straggling servers; and they are typically designed to recover the full gradient, and thus, cannot provide a balance between the accuracy of the gradient and per-iteration completion time. Here we introduce a hybrid approach, called coded partial gradient computation (CPGC), that benefits from the advantages of both coded and uncoded computation schemes, and reduces both the computation time and decoding complexity.
ITAug 7, 2018
Speeding Up Distributed Gradient Descent by Utilizing Non-persistent StragglersEmre Ozfatura, Deniz Gunduz, Sennur Ulukus
Distributed gradient descent (DGD) is an efficient way of implementing gradient descent (GD), especially for large data sets, by dividing the computation tasks into smaller subtasks and assigning to different computing servers (CSs) to be executed in parallel. In standard parallel execution, per-iteration waiting time is limited by the execution time of the straggling servers. Coded DGD techniques have been introduced recently, which can tolerate straggling servers via assigning redundant computation tasks to the CSs. In most of the existing DGD schemes, either with coded computation or coded communication, the non-straggling CSs transmit one message per iteration once they complete all their assigned computation tasks. However, although the straggling servers cannot complete all their assigned tasks, they are often able to complete a certain portion of them. In this paper, we allow multiple transmissions from each CS at each iteration in order to make sure a maximum number of completed computations can be reported to the aggregating server (AS), including the straggling servers. We numerically show that the average completion time per iteration can be reduced significantly by slightly increasing the communication load per server.
ITJul 6, 2018
Delay-Aware Coded Caching for Mobile UsersEmre Ozfatura, Thomas Rarris, Deniz Gunduz et al.
In this work, we study the trade-off between the cache capacity and the user delay for a cooperative Small Base Station (SBS) coded caching system with mobile users. First, a delay-aware coded caching policy, which takes into account the popularity of the files and the maximum re-buffering delay to minimize the average rebuffering delay of a mobile user under a given cache capacity constraint is introduced. Subsequently, we address a scenario where some files are served by the macro-cell base station (MBS) when the cache capacity of the SBSs is not sufficient to store all the files in the library. For this scenario, we develop a coded caching policy that minimizes the average amount of data served by the MBS under an average re-buffering delay constraint.
NIMar 9, 2017
Optimal Network-Assisted Multi-user DASH Video StreamingEmre Ozfatura, Ozgur Ercetin, Hazer Inaltekin
Streaming video is becoming the predominant type of traffic over the Internet with reports forecasting the video content to account for 80% of all traffic by 2019. With significant investment on Internet backbone, the main bottleneck remains at the edge servers (e.g., WiFi access points, small cells, etc.). In this work, we propose and prove the optimality of a multiuser resource allocation mechanism operating at the edge server that minimizes the probability of stalling of video streams due to buffer under-flows. Our proposed policy utilizes Media Presentation Description (MPD) files of clients that are sent in compliant to Dynamic Adaptive Streaming over HTTP (DASH) protocol to be cognizant of the deadlines of each of the media file to be displayed by the clients. Then, the policy schedules the users in the order of their deadlines. After establishing the optimality of this policy to minimize the stalling probability for a network with links associated with fixed loss rates, the utility of the algorithm is verified under realistic network conditions with detailed NS-3 simulations.