Erik Elmroth

LG
h-index13
8papers
23citations
Novelty51%
AI Score51

8 Papers

5.3DCMay 24
LA-IMR: Latency-Aware, Predictive In-Memory Routing and Proactive Autoscaling for Tail-Latency-Sensitive Cloud Robotics

Eunil Seo, Chanh Nguyen, Erik Elmroth

Hybrid cloud-edge infrastructures now support latency-critical workloads ranging from autonomous vehicles and surgical robotics to immersive AR/VR. However, they continue to experience crippling long-tail latency spikes whenever bursty request streams exceed the capacity of heterogeneous edge and cloud tiers. To address these long-tail latency issues, we present Latency-Aware, Predictive In-Memory Routing and Proactive Autoscaling (LA-IMR). This control layer integrates a closed-form, utilization-driven latency model with event-driven scheduling, replica autoscaling, and edge-to-cloud offloading to mitigate 99th-percentile (P99) delays. Our analytic model decomposes end-to-end latency into processing, network, and queuing components, expressing inference latency as an affine power-law function of instance utilization. Once calibrated, it produces two complementary functions that drive: (i) millisecond-scale routing decisions for traffic offloading, and (ii) capacity planning that jointly determines replica pool sizes. LA-IMR enacts these decisions through a quality-differentiated, multi-queue scheduler and a custom-metric Kubernetes autoscaler that scales replicas proactively -- before queues build up -- rather than reactively based on lagging CPU metrics. Across representative vision workloads (YOLOv5m and EfficientDet) and bursty arrival traces, LA-IMR reduces P99 latency by up to 20.7 percent compared to traditional latency-only autoscaling, laying a principled foundation for next-generation, tail-tolerant cloud-edge inference services.

55.9ROMar 19
Speculative Policy Orchestration: A Latency-Resilient Framework for Cloud-Robotic Manipulation

Chanh Nguyen, Shutong Jin, Florian T. Pokorny et al.

Cloud robotics enables robots to offload high-dimensional motion planning and reasoning to remote servers. However, for continuous manipulation tasks requiring high-frequency control, network latency and jitter can severely destabilize the system, causing command starvation and unsafe physical execution. To address this, we propose Speculative Policy Orchestration (SPO), a latency-resilient cloud-edge framework. SPO utilizes a cloud-hosted world model to pre-compute and stream future kinematic waypoints to a local edge buffer, decoupling execution frequency from network round-trip time. To mitigate unsafe execution caused by predictive drift, the edge node employs an $ε$-tube verifier that strictly bounds kinematic execution errors. The framework is coupled with an Adaptive Horizon Scaling mechanism that dynamically expands or shrinks the speculative pre-fetch depth based on real-time tracking error. We evaluate SPO on continuous RLBench manipulation tasks under emulated network delays. Results show that even when deployed with learned models of modest accuracy, SPO reduces network-induced idle time by over 60% compared to blocking remote inference. Furthermore, SPO discards approximately 60% fewer cloud predictions than static caching baselines. Ultimately, SPO enables fluid, real-time cloud-robotic control while maintaining bounded physical safety.

67.5DCMar 30
Trust-Aware Routing for Distributed Generative AI Inference at the Edge

Chanh Nguyen, Erik Elmroth

Emerging deployments of Generative AI increasingly execute inference across decentralized and heterogeneous edge devices rather than on a single trusted server. In such environments, a single device failure or misbehavior can disrupt the entire inference process, making traditional best-effort peer-to-peer routing insufficient. Coordinating distributed generative inference therefore requires mechanisms that explicitly account for reliability, performance variability, and trust among participating peers. In this paper, we present G-TRAC, a trust-aware coordination framework that integrates algorithmic path selection with system-level protocol design to ensure robust distributed inference. First, we formulate the routing problem as a \textit{Risk-Bounded Shortest Path} computation and introduce a polynomial-time solution that combines trust-floor pruning with Dijkstra's search, achieving sub-millisecond median routing latency at practical edge scales, and remaining below 10 ms at larger scales. Second, to operationally support the routing logic in dynamic environments, the framework employs a \textit{Hybrid Trust Architecture} that maintains global reputation state at stable anchors while disseminating lightweight updates to edge peers via background synchronization. Experimental evaluation on a heterogeneous testbed of commodity devices demonstrates that G-TRAC significantly improves inference completion rates, effectively isolates unreliable peers, and sustains robust execution even under node failures and network partitions.

CRSep 11, 2024
A Cost-Aware Approach to Adversarial Robustness in Neural Networks

Charles Meyers, Mohammad Reza Saleh Sedghpour, Tommy Löfstedt et al.

Considering the growing prominence of production-level AI and the threat of adversarial attacks that can evade a model at run-time, evaluating the robustness of models to these evasion attacks is of critical importance. Additionally, testing model changes likely means deploying the models to (e.g. a car or a medical imaging device), or a drone to see how it affects performance, making un-tested changes a public problem that reduces development speed, increases cost of development, and makes it difficult (if not impossible) to parse cause from effect. In this work, we used survival analysis as a cloud-native, time-efficient and precise method for predicting model performance in the presence of adversarial noise. For neural networks in particular, the relationships between the learning rate, batch size, training time, convergence time, and deployment cost are highly complex, so researchers generally rely on benchmark datasets to assess the ability of a model to generalize beyond the training data. To address this, we propose using accelerated failure time models to measure the effect of hardware choice, batch size, number of epochs, and test-set accuracy by using adversarial attacks to induce failures on a reference model architecture before deploying the model to the real world. We evaluate several GPU types and use the Tree Parzen Estimator to maximize model robustness and minimize model run-time simultaneously. This provides a way to evaluate the model and optimise it in a single step, while simultaneously allowing us to model the effect of model parameters on training time, prediction time, and accuracy. Using this technique, we demonstrate that newer, more-powerful hardware does decrease the training time, but with a monetary and power cost that far outpaces the marginal gains in accuracy.

LGJul 14, 2025
MTF-Grasp: A Multi-tier Federated Learning Approach for Robotic Grasping

Obaidullah Zaland, Erik Elmroth, Monowar Bhuyan

Federated Learning (FL) is a promising machine learning paradigm that enables participating devices to train privacy-preserved and collaborative models. FL has proven its benefits for robotic manipulation tasks. However, grasping tasks lack exploration in such settings where robots train a global model without moving data and ensuring data privacy. The main challenge is that each robot learns from data that is nonindependent and identically distributed (non-IID) and of low quantity. This exhibits performance degradation, particularly in robotic grasping. Thus, in this work, we propose MTF-Grasp, a multi-tier FL approach for robotic grasping, acknowledging the unique challenges posed by the non-IID data distribution across robots, including quantitative skewness. MTF-Grasp harnesses data quality and quantity across robots to select a set of "top-level" robots with better data distribution and higher sample count. It then utilizes top-level robots to train initial seed models and distribute them to the remaining "low-level" robots, reducing the risk of model performance degradation in low-level robots. Our approach outperforms the conventional FL setup by up to 8% on the quantity-skewed Cornell and Jacquard grasping datasets.

LGMar 6
Tiny, Hardware-Independent, Compression-based Classification

Charles Meyers, Aaron MacSween, Erik Elmroth et al.

The recent developments in machine learning have highlighted a conflict between online platforms and their users in terms of privacy. The importance of user privacy and the struggle for power over user data has been intensified as regulators and operators attempt to police online platforms. As users have become increasingly aware of privacy issues, client-side data storage, management, and analysis have become a favoured approach to large-scale centralised machine learning. However, state-of-the-art machine learning methods require vast amounts of labelled user data, making them unsuitable for models that reside client-side and only have access to a single user's data. State-of-the-art methods are also computationally expensive, which degrades the user experience on compute-limited hardware and also reduces battery life. A recent alternative approach has proven remarkably successful in classification tasks across a wide variety of data -- using a compression-based distance measure (called normalised compression distance) to measure the distance between generic objects in classical distance-based machine learning methods. In this work, we demonstrate that the normalised compression distance is actually not a metric; develop it for the wider context of kernel methods to allow modelling of complex data; and present techniques to improve the training time of models that use this distance measure. We demonstrate that the normalised compression distance works as well as and sometimes better than other metrics and kernels -- while requiring only marginally more computational costs and in spite of the lack of formal metric properties. The end results is a simple model with remarkable accuracy even when trained on a very small number of samples allowing for models that are small and effective enough to run entirely on a client device using only user-supplied data.

LGJan 24, 2024
A Training Rate and Survival Heuristic for Inference and Robustness Evaluation (TRASHFIRE)

Charles Meyers, Mohammad Reza Saleh Sedghpour, Tommy Löfstedt et al.

Machine learning models -- deep neural networks in particular -- have performed remarkably well on benchmark datasets across a wide variety of domains. However, the ease of finding adversarial counter-examples remains a persistent problem when training times are measured in hours or days and the time needed to find a successful adversarial counter-example is measured in seconds. Much work has gone into generating and defending against these adversarial counter-examples, however the relative costs of attacks and defences are rarely discussed. Additionally, machine learning research is almost entirely guided by test/train metrics, but these would require billions of samples to meet industry standards. The present work addresses the problem of understanding and predicting how particular model hyper-parameters influence the performance of a model in the presence of an adversary. The proposed approach uses survival models, worst-case examples, and a cost-aware analysis to precisely and accurately reject a particular model change during routine model training procedures rather than relying on real-world deployment, expensive formal verification methods, or accurate simulations of very complicated systems (\textit{e.g.}, digitally recreating every part of a car or a plane). Through an evaluation of many pre-processing techniques, adversarial counter-examples, and neural network configurations, the conclusion is that deeper models do offer marginal gains in survival times compared to more shallow counterparts. However, we show that those gains are driven more by the model inference time than inherent robustness properties. Using the proposed methodology, we show that ResNet is hopelessly insecure against even the simplest of white box attacks.

LGApr 8, 2020
Resource Management for Blockchain-enabled Federated Learning: A Deep Reinforcement Learning Approach

Nguyen Quang Hieu, Tran The Anh, Nguyen Cong Luong et al.

Blockchain-enabled Federated Learning (BFL) enables mobile devices to collaboratively train neural network models required by a Machine Learning Model Owner (MLMO) while keeping data on the mobile devices. Then, the model updates are stored in the blockchain in a decentralized and reliable manner. However, the issue of BFL is that the mobile devices have energy and CPU constraints that may reduce the system lifetime and training efficiency. The other issue is that the training latency may increase due to the blockchain mining process. To address these issues, the MLMO needs to (i) decide how much data and energy that the mobile devices use for the training and (ii) determine the block generation rate to minimize the system latency, energy consumption, and incentive cost while achieving the target accuracy for the model. Under the uncertainty of the BFL environment, it is challenging for the MLMO to determine the optimal decisions. We propose to use the Deep Reinforcement Learning (DRL) to derive the optimal decisions for the MLMO.