Ariel Orda

DC
h-index6
4papers
63citations
Novelty59%
AI Score36

4 Papers

DCJul 2, 2025
Distributed Training under Packet Loss

Erez Weintraub, Ron Banner, Ariel Orda

State-of-the-art language and vision models are routinely trained across thousands of GPUs, often spanning multiple data-centers, yet today's distributed frameworks still assume reliable connections (e.g., InfiniBand or RoCE). The resulting acknowledgment traffic and retransmissions inflate tail latencies and limit scalability. Leveraging unreliable connections will reduce latency but may sacrifice model accuracy and convergence once packets are dropped. A principled, end-to-end solution that preserves accuracy and convergence guarantees under genuine packet loss has previously been missing. We address this critical gap by introducing a novel distributed training framework capable of operating over unreliable connections, offering unbiased gradient aggregation and bounded parameter drift without modifying model code or optimizers. The key insight is a two-stage defense against missing messages: (i) Unbiased gradient aggregation: each worker reconstructs a consistent gradient estimate from whatever packets arrive, guaranteeing expectation-level correctness; and (ii) Bounded-drift parameter broadcasts: we prove the inter-worker model discrepancy remains O(1) even after arbitrarily many iterations, preventing the unbounded divergence typical of asynchronous setups. Analytical bounds are matched by experiments on the LLAMA2 7B model with 64 GPUs: tolerating 10% random packet loss yields at most 0.8% perplexity change. This work bridges the gap between communication-efficient datacenter protocols and the accuracy and generalization guarantees demanded by modern large-model training, enabling robust, high-throughput learning on commodity or wide-area networks.

LGSep 26, 2019
RADE: Resource-Efficient Supervised Anomaly Detection Using Decision Tree-Based Ensemble Methods

Shay Vargaftik, Isaac Keslassy, Ariel Orda et al.

Decision-tree-based ensemble classification methods (DTEMs) are a prevalent tool for supervised anomaly detection. However, due to the continued growth of datasets, DTEMs result in increasing drawbacks such as growing memory footprints, longer training times, and slower classification latencies at lower throughput. In this paper, we present, design, and evaluate RADE - a DTEM-based anomaly detection framework that augments standard DTEM classifiers and alleviates these drawbacks by relying on two observations: (1) we find that a small (coarse-grained) DTEM model is sufficient to classify the majority of the classification queries correctly, such that a classification is valid only if its corresponding confidence level is greater than or equal to a predetermined classification confidence threshold; (2) we find that in these fewer harder cases where our coarse-grained DTEM model results in insufficient confidence in its classification, we can improve it by forwarding the classification query to one of expert DTEM (fine-grained) models, which is explicitly trained for that particular case. We implement RADE in Python based on scikit-learn and evaluate it over different DTEM methods: RF, XGBoost, AdaBoost, GBDT and LightGBM, and over three publicly available datasets. Our evaluation over both a strong AWS EC2 instance and a Raspberry Pi 3 device indicates that RADE offers competitive and often superior anomaly detection capabilities as compared to standard DTEM methods, while significantly improving memory footprint (by up to 5.46x), training-time (by up to 17.2x), and classification latency (by up to 31.2x).

SIFeb 6, 2014
Localized epidemic detection in networks with overwhelming noise

Eli A. Meirom, Chris Milling, Constantine Caramanis et al.

We consider the problem of detecting an epidemic in a population where individual diagnoses are extremely noisy. The motivation for this problem is the plethora of examples (influenza strains in humans, or computer viruses in smartphones, etc.) where reliable diagnoses are scarce, but noisy data plentiful. In flu/phone-viruses, exceedingly few infected people/phones are professionally diagnosed (only a small fraction go to a doctor) but less reliable secondary signatures (e.g., people staying home, or greater-than-typical upload activity) are more readily available. These secondary data are often plagued by unreliability: many people with the flu do not stay home, and many people that stay home do not have the flu. This paper identifies the precise regime where knowledge of the contact network enables finding the needle in the haystack: we provide a distributed, efficient and robust algorithm that can correctly identify the existence of a spreading epidemic from highly unreliable local data. Our algorithm requires only local-neighbor knowledge of this graph, and in a broad array of settings that we describe, succeeds even when false negatives and false positives make up an overwhelming fraction of the data available. Our results show it succeeds in the presence of partial information about the contact network, and also when there is not a single "patient zero", but rather many (hundreds, in our examples) of initial patient-zeroes, spread across the graph.

NIApr 9, 2013
Efficient Wireless Security Through Jamming, Coding and Routing

Majid Ghaderi, Dennis Goeckel, Ariel Orda et al.

There is a rich recent literature on how to assist secure communication between a single transmitter and receiver at the physical layer of wireless networks through techniques such as cooperative jamming. In this paper, we consider how these single-hop physical layer security techniques can be extended to multi-hop wireless networks and show how to augment physical layer security techniques with higher layer network mechanisms such as coding and routing. Specifically, we consider the secure minimum energy routing problem, in which the objective is to compute a minimum energy path between two network nodes subject to constraints on the end-to-end communication secrecy and goodput over the path. This problem is formulated as a constrained optimization of transmission power and link selection, which is proved to be NP-hard. Nevertheless, we show that efficient algorithms exist to compute both exact and approximate solutions for the problem. In particular, we develop an exact solution of pseudo-polynomial complexity, as well as an epsilon-optimal approximation of polynomial complexity. Simulation results are also provided to show the utility of our algorithms and quantify their energy savings compared to a combination of (standard) security-agnostic minimum energy routing and physical layer security. In the simulated scenarios, we observe that, by jointly optimizing link selection at the network layer and cooperative jamming at the physical layer, our algorithms reduce the network energy consumption by half.