LG · Oct 20, 2022
HesScale: Scalable Computation of Hessian Diagonals
Mohamed Elsayed, A. Rupam Mahmood
Second-order optimization uses curvature information about the objective function, which can help in faster convergence. However, such methods typically require expensive computation of the Hessian matrix, preventing their usage in a scalable way. The absence of efficient ways of computation drove the most widely used methods to focus on first-order approximations that do not capture the curvature information. In this paper, we develop HesScale, a scalable approach to approximating the diagonal of the Hessian matrix, to incorporate second-order information in a computationally efficient manner. We show that HesScale has the same computational complexity as backpropagation. Our results on supervised classification show that HesScale achieves high approximation accuracy, allowing for scalable and efficient second-order optimization.
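Here is a rough illustration of why a Hessian diagonal can cost about as much as a gradient. Take a single linear layer z = Wx followed by softmax cross-entropy. The Hessian diagonal with respect to z is p_i(1 - p_i), so each weight's diagonal entry is p_i(1 - p_i)x_j², and both come from the same forward pass. The sketch below covers only this last-layer case in pure Python and checks it against finite differences. The function names are hypothetical, and this is not HesScale's full backward recursion.

```python
import math

def softmax(z):
    # Subtract the max for numerical stability.
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    s = sum(exps)
    return [e / s for e in exps]

def loss(W, x, y):
    # Cross-entropy of softmax(W x) against the integer label y.
    z = [sum(w_ij * x_j for w_ij, x_j in zip(row, x)) for row in W]
    return -math.log(softmax(z)[y])

def grad_and_hessian_diag(W, x, y):
    """Gradient and exact Hessian diagonal of softmax cross-entropy w.r.t. W.

    dL/dz_i = p_i - [i == y] and d2L/dz_i^2 = p_i * (1 - p_i). Since z_i is
    linear in W_ij with coefficient x_j, each diagonal entry only needs x_j^2.
    """
    z = [sum(w_ij * x_j for w_ij, x_j in zip(row, x)) for row in W]
    p = softmax(z)
    grad = [[(p[i] - (1.0 if i == y else 0.0)) * xj for xj in x] for i in range(len(W))]
    hdiag = [[p[i] * (1.0 - p[i]) * xj * xj for xj in x] for i in range(len(W))]
    return grad, hdiag

def finite_diff_second(W, x, y, i, j, eps=1e-4):
    # Central second difference along the single weight W[i][j], for checking.
    def shifted(delta):
        W2 = [row[:] for row in W]
        W2[i][j] += delta
        return loss(W2, x, y)
    return (shifted(eps) - 2.0 * loss(W, x, y) + shifted(-eps)) / eps ** 2
```

For this layer, the diagonal is exact rather than approximate. Deeper layers are where diagonal-only propagation has to drop cross terms.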
LG · Feb 7, 2023
Utility-based Perturbed Gradient Descent: An Optimizer for Continual Learning
Mohamed Elsayed, A. Rupam Mahmood
Modern representation learning methods often struggle to adapt quickly under non-stationarity because they suffer from catastrophic forgetting and decaying plasticity. Such problems prevent learners from fast adaptation since they may forget useful features or have difficulty learning new ones. Hence, these methods are rendered ineffective for continual learning. This paper proposes Utility-based Perturbed Gradient Descent (UPGD), an online learning algorithm well-suited for continual learning agents. UPGD protects useful weights or features from forgetting and perturbs less useful ones based on their utilities. Our empirical results show that UPGD helps reduce forgetting and maintain plasticity, enabling modern representation learning methods to work effectively in continual learning.
LG · May 6
Extending Differential Temporal Difference Methods for Episodic Problems
Kris De Asis, Mohamed Elsayed, Jiamin He
Differential temporal difference (TD) methods are value-based reinforcement learning algorithms that have been proposed for infinite-horizon problems. They rely on reward centering, where each reward is centered by the average reward. This keeps the return bounded and removes a value function's state-independent offset. However, reward centering can alter the optimal policy in episodic problems, limiting its applicability. Motivated by recent works that emphasize the role of normalization in streaming deep reinforcement learning, we study reward centering in episodic problems and propose a generalization of differential TD. We prove that this generalization maintains the ordering of policies in the presence of termination, and thus extends differential TD to episodic problems. We show equivalence with a form of linear TD, thereby inheriting theoretical guarantees that have been shown for those algorithms. We then extend several streaming reinforcement learning algorithms to their differential counterparts. Across a range of base algorithms and environments, we empirically validate that reward centering can improve sample efficiency in episodic problems.
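Reward centering can be sketched in a tabular TD(0) learner. Each reward is centered by a running estimate r_bar of the average reward before the TD error is formed. This is a minimal sketch that assumes a simple exponential average for r_bar with step size `eta`. It does not reproduce the paper's episodic generalization, which accounts for termination.

```python
def centered_td0(transitions, n_states, alpha=0.1, eta=0.01, gamma=0.99):
    """Tabular TD(0) with reward centering on a continuing task.

    transitions: iterable of (s, r, s_next) tuples.
    Returns the value table and the average-reward estimate r_bar.
    """
    v = [0.0] * n_states
    r_bar = 0.0
    for s, r, s_next in transitions:
        # Center the reward by the running average-reward estimate.
        delta = (r - r_bar) + gamma * v[s_next] - v[s]
        v[s] += alpha * delta
        # Exponential moving average of rewards (an assumed rule; differential
        # TD methods often update r_bar with the TD error instead).
        r_bar += eta * (r - r_bar)
    return v, r_bar

# Two states visited alternately, with a constant reward of 1.
transitions = [(t % 2, 1.0, (t + 1) % 2) for t in range(20000)]
v, r_bar = centered_td0(transitions, 2)
```

Here r_bar converges to 1 and the values shrink toward 0, so the state-independent offset disappears. Without centering (r_bar fixed at 0), both values would instead approach 1/(1 - γ) = 100.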
LG · Jul 1, 2024
Weight Clipping for Deep Continual and Reinforcement Learning
Mohamed Elsayed, Qingfeng Lan, Clare Lyle et al.
Many failures in deep continual and reinforcement learning are associated with increasing magnitudes of the weights, making them hard to change and potentially causing overfitting. While many methods address these learning failures, they often change the optimizer or the architecture, a complexity that hinders widespread adoption in various systems. In this paper, we focus on learning failures that are associated with increasing weight norm and we propose a simple technique that can be easily added on top of existing learning systems: clipping neural network weights to limit them to a specific range. We study the effectiveness of weight clipping in a series of supervised and reinforcement learning experiments. Our empirical results highlight the benefits of weight clipping for generalization, addressing loss of plasticity and policy collapse, and facilitating learning with a large replay ratio.
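Weight clipping runs as a separate step after each optimizer update, so the optimizer itself does not change. This is a minimal sketch under two assumptions: each layer's clipping range is a multiple `kappa` of that layer's initialization bound, and layers use uniform initialization. The scaling rule, the initialization scheme, and the function names are illustrative assumptions.

```python
import random

def init_layer(fan_in, fan_out, rng):
    # Uniform initialization in [-b, b] with b = 1/sqrt(fan_in) (an assumed scheme).
    bound = 1.0 / fan_in ** 0.5
    weights = [[rng.uniform(-bound, bound) for _ in range(fan_in)] for _ in range(fan_out)]
    return weights, bound

def clip_weights(weights, bound, kappa=2.0):
    # Clamp every weight into [-kappa * bound, kappa * bound], in place.
    limit = kappa * bound
    for row in weights:
        for j, w in enumerate(row):
            row[j] = max(-limit, min(limit, w))

def sgd_step_with_clipping(weights, grads, lr, bound, kappa=2.0):
    # Any optimizer update could go here; clipping is applied afterwards.
    for row, grow in zip(weights, grads):
        for j, g in enumerate(grow):
            row[j] -= lr * g
    clip_weights(weights, bound, kappa)
```

Because clipping only touches the weights after the update, the same `clip_weights` call could follow an Adam or RMSProp step unchanged.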
IV · Nov 14, 2025
A Deep Learning Framework for Thyroid Nodule Segmentation and Malignancy Classification from Ultrasound Images
Omar Abdelrazik, Mohamed Elsayed, Noorul Wahab et al.
Ultrasound-based risk stratification of thyroid nodules is a critical clinical task, but it suffers from high inter-observer variability. While many deep learning (DL) models function as "black boxes," we propose a fully automated, two-stage framework for interpretable malignancy prediction. Our method achieves interpretability by forcing the model to focus only on clinically relevant regions. First, a TransUNet model automatically segments the thyroid nodule. The resulting mask is then used to create a region of interest around the nodule, and this localised image is fed directly into a ResNet-18 classifier. We evaluated our framework using 5-fold cross-validation on a clinical dataset of 349 images, where it achieved a high F1-score of 0.852 for predicting malignancy. To validate its performance, we compared it against a strong baseline using a Random Forest classifier with hand-crafted morphological features, which achieved an F1-score of 0.829. The superior performance of our DL framework suggests that the implicit visual features learned from the localised nodule are more predictive than explicit shape features alone. This is the first fully automated end-to-end pipeline for both detecting thyroid nodules on ultrasound images and predicting their malignancy.
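The hand-off between the two stages turns a segmentation mask into a region of interest. It can be sketched as a bounding box around the mask plus a safety margin. This is a minimal sketch on a binary mask stored as nested lists. The margin value and the function names are assumptions, not the paper's settings.

```python
def mask_bbox(mask):
    # Return (top, bottom, left, right) of nonzero pixels, inclusive, or None.
    rows = [r for r, row in enumerate(mask) if any(row)]
    cols = [c for c in range(len(mask[0])) if any(row[c] for row in mask)]
    if not rows:
        return None
    return rows[0], rows[-1], cols[0], cols[-1]

def crop_roi(image, mask, margin=2):
    """Crop the image around the mask's bounding box, expanded by `margin` pixels."""
    box = mask_bbox(mask)
    if box is None:
        return None  # nothing segmented; the caller must handle this case
    top, bottom, left, right = box
    h, w = len(image), len(image[0])
    top, left = max(0, top - margin), max(0, left - margin)
    bottom, right = min(h - 1, bottom + margin), min(w - 1, right + margin)
    return [row[left:right + 1] for row in image[top:bottom + 1]]
```

The cropped region, typically resized to the classifier's input size, is what the second-stage classifier would see.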
LG · Apr 21
Intentional Updates for Streaming Reinforcement Learning
Arsalan Sharifnassab, Mohamed Elsayed, Kris De Asis et al.
In gradient-based learning, a step size chosen in parameter units does not produce a predictable per-step change in function output. This often leads to instability in the streaming setting (i.e., batch size=1), where stochasticity is not averaged out and update magnitudes can momentarily become arbitrarily big or small. Instead, we propose intentional updates: first specify the intended outcome of an update and then solve for the step size that approximately achieves it. This strategy has precedent in online supervised linear regression via the Normalized Least Mean Squares (NLMS) algorithm, which selects a step size to yield a specified change in the function output proportional to the current error. We extend this principle to streaming deep reinforcement learning by defining appropriate intended outcomes: Intentional TD aims for a fixed fractional reduction of the TD error, and Intentional Policy Gradient aims for a bounded per-step change in the policy, limiting local KL divergence. We propose practical algorithms combining eligibility traces and diagonal scaling. Empirically, these methods yield state-of-the-art streaming performance, frequently performing on par with batch and replay-buffer approaches.
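For a linear predictor v(x) = w·x, the NLMS-style step size α = η/‖x‖² moves the prediction at the current input by exactly η times the error. This is a minimal sketch of that principle applied to linear TD(0), where η is the intended fractional reduction of the TD error. It treats the bootstrapped target as fixed for that step. It omits the eligibility traces and diagonal scaling the paper uses for deep networks, and the function name is hypothetical.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def intentional_linear_td_step(w, x, r, x_next, gamma=0.99, eta=0.5, eps=1e-8):
    """One NLMS-style TD(0) step on a linear value function v(x) = w . x.

    The step size is chosen so that v(x) moves by eta * delta. Against the
    held-fixed bootstrapped target, the TD error therefore shrinks by a fraction eta.
    """
    delta = r + gamma * dot(w, x_next) - dot(w, x)
    step = eta / (dot(x, x) + eps)  # eps guards against an all-zero feature vector
    return [wi + step * delta * xi for wi, xi in zip(w, x)], delta
```

Scaling the features x by 10 leaves the change in v(x) unchanged. This is the point of specifying the step in output units rather than parameter units.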
MA · Oct 19, 2020
SMARTS: Scalable Multi-Agent Reinforcement Learning Training School for Autonomous Driving
Ming Zhou, Jun Luo, Julian Villella et al.
Multi-agent interaction is a fundamental aspect of autonomous driving in the real world. Despite more than a decade of research and development, the problem of how to competently interact with diverse road users in diverse scenarios remains largely unsolved. Learning methods have much to offer towards solving this problem. But they require a realistic multi-agent simulator that generates diverse and competent driving interactions. To meet this need, we develop a dedicated simulation platform called SMARTS (Scalable Multi-Agent RL Training School). SMARTS supports the training, accumulation, and use of diverse behavior models of road users. These are in turn used to create increasingly more realistic and diverse interactions that enable deeper and broader research on multi-agent interaction. In this paper, we describe the design goals of SMARTS, explain its basic architecture and its key features, and illustrate its use through concrete multi-agent experiments on interactive scenarios. We open-source the SMARTS platform and the associated benchmark tasks and evaluation metrics to encourage and empower research on multi-agent learning for autonomous driving. Our code is available at https://github.com/huawei-noah/SMARTS.
LG · Mar 31, 2024
Addressing Loss of Plasticity and Catastrophic Forgetting in Continual Learning
Mohamed Elsayed, A. Rupam Mahmood
Deep representation learning methods struggle with continual learning, suffering from both catastrophic forgetting of useful units and loss of plasticity, often due to rigid and unuseful units. While many methods address these two issues separately, only a few currently deal with both simultaneously. In this paper, we introduce Utility-based Perturbed Gradient Descent (UPGD) as a novel approach for the continual learning of representations. UPGD combines gradient updates with perturbations, where it applies smaller modifications to more useful units, protecting them from forgetting, and larger modifications to less useful units, rejuvenating their plasticity. We use a challenging streaming learning setup where continual learning problems have hundreds of non-stationarities and unknown task boundaries. We show that many existing methods suffer from at least one of the issues, predominantly manifested by their decreasing accuracy over tasks. On the other hand, UPGD continues to improve performance and surpasses or is competitive with all methods in all problems. Finally, in extended reinforcement learning experiments with PPO, we show that while Adam exhibits a performance drop after initial learning, UPGD avoids it by addressing both continual learning issues.
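This is a minimal sketch of a utility-gated, perturbed update in the spirit of UPGD, applied per weight on a flat list. It rests on several assumptions:

- a first-order utility estimate u_i = -g_i·w_i, which approximates how much the loss would rise if w_i were set to zero;
- utilities scaled into (0, 1) with a sigmoid after dividing by the largest utility magnitude;
- added Gaussian noise.

These choices and the function name are illustrative assumptions rather than the paper's exact algorithm.

```python
import math
import random

def upgd_step(w, g, lr=0.01, noise_std=0.1, rng=random):
    """One utility-gated perturbed gradient step on a flat weight list."""
    # First-order estimate of the loss increase if each weight were zeroed.
    util = [-gi * wi for gi, wi in zip(g, w)]
    # Normalize by the largest utility magnitude, then squash into (0, 1).
    scale = max(abs(u) for u in util) or 1.0
    u_bar = [1.0 / (1.0 + math.exp(-u / scale)) for u in util]
    new_w = []
    for wi, gi, ui in zip(w, g, u_bar):
        xi = rng.gauss(0.0, noise_std)
        # High-utility weights get small changes; low-utility ones get larger ones.
        new_w.append(wi - lr * (gi + xi) * (1.0 - ui))
    return new_w, u_bar
```

The factor (1 - u_bar) shrinks both the gradient and the noise for useful weights, which protects them from forgetting. Weights with low utility keep most of the step and the perturbation, which keeps them plastic.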
CR · Apr 29
An Empirical Security Evaluation of LLM-Generated Cryptographic Rust Code
Mohamed Elsayed, Kenneth Fulton, Jeong Yang
Developers and organizations are using Large Language Models (LLMs) to generate security-critical code more frequently than ever, including cryptographic solutions for their products. This study presents an empirical evaluation of cryptographic security in 240 Rust code samples for two crypto algorithms (AES-256-GCM and ChaCha20-Poly1305) generated by three LLMs (Gemini 2.5 Pro, GPT-4o, and DeepSeek Coder) using four different prompt strategies. For each successfully compiled code sample, CodeQL static analysis and our rule-based crypto-specific analyzer were used to detect vulnerabilities, which are also associated with Common Weakness Enumeration (CWE). The evaluation results revealed that only 23.3% of the generated code samples were successfully compiled. Among the compiled code, CodeQL produced only two false positives, while our rule-based crypto-specific analyzer identified vulnerabilities in 57% of the compiled samples with zero false positives. This demonstrates that general-purpose analysis tools are insufficient for code validation for the experimented crypto algorithms. The compilation success of the two algorithms varied significantly (AES-256-GCM 34.2% versus ChaCha20-Poly1305 12.5%), showing a gap in code generation capabilities. While model choice had no significant effect on compilation success, prompt strategy significantly influenced outcomes (P = 0.002), with chain-of-thought prompting performing 5 times worse than zero-shot. All three models exhibit systematic failures, including nonce reuse and API hallucinations.
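The reported rates are mutually consistent if the 240 samples split evenly across the two algorithms, 120 each. That even split is an assumption, since the abstract does not state it. A quick arithmetic check:

```python
total = 240
per_algorithm = total // 2  # assumed even split: AES-256-GCM vs ChaCha20-Poly1305

aes_compiled = round(0.342 * per_algorithm)     # 34.2% of 120
chacha_compiled = round(0.125 * per_algorithm)  # 12.5% of 120
compiled = aes_compiled + chacha_compiled

overall_rate = compiled / total        # compare with the reported 23.3%
vulnerable = round(0.57 * compiled)    # approx. samples flagged by the crypto-specific analyzer

print(aes_compiled, chacha_compiled, compiled, f"{overall_rate:.1%}", vulnerable)
# prints: 41 15 56 23.3% 32
```

Under that assumption, 41 + 15 = 56 samples compiled. The 57% vulnerability rate then corresponds to roughly 32 of them.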
LG · Oct 18, 2024
Streaming Deep Reinforcement Learning Finally Works
Mohamed Elsayed, Gautham Vasan, A. Rupam Mahmood
Natural intelligence processes experience as a continuous stream, sensing, acting, and learning moment-by-moment in real time. Streaming learning, the modus operandi of classic reinforcement learning (RL) algorithms like Q-learning and TD, mimics natural learning by using the most recent sample without storing it. This approach is also ideal for resource-constrained, communication-limited, and privacy-sensitive applications. However, in deep RL, learners almost always use batch updates and replay buffers, making them computationally expensive and incompatible with streaming learning. Although the prevalence of batch deep RL is often attributed to its sample efficiency, a more critical reason for the absence of streaming deep RL is its frequent instability and failure to learn, which we refer to as the stream barrier. This paper introduces the stream-x algorithms, the first class of deep RL algorithms to overcome the stream barrier for both prediction and control and match the sample efficiency of batch RL. Through experiments in MuJoCo Gym, DM Control Suite, and Atari Games, we demonstrate the stream barrier in existing algorithms and successful stable learning with our stream-x algorithms: stream Q, stream AC, and stream TD, achieving the best model-free performance in DM Control Dog environments. A set of common techniques underlies the stream-x algorithms, enabling their success with a single set of hyperparameters and allowing for easy extension to other algorithms, thereby reviving streaming RL.
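The classic streaming learners mentioned above update from each transition once and then discard it. This is a minimal sketch of textbook linear TD(λ) with accumulating eligibility traces. It is not the paper's stream TD, which adds further techniques to make deep networks stable, and the function name is hypothetical.

```python
def td_lambda_stream(stream, n_features, alpha=0.05, gamma=0.9, lam=0.8):
    """Linear TD(lambda) over a stream of (x, r, x_next, done) transitions.

    Each transition is used once and then discarded. The eligibility trace z
    carries credit backward in time in place of a replay buffer.
    """
    w = [0.0] * n_features
    z = [0.0] * n_features
    for x, r, x_next, done in stream:
        v = sum(wi * xi for wi, xi in zip(w, x))
        v_next = 0.0 if done else sum(wi * xi for wi, xi in zip(w, x_next))
        delta = r + gamma * v_next - v
        z = [gamma * lam * zi + xi for zi, xi in zip(z, x)]  # accumulating trace
        w = [wi + alpha * delta * zi for wi, zi in zip(w, z)]
        if done:
            z = [0.0] * n_features  # traces do not cross episode boundaries
    return w
```

On a two-step chain A -> B -> terminal with a reward of 1 at the end, the weights approach v(A) = γ = 0.9 and v(B) = 1. Only one transition is held in memory at any time.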
LG · Nov 22, 2024
Deep Policy Gradient Methods Without Batch Updates, Target Networks, or Replay Buffers
Gautham Vasan, Mohamed Elsayed, Alireza Azimi et al.
Modern deep policy gradient methods achieve effective performance on simulated robotic tasks, but they all require large replay buffers or expensive batch updates, or both, making them incompatible with real systems with resource-limited computers. We show that these methods fail catastrophically when limited to small replay buffers or during incremental learning, where updates only use the most recent sample without batch updates or a replay buffer. We propose a novel incremental deep policy gradient method -- Action Value Gradient (AVG) and a set of normalization and scaling techniques to address the challenges of instability in incremental learning. On robotic simulation benchmarks, we show that AVG is the only incremental method that learns effectively, often achieving final performance comparable to batch policy gradient methods. This advancement enabled us to show for the first time effective deep reinforcement learning with real robots using only incremental updates, employing a robotic manipulator and a mobile robot.
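The abstract does not spell out AVG's normalization and scaling techniques. As an illustration of a tool that fits incremental learning, this sketch keeps a running per-feature mean and variance (Welford's algorithm). It standardizes each observation using only statistics accumulated so far, with no buffer. Whether AVG uses exactly this form is an assumption, and the class name is hypothetical.

```python
import math

class RunningNormalizer:
    """Incremental per-feature mean/variance (Welford) for streaming inputs."""

    def __init__(self, dim, eps=1e-8):
        self.count = 0
        self.mean = [0.0] * dim
        self.m2 = [0.0] * dim  # running sum of squared deviations
        self.eps = eps

    def update(self, x):
        self.count += 1
        for i, xi in enumerate(x):
            d = xi - self.mean[i]
            self.mean[i] += d / self.count
            self.m2[i] += d * (xi - self.mean[i])

    def variance(self):
        # Population variance; max() avoids dividing by zero before any update.
        return [m / max(self.count, 1) for m in self.m2]

    def normalize(self, x):
        # Fold x into the statistics first, then standardize it.
        self.update(x)
        return [(xi - mu) / math.sqrt(var + self.eps)
                for xi, mu, var in zip(x, self.mean, self.variance())]
```

Memory stays constant in the length of the stream, which matters on resource-limited robot computers.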
LG · Jun 5, 2024
Revisiting Scalable Hessian Diagonal Approximations for Applications in Reinforcement Learning
Mohamed Elsayed, Homayoon Farrahi, Felix Dangel et al.
Second-order information is valuable for many applications but challenging to compute. Several works focus on computing or approximating Hessian diagonals, but even this simplification introduces significant additional costs compared to computing a gradient. In the absence of efficient exact computation schemes for Hessian diagonals, we revisit an early approximation scheme proposed by Becker and LeCun (1989, BL89), which has a cost similar to gradients and appears to have been overlooked by the community. We introduce HesScale, an improvement over BL89, which adds negligible extra computation. On small networks, we find that HesScale's approximation is of higher quality than all alternatives, even those with theoretical guarantees, such as unbiasedness, while being much cheaper to compute. We use this insight in reinforcement learning problems where small networks are used and demonstrate HesScale in second-order optimization and scaling the step-size parameter. In our experiments, HesScale optimizes faster than existing methods and improves stability through step-size scaling. These findings are promising for scaling second-order methods in larger models in the future.
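BL89-style schemes propagate only diagonal terms backward. As the recursion is commonly stated, consider a hidden pre-activation a_i with h_i = σ(a_i) feeding outputs z_k = Σ_i W_ki h_i. Then:

- d2L/da_i² ≈ σ'(a_i)² Σ_k W_ki² d2L/dz_k² + σ''(a_i) dL/dh_i
- the diagonal entry for an incoming weight V_ij is d2L/da_i² · x_j²

This is a minimal sketch for a one-hidden-layer tanh network with squared-error loss, an assumed setup with hypothetical function names. With a single output there are no off-diagonal output terms to drop, so the recursion is exact here, and the finite-difference helper confirms it. With several outputs it becomes an approximation.

```python
import math

def forward(V, W, x):
    # Hidden pre-activations a = V x, hidden h = tanh(a), outputs z = W h.
    a = [sum(v * xj for v, xj in zip(row, x)) for row in V]
    h = [math.tanh(ai) for ai in a]
    z = [sum(w * hi for w, hi in zip(row, h)) for row in W]
    return a, h, z

def loss(V, W, x, y):
    _, _, z = forward(V, W, x)
    return 0.5 * sum((zk - yk) ** 2 for zk, yk in zip(z, y))

def hidden_weight_hessian_diag(V, W, x, y):
    """Diagonal-only backward pass for d2L/dV_ij^2 (BL89-style recursion)."""
    a, h, z = forward(V, W, x)
    dz = [zk - yk for zk, yk in zip(z, y)]  # dL/dz_k for squared error
    d2z = [1.0] * len(z)                     # d2L/dz_k^2 is exactly 1 here
    diag = []
    for i, ai in enumerate(a):
        t = math.tanh(ai)
        s1 = 1.0 - t * t                     # tanh'(a)
        s2 = -2.0 * t * s1                   # tanh''(a)
        dh = sum(W[k][i] * dz[k] for k in range(len(z)))          # dL/dh_i
        d2h = sum(W[k][i] ** 2 * d2z[k] for k in range(len(z)))   # off-diagonals dropped
        d2a = s1 * s1 * d2h + s2 * dh
        diag.append([d2a * xj * xj for xj in x])
    return diag

def fd_second(V, W, x, y, i, j, eps=1e-4):
    # Central second difference along V[i][j], used only for checking.
    Vp = [row[:] for row in V]
    Vp[i][j] += eps
    Vm = [row[:] for row in V]
    Vm[i][j] -= eps
    return (loss(Vp, W, x, y) - 2.0 * loss(V, W, x, y) + loss(Vm, W, x, y)) / eps ** 2
```

The cost is one extra backward sweep of the same shape as the gradient pass, which is why such schemes scale like backpropagation.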
SP · Oct 18, 2021
Hybrid-Layers Neural Network Architectures for Modeling the Self-Interference in Full-Duplex Systems
Mohamed Elsayed, Ahmad A. Aziz El-Banna, Octavia A. Dobre et al.
Full-duplex (FD) systems have been introduced to provide high data rates for beyond fifth-generation wireless networks through simultaneous transmission of information over the same frequency resources. However, the operation of FD systems is practically limited by the self-interference (SI), and efficient SI cancelers are sought to make the FD systems realizable. Typically, polynomial-based cancelers are employed to mitigate the SI; nevertheless, they suffer from high complexity. This article proposes two novel hybrid-layers neural network (NN) architectures to cancel the SI with low complexity. The first architecture is referred to as hybrid-convolutional recurrent NN (HCRNN), whereas the second is termed hybrid-convolutional recurrent dense NN (HCRDNN). In contrast to the state-of-the-art NNs that employ dense or recurrent layers for SI modeling, the proposed NNs exploit, in a novel manner, a combination of different hidden layers (e.g., convolutional, recurrent, and/or dense) in order to model the SI with lower computational complexity than the polynomial and the state-of-the-art NN-based cancelers. The key idea behind using hybrid layers is to build an NN model that makes use of the characteristics of the different layers employed in its architecture. More specifically, in the HCRNN, a convolutional layer is employed to extract the input data features using a reduced network scale. Moreover, a recurrent layer is then applied to assist in learning the temporal behavior of the input signal from the localized feature map of the convolutional layer. In the HCRDNN, an additional dense layer is exploited to add another degree of freedom for adapting the NN settings in order to achieve the best compromise between the cancellation performance and computational complexity. Complexity analysis and numerical simulations are provided to prove the superiority of the proposed architectures.
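To make the HCRNN layer ordering concrete, this sketch runs a minimal forward pass in three stages:

1. a 1-D convolution over a window of transmitted I/Q samples;
2. an Elman-style recurrent layer over the resulting feature map;
3. a linear readout of the SI estimate's I/Q components.

The layer sizes, tanh activations, simple recurrent cell, parameter dictionary keys, and function names are all assumptions for illustration. They are not the published configuration.

```python
import math

def conv1d(seq, kernels, biases):
    """Valid 1-D convolution with tanh. seq: T x C_in; kernels: C_out x K x C_in."""
    K = len(kernels[0])
    c_in = len(seq[0])
    out = []
    for t in range(len(seq) - K + 1):
        out.append([
            math.tanh(b + sum(kern[k][c] * seq[t + k][c] for k in range(K) for c in range(c_in)))
            for kern, b in zip(kernels, biases)
        ])
    return out

def rnn_last_state(seq, W_in, W_rec, b):
    """Elman RNN over the feature map; returns the hidden state after the last step."""
    h = [0.0] * len(b)
    for x in seq:
        h = [math.tanh(b[i] + sum(W_in[i][j] * x[j] for j in range(len(x)))
                       + sum(W_rec[i][k] * h[k] for k in range(len(h))))
             for i in range(len(b))]
    return h

def hcrnn_forward(window, params):
    # window: list of [I, Q] transmitted samples spanning the channel memory.
    feats = conv1d(window, params["kernels"], params["conv_b"])
    h = rnn_last_state(feats, params["W_in"], params["W_rec"], params["rnn_b"])
    # Linear readout: the estimated SI sample as [I, Q].
    return [sum(w * hi for w, hi in zip(row, h)) for row in params["W_out"]]
```

The convolution shortens the sequence the recurrent layer must process (T - K + 1 steps instead of T). This is one way a hybrid stack can trade off cost against modeling power.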
RO · Mar 8, 2021
Autonomous object harvesting using synchronized optoelectronic microrobots
Christopher Bendkowski, Laurent Mennillo, Tao Xu et al.
Optoelectronic tweezer-driven microrobots (OETdMs) are a versatile micromanipulation technology based on the use of light-induced dielectrophoresis to move small dielectric structures (microrobots) across a photoconductive substrate. The microrobots in turn can be used to exert forces on secondary objects and carry out a wide range of micromanipulation operations, including collecting, transporting and depositing microscopic cargos. In contrast to alternative (direct) micromanipulation techniques, OETdMs are relatively gentle, making them particularly well suited to interacting with sensitive objects such as biological cells. However, at present such systems are used exclusively under manual control by a human operator. This limits the capacity for simultaneous control of multiple microrobots, reducing both experimental throughput and the possibility of cooperative multi-robot operations. In this article, we describe an approach to automated targeting and path planning to enable open-loop control of multiple microrobots. We demonstrate the performance of the method in practice, using microrobots to simultaneously collect, transport and deposit silica microspheres. Using computational simulations based on real microscopic image data, we investigate the capacity of microrobots to collect target cells from within a dissociated tissue culture. Our results indicate the feasibility of using OETdMs to autonomously carry out micromanipulation tasks within complex, unstructured environments.
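Automated targeting pairs each microrobot with a cargo object before paths are planned. This is a minimal sketch that assumes a greedy nearest-pair assignment on 2-D positions and straight-line waypoints. The abstract does not specify the paper's targeting or planning method, and all names here are hypothetical.

```python
import math

def greedy_assign(robots, targets):
    """Repeatedly pair the closest remaining (robot, target); returns {robot_idx: target_idx}."""
    pairs = sorted(
        (math.dist(r, t), i, j)
        for i, r in enumerate(robots)
        for j, t in enumerate(targets)
    )
    assignment, used_targets = {}, set()
    for _, i, j in pairs:
        if i not in assignment and j not in used_targets:
            assignment[i] = j
            used_targets.add(j)
    return assignment

def straight_line_path(start, goal, step=1.0):
    """Open-loop waypoints from start to goal, spaced at most `step` apart."""
    n = max(1, math.ceil(math.dist(start, goal) / step))
    return [(start[0] + (goal[0] - start[0]) * k / n,
             start[1] + (goal[1] - start[1]) * k / n) for k in range(n + 1)]
```

This sketch ignores collisions between robots and obstacles in the field of view. A planner for several synchronized microrobots would have to handle both.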
SP · Sep 23, 2020
Low Complexity Neural Network Structures for Self-Interference Cancellation in Full-Duplex Radio
Mohamed Elsayed, Ahmad A. Aziz El-Banna, Octavia A. Dobre et al.
Self-interference (SI) is considered a main challenge in full-duplex (FD) systems. Therefore, efficient SI cancelers are required for the influential deployment of FD systems in beyond fifth-generation wireless networks. Existing methods for SI cancellation have mostly considered the polynomial representation of the SI signal at the receiver. These methods are shown to operate well in practice while requiring high computational complexity. Alternatively, neural networks (NNs) are envisioned as promising candidates for modeling the SI signal with reduced computational complexity. Consequently, in this paper, two novel low complexity NN structures, referred to as the ladder-wise grid structure (LWGS) and moving-window grid structure (MWGS), are proposed. The core idea of these two structures is to mimic the non-linearity and memory effect introduced to the SI signal in order to achieve proper SI cancellation while exhibiting low computational complexity. The simulation results reveal that the LWGS and MWGS NN-based cancelers attain the same cancellation performance as the polynomial-based canceler while providing 49.87% and 34.19% complexity reduction, respectively.
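For context on the polynomial baseline, one common polynomial SI model is the memory polynomial. It sums odd-order terms x[n-m]·|x[n-m]|^(p-1) over M memory taps, each with its own complex coefficient. This sketch builds that basis and evaluates the model. That the paper's baseline takes exactly this form is an assumption, and the function names are hypothetical.

```python
def memory_polynomial_basis(x, n, memory, max_order):
    """Terms x[n-m] * |x[n-m]|^(p-1) for odd p <= max_order and 0 <= m < memory.

    x is a list of complex baseband samples; samples before index 0 count as zero.
    """
    basis = []
    for p in range(1, max_order + 1, 2):  # odd orders only
        for m in range(memory):
            s = x[n - m] if n - m >= 0 else 0j
            basis.append(s * abs(s) ** (p - 1))
    return basis

def predict_si(coeffs, x, n, memory, max_order):
    # Linear combination of the basis; coefficients would come from least squares.
    return sum(c * b for c, b in zip(coeffs, memory_polynomial_basis(x, n, memory, max_order)))

def num_coefficients(memory, max_order):
    # One complex coefficient per (odd order, memory tap) pair.
    return memory * ((max_order + 1) // 2)
```

The number of terms grows with the memory length times the number of odd orders, and every term must be computed per sample. This is the kind of cost the proposed NN structures aim to undercut.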