Amr Gomaa

h-index9

17papers

204citations

Novelty41%

AI Score35

Ranked #105,177 of 194,257 authors (top 54%)#867 in HC (top 35%)

17 Papers

10.0HCNov 7, 2022Code

Adaptive User-Centered Multimodal Interaction towards Reliable and Trusted Automotive Interfaces

Amr Gomaa

With the recently increasing capabilities of modern vehicles, novel approaches for interaction emerged that go beyond traditional touch-based and voice command approaches. Therefore, hand gestures, head pose, eye gaze, and speech have been extensively investigated in automotive applications for object selection and referencing. Despite these significant advances, existing approaches mostly employ a one-model-fits-all approach unsuitable for varying user behavior and individual differences. Moreover, current referencing approaches either consider these modalities separately or focus on a stationary situation, whereas the situation in a moving vehicle is highly dynamic and subject to safety-critical constraints. In this paper, I propose a research plan for a user-centered adaptive multimodal fusion approach for referencing external objects from a moving vehicle. The proposed plan aims to provide an open-source framework for user-centered adaptation and personalization using user observations and heuristics, multimodal fusion, clustering, transfer-of-learning for model adaptation, and continuous learning, moving towards trusted human-centered artificial intelligence.

8.6HCAug 10, 2022Code

What's on your mind? A Mental and Perceptual Load Estimation Framework towards Adaptive In-vehicle Interaction while Driving

Amr Gomaa, Alexandra Alles, Elena Meiser et al.

Several researchers have focused on studying driver cognitive behavior and mental load for in-vehicle interaction while driving. Adaptive interfaces that vary with mental and perceptual load levels could help in reducing accidents and enhancing the driver experience. In this paper, we analyze the effects of mental workload and perceptual load on psychophysiological dimensions and provide a machine learning-based framework for mental and perceptual load estimation in a dual task scenario for in-vehicle interaction (https://github.com/amrgomaaelhady/MWL-PL-estimator). We use off-the-shelf non-intrusive sensors that can be easily integrated into the vehicle's system. Our statistical analysis shows that while mental workload influences some psychophysiological dimensions, perceptual load shows little effect. Furthermore, we classify the mental and perceptual load levels through the fusion of these measurements, moving towards a real-time adaptive in-vehicle interface that is personalized to user behavior and driving conditions. We report up to 89% mental workload classification accuracy and provide a real-time minimally-intrusive solution.

1.9RONov 29, 2023Code

Toward a Surgeon-in-the-Loop Ophthalmic Robotic Apprentice using Reinforcement and Imitation Learning

Amr Gomaa, Bilal Mahdy, Niko Kleer et al.

Robot-assisted surgical systems have demonstrated significant potential in enhancing surgical precision and minimizing human errors. However, existing systems cannot accommodate individual surgeons' unique preferences and requirements. Additionally, they primarily focus on general surgeries (e.g., laparoscopy) and are unsuitable for highly precise microsurgeries, such as ophthalmic procedures. Thus, we propose an image-guided approach for surgeon-centered autonomous agents that can adapt to the individual surgeon's skill level and preferred surgical techniques during ophthalmic cataract surgery. Our approach trains reinforcement and imitation learning agents simultaneously using curriculum learning approaches guided by image data to perform all tasks of the incision phase of cataract surgery. By integrating the surgeon's actions and preferences into the training process, our approach enables the robot to implicitly learn and adapt to the individual surgeon's unique techniques through surgeon-in-the-loop demonstrations. This results in a more intuitive and personalized surgical experience for the surgeon while ensuring consistent performance for the autonomous robotic apprentice. We define and evaluate the effectiveness of our approach in a simulated environment using our proposed metrics and highlight the trade-off between a generic agent and a surgeon-centered adapted agent. Finally, our approach has the potential to extend to other ophthalmic and microsurgical procedures, opening the door to a new generation of surgeon-in-the-loop autonomous surgical robots. We provide an open-source simulation framework for future development and reproducibility at https://github.com/amrgomaaelhady/CataractAdaptSurgRobot.

3.6ROJul 7, 2023

Teach Me How to Learn: A Perspective Review towards User-centered Neuro-symbolic Learning for Robotic Surgical Systems

Amr Gomaa, Bilal Mahdy, Niko Kleer et al.

Recent advances in machine learning models allowed robots to identify objects on a perceptual nonsymbolic level (e.g., through sensor fusion and natural language understanding). However, these primarily black-box learning models still lack interpretation and transferability and require high data and computational demand. An alternative solution is to teach a robot on both perceptual nonsymbolic and conceptual symbolic levels through hybrid neurosymbolic learning approaches with expert feedback (i.e., human-in-the-loop learning). This work proposes a concept for this user-centered hybrid learning paradigm that focuses on robotic surgical situations. While most recent research focused on hybrid learning for non-robotic and some generic robotic domains, little work focuses on surgical robotics. We survey this related research while focusing on human-in-the-loop surgical robotic systems. This evaluation highlights the most prominent solutions for autonomous surgical robots and the challenges surgeons face when interacting with these systems. Finally, we envision possible ways to address these challenges using online apprenticeship learning based on implicit and explicit feedback from expert surgeons.

16.0CLSep 29, 2023Code

Cooperation, Competition, and Maliciousness: LLM-Stakeholders Interactive Negotiation

Sahar Abdelnabi, Amr Gomaa, Sarath Sivaprasad et al.

There is an growing interest in using Large Language Models (LLMs) in multi-agent systems to tackle interactive real-world tasks that require effective collaboration and assessing complex situations. Yet, we still have a limited understanding of LLMs' communication and decision-making abilities in multi-agent setups. The fundamental task of negotiation spans many key features of communication, such as cooperation, competition, and manipulation potentials. Thus, we propose using scorable negotiation to evaluate LLMs. We create a testbed of complex multi-agent, multi-issue, and semantically rich negotiation games. To reach an agreement, agents must have strong arithmetic, inference, exploration, and planning capabilities while integrating them in a dynamic and multi-turn setup. We propose multiple metrics to rigorously quantify agents' performance and alignment with the assigned role. We provide procedures to create new games and increase games' difficulty to have an evolving benchmark. Importantly, we evaluate critical safety aspects such as the interaction dynamics between agents influenced by greedy and adversarial players. Our benchmark is highly challenging; GPT-3.5 and small models mostly fail, and GPT-4 and SoTA large models (e.g., Llama-3 70b) still underperform.

1.5CVSep 8, 2023Code

SynthoGestures: A Novel Framework for Synthetic Dynamic Hand Gesture Generation for Driving Scenarios

Amr Gomaa, Robin Zitt, Guillermo Reyes et al.

Creating a diverse and comprehensive dataset of hand gestures for dynamic human-machine interfaces in the automotive domain can be challenging and time-consuming. To overcome this challenge, we propose using synthetic gesture datasets generated by virtual 3D models. Our framework utilizes Unreal Engine to synthesize realistic hand gestures, offering customization options and reducing the risk of overfitting. Multiple variants, including gesture speed, performance, and hand shape, are generated to improve generalizability. In addition, we simulate different camera locations and types, such as RGB, infrared, and depth cameras, without incurring additional time and cost to obtain these cameras. Experimental results demonstrate that our proposed framework, SynthoGestures (https://github.com/amrgomaaelhady/SynthoGestures), improves gesture recognition accuracy and can replace or augment real-hand datasets. By saving time and effort in the creation of the data set, our tool accelerates the development of gesture recognition systems for automotive applications.

18.9CRJun 23

Firewalls to Secure Dynamic LLM Agentic Networks

Sahar Abdelnabi, Amr Gomaa, Eugene Bagdasarian et al.

The emergence of agent-to-agent communication protocols mirrors the early internet: powerful connectivity with minimal security infrastructure. When AI agents communicate on behalf of users, every message crosses a trust boundary where the user's personal data and the external agent's unconstrained language each present distinct risks. We address both through a dual-firewall architecture grounded in a unifying principle: each task defines a context, and both sides of the communication carry information far exceeding what that context requires. Our firewalls act as projections onto the task context, allowing only contextually appropriate content to cross each boundary. The Language Converter Firewall projects incoming messages onto a closed, domain-specific, structured protocol; an external agent's message is converted to validated fields while persuasive framing, urgency tactics, and embedded instructions are structurally eliminated through deterministic verification. This replaces the asymmetric challenge of resisting every possible manipulation with the structural guarantee that manipulation has no channel through which to arrive. The Data Abstraction Firewall projects outgoing information onto the granularity appropriate for the task, rather than applying binary disclose-or-redact filtering, as previous airgapping solutions did. Both firewalls operate in a trusted environment isolated from external input, applying domain-specific rules learned automatically from demonstrations. Across 864 attacks spanning three domains on the ConVerse benchmark, our architecture reduces privacy attack success rates (e.g., from 84% to 10% for GPT-5) and security attacks (from 60% to 3%), while maintaining or even improving task completion quality.

2.8CVOct 2, 2023

It's all about you: Personalized in-Vehicle Gesture Recognition with a Time-of-Flight Camera

Amr Gomaa, Guillermo Reyes, Michael Feld

Despite significant advances in gesture recognition technology, recognizing gestures in a driving environment remains challenging due to limited and costly data and its dynamic, ever-changing nature. In this work, we propose a model-adaptation approach to personalize the training of a CNNLSTM model and improve recognition accuracy while reducing data requirements. Our approach contributes to the field of dynamic hand gesture recognition while driving by providing a more efficient and accurate method that can be customized for individual users, ultimately enhancing the safety and convenience of in-vehicle interactions, as well as driver's experience and system trust. We incorporate hardware enhancement using a time-of-flight camera and algorithmic enhancement through data augmentation, personalized adaptation, and incremental learning techniques. We evaluate the performance of our approach in terms of recognition accuracy, achieving up to 90\%, and show the effectiveness of personalized adaptation and incremental learning for a user-centered design.

3.7CVSep 6, 2024Code

Self-Supervised Contrastive Learning for Videos using Differentiable Local Alignment

Keyne Oei, Amr Gomaa, Anna Maria Feit et al.

Robust frame-wise embeddings are essential to perform video analysis and understanding tasks. We present a self-supervised method for representation learning based on aligning temporal video sequences. Our framework uses a transformer-based encoder to extract frame-level features and leverages them to find the optimal alignment path between video sequences. We introduce the novel Local-Alignment Contrastive (LAC) loss, which combines a differentiable local alignment loss to capture local temporal dependencies with a contrastive loss to enhance discriminative learning. Prior works on video alignment have focused on using global temporal ordering across sequence pairs, whereas our loss encourages identifying the best-scoring subsequence alignment. LAC uses the differentiable Smith-Waterman (SW) affine method, which features a flexible parameterization learned through the training phase, enabling the model to adjust the temporal gap penalty length dynamically. Evaluations show that our learned representations outperform existing state-of-the-art approaches on action recognition tasks.

2.1AISep 11, 2023

Adaptive User-centered Neuro-symbolic Learning for Multimodal Interaction with Autonomous Systems

Amr Gomaa, Michael Feld

Recent advances in machine learning, particularly deep learning, have enabled autonomous systems to perceive and comprehend objects and their environments in a perceptual subsymbolic manner. These systems can now perform object detection, sensor data fusion, and language understanding tasks. However, there is a growing need to enhance these systems to understand objects and their environments more conceptually and symbolically. It is essential to consider both the explicit teaching provided by humans (e.g., describing a situation or explaining how to act) and the implicit teaching obtained by observing human behavior (e.g., through the system's sensors) to achieve this level of powerful artificial intelligence. Thus, the system must be designed with multimodal input and output capabilities to support implicit and explicit interaction models. In this position paper, we argue for considering both types of inputs, as well as human-in-the-loop and incremental learning techniques, for advancing the field of artificial intelligence and enabling autonomous systems to learn like humans. We propose several hypotheses and design guidelines and highlight a use case from related work to achieve this goal.

5.1HCOct 20, 2022

In-Vehicle Interface Adaptation to Environment-Induced Cognitive Workload

Elena Meiser, Alexandra Alles, Samuel Selter et al.

Many car accidents are caused by human distractions, including cognitive distractions. In-vehicle human-machine interfaces (HMIs) have evolved throughout the years, providing more and more functions. Interaction with the HMIs can, however, also lead to further distractions and, as a consequence, accidents. To tackle this problem, we propose using adaptive HMIs that change according to the mental workload of the driver. In this work, we present the current status as well as preliminary results of a user study using naturalistic secondary tasks while driving (i.e., the primary task) that attempt to understand the effects of one such interface.

2.7HCOct 22, 2024Code

AdaptoML-UX: An Adaptive User-centered GUI-based AutoML Toolkit for Non-AI Experts and HCI Researchers

Amr Gomaa, Michael Sargious, Antonio Krüger

The increasing integration of machine learning across various domains has underscored the necessity for accessible systems that non-experts can utilize effectively. To address this need, the field of automated machine learning (AutoML) has developed tools to simplify the construction and optimization of ML pipelines. However, existing AutoML solutions often lack efficiency in creating online pipelines and ease of use for Human-Computer Interaction (HCI) applications. Therefore, in this paper, we introduce AdaptoML-UX, an adaptive framework that incorporates automated feature engineering, machine learning, and incremental learning to assist non-AI experts in developing robust, user-centered ML models. Our toolkit demonstrates the capability to adapt efficiently to diverse problem domains and datasets, particularly in HCI, thereby reducing the necessity for manual experimentation and conserving time and resources. Furthermore, it supports model personalization through incremental learning, customizing models to individual user behaviors. HCI researchers can employ AdaptoML-UX (\url{https://github.com/MichaelSargious/AdaptoML_UX}) without requiring specialized expertise, as it automates the selection of algorithms, feature engineering, and hyperparameter tuning based on the unique characteristics of the data.

2.7HCJan 29, 2024Code

Looking for a better fit? An Incremental Learning Multimodal Object Referencing Framework adapting to Individual Drivers

Amr Gomaa, Guillermo Reyes, Michael Feld et al.

The rapid advancement of the automotive industry towards automated and semi-automated vehicles has rendered traditional methods of vehicle interaction, such as touch-based and voice command systems, inadequate for a widening range of non-driving related tasks, such as referencing objects outside of the vehicle. Consequently, research has shifted toward gestural input (e.g., hand, gaze, and head pose gestures) as a more suitable mode of interaction during driving. However, due to the dynamic nature of driving and individual variation, there are significant differences in drivers' gestural input performance. While, in theory, this inherent variability could be moderated by substantial data-driven machine learning models, prevalent methodologies lean towards constrained, single-instance trained models for object referencing. These models show a limited capacity to continuously adapt to the divergent behaviors of individual drivers and the variety of driving scenarios. To address this, we propose \textit{IcRegress}, a novel regression-based incremental learning approach that adapts to changing behavior and the unique characteristics of drivers engaged in the dual task of driving and referencing objects. We suggest a more personalized and adaptable solution for multimodal gestural interfaces, employing continuous lifelong learning to enhance driver experience, safety, and convenience. Our approach was evaluated using an outside-the-vehicle object referencing use case, highlighting the superiority of the incremental learning models adapted over a single trained model across various driver traits such as handedness, driving experience, and numerous driving conditions. Finally, to facilitate reproducibility, ease deployment, and promote further research, we offer our approach as an open-source framework at \url{https://github.com/amrgomaaelhady/IcRegress}.

13.4HCNov 3, 2021Code

ML-PersRef: A Machine Learning-based Personalized Multimodal Fusion Approach for Referencing Outside Objects From a Moving Vehicle

Amr Gomaa, Guillermo Reyes, Michael Feld

Over the past decades, the addition of hundreds of sensors to modern vehicles has led to an exponential increase in their capabilities. This allows for novel approaches to interaction with the vehicle that go beyond traditional touch-based and voice command approaches, such as emotion recognition, head rotation, eye gaze, and pointing gestures. Although gaze and pointing gestures have been used before for referencing objects inside and outside vehicles, the multimodal interaction and fusion of these gestures have so far not been extensively studied. We propose a novel learning-based multimodal fusion approach for referencing outside-the-vehicle objects while maintaining a long driving route in a simulated environment. The proposed multimodal approaches outperform single-modality approaches in multiple aspects and conditions. Moreover, we also demonstrate possible ways to exploit behavioral differences between users when completing the referencing task to realize an adaptable personalized system for each driver. We propose a personalization technique based on the transfer-of-learning concept for exceedingly small data sizes to enhance prediction and adapt to individualistic referencing behavior. Our code is publicly available at https://github.com/amr-gomaa/ML-PersRef.

4.6LGOct 28, 2024

Unveiling the Role of Expert Guidance: A Comparative Analysis of User-centered Imitation Learning and Traditional Reinforcement Learning

Amr Gomaa, Bilal Mahdy

Integration of human feedback plays a key role in improving the learning capabilities of intelligent systems. This comparative study delves into the performance, robustness, and limitations of imitation learning compared to traditional reinforcement learning methods within these systems. Recognizing the value of human-in-the-loop feedback, we investigate the influence of expert guidance and suboptimal demonstrations on the learning process. Through extensive experimentation and evaluations conducted in a pre-existing simulation environment using the Unity platform, we meticulously analyze the effectiveness and limitations of these learning approaches. The insights gained from this study contribute to the advancement of human-centered artificial intelligence by highlighting the benefits and challenges associated with the incorporation of human feedback into the learning process. Ultimately, this research promotes the development of models that can effectively address complex real-world problems.

3.2ROMay 15, 2025

Predicting Human Behavior in Autonomous Systems: A Collaborative Machine Teaching Approach for Reducing Transfer of Control Events

Julian Wolter, Amr Gomaa

As autonomous systems become integral to various industries, effective strategies for fault handling are essential to ensure reliability and efficiency. Transfer of Control (ToC), a traditional approach for interrupting automated processes during faults, is often triggered unnecessarily in non-critical situations. To address this, we propose a data-driven method that uses human interaction data to train AI models capable of preemptively identifying and addressing issues or assisting users in resolution. Using an interactive tool simulating an industrial vacuum cleaner, we collected data and developed an LSTM-based model to predict user behavior. Our findings reveal that even data from non-experts can effectively train models to reduce unnecessary ToC events, enhancing the system's robustness. This approach highlights the potential of AI to learn directly from human problem-solving behaviors, complementing sensor data to improve industrial automation and human-AI collaboration.

13.6HCSep 23, 2020Code

Studying Person-Specific Pointing and Gaze Behavior for Multimodal Referencing of Outside Objects from a Moving Vehicle

Amr Gomaa, Guillermo Reyes, Alexandra Alles et al.

Hand pointing and eye gaze have been extensively investigated in automotive applications for object selection and referencing. Despite significant advances, existing outside-the-vehicle referencing methods consider these modalities separately. Moreover, existing multimodal referencing methods focus on a static situation, whereas the situation in a moving vehicle is highly dynamic and subject to safety-critical constraints. In this paper, we investigate the specific characteristics of each modality and the interaction between them when used in the task of referencing outside objects (e.g. buildings) from the vehicle. We furthermore explore person-specific differences in this interaction by analyzing individuals' performance for pointing and gaze patterns, along with their effect on the driving task. Our statistical analysis shows significant differences in individual behaviour based on object's location (i.e. driver's right side vs. left side), object's surroundings, driving mode (i.e. autonomous vs. normal driving) as well as pointing and gaze duration, laying the foundation for a user-adaptive approach.