ROAug 28, 2023
Human Comfortability Index Estimation in Industrial Human-Robot Collaboration TaskCelal Savur, Jamison Heard, Ferat Sahin
Fluent human-robot collaboration requires a robot teammate to understand, learn, and adapt to the human's psycho-physiological state. Such collaborations require a computing system that monitors human physiological signals during human-robot collaboration (HRC) to quantitatively estimate a human's level of comfort, which we have termed in this research as comfortability index (CI) and uncomfortability index (unCI). Subjective metrics (surprise, anxiety, boredom, calmness, and comfortability) and physiological signals were collected during a human-robot collaboration experiment that varied robot behavior. The emotion circumplex model is adapted to calculate the CI from the participant's quantitative data as well as physiological data. To estimate CI/unCI from physiological signals, time features were extracted from electrocardiogram (ECG), galvanic skin response (GSR), and pupillometry signals. In this research, we successfully adapt the circumplex model to find the location (axis) of 'comfortability' and 'uncomfortability' on the circumplex model, and its location match with the closest emotions on the circumplex model. Finally, the study showed that the proposed approach can estimate human comfortability/uncomfortability from physiological signals.
11.8LGMay 12
Intrinsic Vicarious Conditioning for Deep Reinforcement LearningRodney A Sanchez, Ferat Sahin, Alex Ororbia et al.
Advancements in reinforcement learning have produced a variety of complex and useful intrinsic driving forces; crucially, these drivers operate under a direct conditioning paradigm. This form of conditioning limits our agents' capacity by restricting how they learn from the environment as well as from others. Off-policy or learn-by-example methods can learn from demonstrators' representations, but they require access to the demonstrating agent's policies or their reward functions. Our work overcomes this direct sampling limitation by introducing vicarious conditioning as an intrinsic reward mechanism. We draw from psychological and biological literature to provide a foundation for vicarious conditioning and use memory-based methods to implement its four steps: attention, retention, reproduction, and reinforcement. Crucially, our vicarious conditioning paradigms support low-shot learning and do not require the demonstrator agent's policy nor its reward functions. We evaluate our approach in the MiniWorld Sidewalk environment, one of the few public environments that features a non-descriptive terminal condition (no reward provided upon agent death), and extend it to Box2D's CarRacing environment. Our results across both environments demonstrate that vicarious conditioning enables longer episode lengths by discouraging the agent from non-descriptive terminal conditions and guiding the agent toward desirable states. Overall, this work emulates a cognitively-plausible learning paradigm better suited to problems such as single-life learning or continual learning.
ROMar 2, 2021Code
Learning Robotic Manipulation Tasks via Task Progress based Gaussian Reward and Loss Adjusted ExplorationSulabh Kumra, Shirin Josh, Ferat Sahin
Multi-step manipulation tasks in unstructured environments are extremely challenging for a robot to learn. Such tasks interlace high-level reasoning that consists of the expected states that can be attained to achieve an overall task and low-level reasoning that decides what actions will yield these states. We propose a model-free deep reinforcement learning method to learn multi-step manipulation tasks. We introduce a Robotic Manipulation Network (RoManNet), which is a vision-based model architecture, to learn the action-value functions and predict manipulation action candidates. We define a Task Progress based Gaussian (TPG) reward function that computes the reward based on actions that lead to successful motion primitives and progress towards the overall task goal. To balance the ratio of exploration/exploitation, we introduce a Loss Adjusted Exploration (LAE) policy that determines actions from the action candidates according to the Boltzmann distribution of loss estimates. We demonstrate the effectiveness of our approach by training RoManNet to learn several challenging multi-step robotic manipulation tasks in both simulation and real-world. Experimental results show that our method outperforms the existing methods and achieves state-of-the-art performance in terms of success rate and action efficiency. The ablation studies show that TPG and LAE are especially beneficial for tasks like multiple block stacking. Code is available at: https://github.com/skumra/romannet
AIJun 5, 2025
Avoiding Death through Fear Intrinsic ConditioningRodney Sanchez, Ferat Sahin, Alexander Ororbia et al.
Biological and psychological concepts have inspired reinforcement learning algorithms to create new complex behaviors that expand agents' capacity. These behaviors can be seen in the rise of techniques like goal decomposition, curriculum, and intrinsic rewards, which have paved the way for these complex behaviors. One limitation in evaluating these methods is the requirement for engineered extrinsic for realistic environments. A central challenge in engineering the necessary reward function(s) comes from these environments containing states that carry high negative rewards, but provide no feedback to the agent. Death is one such stimuli that fails to provide direct feedback to the agent. In this work, we introduce an intrinsic reward function inspired by early amygdala development and produce this intrinsic reward through a novel memory-augmented neural network (MANN) architecture. We show how this intrinsic motivation serves to deter exploration of terminal states and results in avoidance behavior similar to fear conditioning observed in animals. Furthermore, we demonstrate how modifying a threshold where the fear response is active produces a range of behaviors that are described under the paradigm of general anxiety disorders (GADs). We demonstrate this behavior in the Miniworld Sidewalk environment, which provides a partially observable Markov decision process (POMDP) and a sparse reward with a non-descriptive terminal condition, i.e., death. In effect, this study results in a biologically-inspired neural architecture and framework for fear conditioning paradigms; we empirically demonstrate avoidance behavior in a constructed agent that is able to solve environments with non-descriptive terminal conditions.
ROFeb 23, 2022
Learning Multi-step Robotic Manipulation Policies from Visual Observation of Scene and Q-value Predictions of Previous ActionSulabh Kumra, Shirin Joshi, Ferat Sahin
In this work, we focus on multi-step manipulation tasks that involve long-horizon planning and considers progress reversal. Such tasks interlace high-level reasoning that consists of the expected states that can be attained to achieve an overall task and low-level reasoning that decides what actions will yield these states. We propose a sample efficient Previous Action Conditioned Robotic Manipulation Network (PAC-RoManNet) to learn the action-value functions and predict manipulation action candidates from visual observation of the scene and action-value predictions of the previous action. We define a Task Progress based Gaussian (TPG) reward function that computes the reward based on actions that lead to successful motion primitives and progress towards the overall task goal. To balance the ratio of exploration/exploitation, we introduce a Loss Adjusted Exploration (LAE) policy that determines actions from the action candidates according to the Boltzmann distribution of loss estimates. We demonstrate the effectiveness of our approach by training PAC-RoManNet to learn several challenging multi-step robotic manipulation tasks in both simulation and real-world. Experimental results show that our method outperforms the existing methods and achieves state-of-the-art performance in terms of success rate and action efficiency. The ablation studies show that TPG and LAE are especially beneficial for tasks like multiple block stacking. Additional experiments on Ravens-10 benchmark tasks suggest good generalizability of the proposed PAC-RoManNet.
ROJul 9, 2020
Robotic Grasping using Deep Reinforcement LearningShirin Joshi, Sulabh Kumra, Ferat Sahin
In this work, we present a deep reinforcement learning based method to solve the problem of robotic grasping using visio-motor feedback. The use of a deep learning based approach reduces the complexity caused by the use of hand-designed features. Our method uses an off-policy reinforcement learning framework to learn the grasping policy. We use the double deep Q-learning framework along with a novel Grasp-Q-Network to output grasp probabilities used to learn grasps that maximize the pick success. We propose a visual servoing mechanism that uses a multi-view camera setup that observes the scene which contains the objects of interest. We performed experiments using a Baxter Gazebo simulated environment as well as on the actual robot. The results show that our proposed method outperforms the baseline Q-learning framework and increases grasping accuracy by adapting a multi-view model in comparison to a single-view model.
ROSep 21, 2019
Human Position Detection & Tracking with On-robot Time-of-Flight Laser Ranging SensorsSarthak Arora, Shitij Kumar, Ferat Sahin
In this paper, we propose a simple methodology to detect the partial pose of a human occupying the manipulator work-space using only on-robot time--of--flight laser ranging sensors. The sensors are affixed on each link of the robot in a circular array fashion where each array possesses sixteen single unit laser ranging lidar(s). The detection is performed by leveraging an artificial neural network which takes a highly sparse 3-D point cloud input to produce an estimate of the partial pose which is the ground projection frame of the human footprint. We also present a particle filter based approach to the tracking problem when the input data is unreliable. Ultimately, the simulation results are presented and analyzed.
ROSep 11, 2019
Antipodal Robotic Grasping using Generative Residual Convolutional Neural NetworkSulabh Kumra, Shirin Joshi, Ferat Sahin
In this paper, we present a modular robotic system to tackle the problem of generating and performing antipodal robotic grasps for unknown objects from n-channel image of the scene. We propose a novel Generative Residual Convolutional Neural Network (GR-ConvNet) model that can generate robust antipodal grasps from n-channel input at real-time speeds (~20ms). We evaluate the proposed model architecture on standard datasets and a diverse set of household objects. We achieved state-of-the-art accuracy of 97.7% and 94.6% on Cornell and Jacquard grasping datasets respectively. We also demonstrate a grasp success rate of 95.4% and 93% on household and adversarial objects respectively using a 7 DoF robotic arm.
MAJul 25, 2019
A Framework for Monitoring Human Physiological Response during Human Robot Collaborative TaskCelal Savur, Shitij Kumar, Ferat Sahin
In this paper, a framework for monitoring human physiological response during Human-Robot Collaborative (HRC) task is presented. The framework highlights the importance of generation of event markers related to both human and robot, and also synchronization of data collected. This framework enables continuous data collection during an HRC task when changing robot movements as a form of stimuli to invoke a human physiological response. It also presents two case studies based on this framework and a data visualization tool for representation and easy analysis of the collected data during an HRC experiment.
ROJul 3, 2019
Sensing Volume Coverage of Robot Workspace using On-Robot Time-of-Flight Sensor Arrays for Safe Human Robot InteractionShitij Kumar, Ferat Sahin
In this paper, an analysis of the sensing volume coverage of robot workspace as well as the shared human-robot collaborative workspace for various configurations of on-robot Time-of-Flight (ToF) sensor array rings is presented. A methodology for volumetry using octrees to quantify the detection/sensing volume of the sensors is proposed. The change in sensing volume coverage by increasing the number of sensors per ToF sensor array ring and also increasing the number of rings mounted on robot link is also studied. Considerations of maximum ideal volume around the robot workspace that a given ToF sensor array ring placement and orientation setup should cover for safe human robot interaction are presented. The sensing volume coverage measurements in this maximum ideal volume are tabulated and observations on various ToF configurations and their coverage for close and far zones of the robot are determined.
ROMay 3, 2019
HRC-SoS: Human Robot Collaboration Experimentation Platform as System of SystemsCelal Savur, Shitij Kumar, Sarthak Arora et al.
This paper presents an experimentation platform for human robot collaboration as a system of systems as well as proposes a conceptual framework describing the aspects of Human Robot Collaboration. These aspects are Awareness, Intelligence and Compliance of the system. Based on this framework case studies describing experiment setups performed using this platform are discussed. Each experiment highlights the use of the subsystems such as the digital twin, motion capture system, human-physiological monitoring system, data collection system and robot control and interface systems. A highlight of this paper showcases a subsystem with the ability to monitor human physiological feedback during a human robot collaboration task.
ROMay 1, 2019
Cyber-Physical Testbed for Human-Robot Collaborative Task Planning and ExecutionTuly Hazbar, Shitij Kumar, Ferat Sahin
In this paper, we present a cyber-physical testbed created to enable a human-robot team to perform a shared task in a shared workspace. The testbed is suitable for the implementation of a tabletop manipulation task, a common human-robot collaboration scenario. The testbed integrates elements that exist in the physical and virtual world. In this work, we report the insights we gathered throughout our exploration in understanding and implementing task planning and execution for human-robot team.
ROApr 16, 2019
Speed and Separation Monitoring using on-robot Time--of--Flight laser--ranging sensor arraysShitij Kumar, Sarthak Arora, Ferat Sahin
In this paper, a speed and separation monitoring (SSM) based safety controller using three time-of-flight ranging sensor arrays fastened to the robot links, is implemented. Based on the human-robot minimum distance and their relative velocities, a controller output characterized by a modulating robot operation speed is obtained. To avert self-avoidance, a self occlusion detection method is implemented using ray-casting technique to filter out the distance values associated with the robot-self and the restricted robot workspace. For validation, the robot workspace is monitored using a motion capture setup to create a digital twin of the human and robot. This setup is used to compare the safety,performance and productivity of various versions of SSM safety configurations based on minimum distance between human and robot calculated using on-robot Time-of-Flight sensors, motion capture and a 2D scanning lidar.
ROSep 27, 2018
Collaborative Robot Learning from Demonstrations using Hidden Markov Model State DistributionSulabh Kumra, Ferat Sahin
In robotics, there is need of an interactive and expedite learning method as experience is expensive. Robot Learning from Demonstration (RLfD) enables a robot to learn a policy from demonstrations performed by teacher. RLfD enables a human user to add new capabilities to a robot in an intuitive manner, without explicitly reprogramming it. In this work, we present a novel interactive framework, where a collaborative robot learns skills for trajectory based tasks from demonstrations performed by a human teacher. The robot extracts features from each demonstration called as key-points and learns a model of the demonstrated skill using Hidden Markov Model (HMM). Our experimental results show that the learned model can be used to produce a generalized trajectory based skill.