Akansel Cosgun

RO
h-index: 23
39 papers
1,614 citations
Novelty: 39%
AI Score: 44

39 Papers

CV | Apr 24, 2023 | Code
A Benchmark for Cycling Close Pass Detection from Video Streams

Mingjie Li, Ben Beck, Tharindu Rathnayake et al.

Cycling is a healthy and sustainable mode of transport. However, interactions with motor vehicles remain a key barrier to increased cycling participation. The ability to detect potentially dangerous interactions from on-bike sensing could provide important information to riders and policymakers. A key influence on rider comfort and safety is close passes, i.e., when a vehicle narrowly passes a cyclist. In this paper, we introduce a novel benchmark, called Cyc-CP, towards close pass (CP) event detection from video streams. The task is formulated into two problem categories: scene-level and instance-level. Scene-level detection ascertains the presence of a CP event within the provided video clip. Instance-level detection identifies the specific vehicle within the scene that precipitates a CP event. To address these challenges, we introduce four benchmark models, each underpinned by advanced deep-learning methodologies. For training and evaluating those models, we have developed a synthetic dataset alongside the acquisition of a real-world dataset. The benchmark evaluations reveal that the models achieve an accuracy of 88.13% for scene-level detection and 84.60% for instance-level detection on the real-world dataset. We envision this benchmark as a test-bed to accelerate CP detection and facilitate interaction between the fields of road safety, intelligent transportation systems and artificial intelligence. Both the benchmark datasets and detection models will be available at https://github.com/SustainableMobility/cyc-cp to facilitate experimental reproducibility and encourage more in-depth research in the field.

54.2 | RO | Mar 22
Geometrically Plausible Object Pose Refinement using Differentiable Simulation

Anil Zeybek, Rhys Newbury, Snehal Dikhale et al.

State-of-the-art object pose estimation methods are prone to generating geometrically infeasible pose hypotheses. This problem is prevalent in dexterous manipulation, where estimated poses often intersect with the robotic hand or do not rest on a support surface. We propose a multi-modal pose refinement approach that combines differentiable physics simulation, differentiable rendering and visuo-tactile sensing to optimize object poses for both spatial accuracy and physical consistency. Simulated experiments show that our approach reduces the intersection volume error between the object and robotic hand by 73% when the initial estimate is accurate and by over 87% under high initial uncertainty, significantly outperforming standard ICP-based baselines. Furthermore, the improvement in geometric plausibility is accompanied by a concurrent reduction in translation and orientation errors. Achieving pose estimation that is grounded in physical reality while remaining faithful to multi-modal sensor inputs is a critical step toward robust in-hand manipulation.

RO | Oct 29, 2021 | Code
ARviz -- An Augmented Reality-enabled Visualization Platform for ROS Applications

Khoa C. Hoang, Wesley P. Chan, Steven Lay et al.

Current robot interfaces such as teach pendants and 2D screen displays used for task visualization and interaction often seem unintuitive and limited in terms of information flow. This compromises task efficiency as interacting with the interface can distract the user from the task at hand. Augmented Reality (AR) technology offers the capability to create visually rich displays and intuitive interaction elements in situ. In recent years, AR has shown promising potential to enable effective human-robot interaction. We introduce ARviz - a versatile, extendable AR visualization platform built for robot applications developed with the widely used Robot Operating System (ROS) framework. ARviz aims to provide both a universal visualization platform with the capability of displaying any ROS message data type in AR, as well as a multimodal user interface for interacting with robots over ROS. ARviz is built as a platform incorporating a collection of plugins that provide visualization and/or interaction components. Users can also extend the platform by implementing new plugins to suit their needs. We present three use cases as well as two potential use cases to showcase the capabilities and benefits of the ARviz platform for human-robot interaction applications. The open access source code for our ARviz platform is available at: https://github.com/hri-group/arviz.

CV | Sep 9, 2025
Australian Supermarket Object Set (ASOS): A Benchmark Dataset of Physical Objects and 3D Models for Robotics and Computer Vision

Akansel Cosgun, Lachlan Chumbley, Benjamin J. Meyer

This paper introduces the Australian Supermarket Object Set (ASOS), a comprehensive dataset comprising 50 readily available supermarket items with high-quality 3D textured meshes designed for benchmarking in robotics and computer vision applications. Unlike existing datasets that rely on synthetic models or specialized objects with limited accessibility, ASOS provides a cost-effective collection of common household items that can be sourced from a major Australian supermarket chain. The dataset spans 10 distinct categories with diverse shapes, sizes, and weights. 3D meshes are acquired using structure-from-motion techniques with high-resolution imaging to generate watertight meshes. The dataset's emphasis on accessibility and real-world applicability makes it valuable for benchmarking object detection, pose estimation, and robotics applications.

RO | Feb 25, 2022
Visibility Maximization Controller for Robotic Manipulation

Kerry He, Rhys Newbury, Tin Tran et al.

Occlusions caused by a robot's own body are a common problem for closed-loop control methods employed in eye-to-hand camera setups. We propose an optimization-based reactive controller that minimizes self-occlusions while achieving a desired goal pose. The approach allows coordinated control between the robot's base, arm and head by encoding the line-of-sight visibility to the target as a soft constraint along with other task-related constraints, and solving for feasible joint and base velocities. The generalizability of the approach is demonstrated in simulated and real-world experiments, on robots with fixed or mobile bases, with moving or fixed objects, and multiple objects. The experiments revealed a trade-off between occlusion rates and other task metrics. While a planning-based baseline achieved lower occlusion rates than the proposed controller, it came at the expense of highly inefficient paths and a significant drop in task success. On the other hand, the proposed controller is shown to improve visibility of the target object(s) without sacrificing too much task success and efficiency. Videos and code can be found at: rhys-newbury.github.io/projects/vmc/.

RO | Feb 2, 2022
Metrics for Evaluating Social Conformity of Crowd Navigation Algorithms

Junxian Wang, Wesley P. Chan, Pamela Carreno-Medrano et al.

Recent protocols and metrics for training and evaluating autonomous robot navigation through crowds are inconsistent due to diversified definitions of "social behavior". This makes it difficult, if not impossible, to effectively compare published navigation algorithms. Furthermore, with the lack of a good evaluation protocol, resulting algorithms may fail to generalize, due to lack of diversity in training. To address these gaps, this paper facilitates a more comprehensive evaluation and objective comparison of crowd navigation algorithms by proposing a consistent set of metrics that accounts for both efficiency and social conformity, and a systematic protocol comprising multiple crowd navigation scenarios of varying complexity for evaluation. We tested four state-of-the-art algorithms under this protocol. Results revealed that some state-of-the-art algorithms struggle to generalize, and that using our protocol for training improved algorithm performance. We demonstrate that the set of proposed metrics provides more insight and effectively differentiates the performance of these algorithms with respect to efficiency and social conformity.

RO | Nov 4, 2021
Speed Maps: An Application to Guide Robots in Human Environments

Akansel Cosgun

We present the concept of speed maps: speed limits for mobile robots in human environments. Static speed maps allow for faster navigation in corridors while limiting the speed around corners and in rooms. Dynamic speed maps put limits on speed around humans. We demonstrate the concept for a mobile robot that guides people to annotated landmarks on the map. The robot keeps a metric map for navigation and a semantic map to hold planar surfaces for tasking. The system supports automatic initialization upon the detection of a specially designed QR code. We show that speed maps can not only reduce the impact of a potential collision but also reduce navigation time.
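
As a rough illustration of how static and dynamic speed maps might combine, the sketch below looks up a per-cell static limit and then caps it near detected humans. The grid layout, the function name `speed_limit`, and the `human_radius` and `human_limit` values are assumptions made for this example and are not taken from the paper.

```python
import math


def speed_limit(static_map, cell_size, position, humans,
                human_radius=1.5, human_limit=0.3):
    """Return the allowed speed (m/s) at `position`.

    static_map   : 2D list of per-cell speed limits (hypothetical layout,
                   indexed as static_map[row][col]).
    cell_size    : edge length of one grid cell in metres.
    position     : (x, y) of the robot in metres.
    humans       : list of (x, y) human positions in metres.
    human_radius : assumed distance within which the dynamic limit applies.
    human_limit  : assumed speed cap near a human.
    """
    x, y = position
    row, col = int(y // cell_size), int(x // cell_size)
    limit = static_map[row][col]  # static speed map lookup

    # Dynamic speed map: lower the limit when any human is close by.
    for hx, hy in humans:
        if math.hypot(hx - x, hy - y) <= human_radius:
            limit = min(limit, human_limit)
    return limit
```

For example, with a one-row map `[[1.0, 0.5]]` and 1 m cells, a robot at `(0.5, 0.5)` would be allowed the cell's static limit when no humans are nearby, and the lower assumed human limit when one stands within the radius.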

AI | Jul 14, 2021
Mixing Human Demonstrations with Self-Exploration in Experience Replay for Deep Reinforcement Learning

Dylan Klein, Akansel Cosgun

We investigate the effect of using human demonstration data in the replay buffer for Deep Reinforcement Learning. We use a policy gradient method with a modified experience replay buffer where a human demonstration experience is sampled with a given probability. We analyze different ratios of using demonstration data in a task where an agent attempts to reach a goal while avoiding obstacles. Our results suggest that while the agents trained by pure self-exploration and pure demonstration had similar success rates, the pure demonstration model converged faster to solutions with fewer steps.
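
The sampling scheme described in this abstract can be sketched roughly as follows: each sampled transition comes from the human demonstrations with probability `demo_prob`, and from the agent's own experience otherwise. The class name, its parameters, and the fallback behaviour when one source is empty are assumptions for illustration, not the authors' implementation.

```python
import random
from collections import deque


class MixedReplayBuffer:
    """Replay buffer mixing fixed human demonstrations with agent experience.

    Hypothetical sketch: names and defaults are illustrative only.
    """

    def __init__(self, demonstrations, demo_prob, capacity=10_000, seed=None):
        if not 0.0 <= demo_prob <= 1.0:
            raise ValueError("demo_prob must be in [0, 1]")
        self.demonstrations = list(demonstrations)  # never overwritten
        self.agent_buffer = deque(maxlen=capacity)  # oldest entries dropped
        self.demo_prob = demo_prob
        self.rng = random.Random(seed)

    def add(self, transition):
        """Store a transition collected by the agent's own exploration."""
        self.agent_buffer.append(transition)

    def sample(self, batch_size):
        """Draw a batch, picking a demonstration with probability demo_prob."""
        batch = []
        for _ in range(batch_size):
            use_demo = self.rng.random() < self.demo_prob
            # Assumed fallback: use the other source if the chosen one is empty.
            if use_demo and self.demonstrations:
                source = self.demonstrations
            elif self.agent_buffer:
                source = self.agent_buffer
            else:
                source = self.demonstrations
            batch.append(self.rng.choice(source))
        return batch
```

Setting `demo_prob=0.0` corresponds to pure self-exploration and `demo_prob=1.0` to pure demonstration, the two extremes compared in the abstract. If both sources are empty, `sample` raises an `IndexError`.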

RO | Apr 15, 2021
Tabletop Object Rearrangement: Team ACRV's Entry to OCRTOC

Zheyu Zhang, Rhys Newbury, Kerry He et al.

The Open Cloud Robot Table Organization Challenge (OCRTOC) is one of the most comprehensive cloud-based robotic manipulation competitions. It focuses on rearranging tabletop objects using vision as its primary sensing modality. In this extended abstract, we present our entry to OCRTOC 2020 and the key challenges the team experienced.

RO | Apr 12, 2021
Virtual Barriers in Augmented Reality for Safe and Effective Human-Robot Cooperation in Manufacturing

Khoa Cong Hoang, Wesley P. Chan, Steven Lay et al.

Safety is a fundamental requirement in any human-robot collaboration scenario. To ensure the safety of users for such scenarios, we propose a novel Virtual Barrier system facilitated by an augmented reality interface. Our system provides two kinds of Virtual Barriers to ensure safety: 1) a Virtual Person Barrier which encapsulates and follows the user to protect them from colliding with the robot, and 2) Virtual Obstacle Barriers which users can spawn to protect objects or regions that the robot should not enter. To enable effective human-robot collaboration, our system includes an intuitive robot programming interface utilizing speech commands and hand gestures, and features the capability of automatic path re-planning when potential collisions are detected as a result of a barrier intersecting the robot's planned path. We compared our novel system with a standard 2D display interface through a user study, where participants performed a task mimicking an industrial manufacturing procedure. Results show that our system increases the user's sense of safety and task efficiency, and makes the interaction more intuitive.

RO | Apr 8, 2021
Seeing Thru Walls: Visualizing Mobile Robots in Augmented Reality

Morris Gu, Akansel Cosgun, Wesley P. Chan et al.

We present an approach for visualizing mobile robots through an Augmented Reality headset when there is no line-of-sight visibility between the robot and the human. Three elements are visualized in Augmented Reality: 1) the robot's 3D model to indicate its position, 2) an arrow emanating from the robot to indicate its planned movement direction, and 3) a 2D grid to represent the ground plane. We conduct a user study with 18 participants, in which each participant is asked to retrieve objects, one at a time, from stations at the two sides of a T-junction at the end of a hallway where a mobile robot is roaming. The results show that visualizations improved the perceived safety and efficiency of the task and led to participants being more comfortable with the robot within their personal spaces. Furthermore, visualizing the motion intent in addition to the robot model was found to be more effective than visualizing the robot model alone. The proposed system can improve the safety of automated warehouses by increasing the visibility and predictability of robots.

RO | Apr 7, 2021
Demonstrating Cloth Folding to Robots: Design and Evaluation of a 2D and a 3D User Interface

Benjamin Waymouth, Akansel Cosgun, Rhys Newbury et al.

An appropriate user interface to collect human demonstration data for deformable object manipulation has been mostly overlooked in the literature. We present an interaction design for demonstrating cloth folding to robots. Users choose pick and place points on the cloth and can preview a visualization of a simulated cloth before real-robot execution. Two interfaces are proposed: a 2D display-and-mouse interface where points are placed by clicking on an image of the cloth, and a 3D Augmented Reality interface where the chosen points are placed by hand gestures. We conduct a user study with 18 participants, in which each user completed two sequential folds to achieve a cloth goal shape. Results show that while both interfaces were acceptable, the 3D interface was found to be more suitable for understanding the task, and the 2D interface more suitable for repetition. Results also show that fold previews improve three key metrics: task efficiency, the ability to predict the final shape of the cloth and overall user satisfaction.

RO | Mar 6, 2021
Visualizing Robot Intent for Object Handovers with Augmented Reality

Rhys Newbury, Akansel Cosgun, Tysha Crowley-Davis et al.

Humans are highly skilled in communicating their intent for when and where a handover would occur. However, even the state-of-the-art robotic implementations for handovers typically lack such communication skills. This study investigates visualization of the robot's internal state and intent for Human-to-Robot Handovers using Augmented Reality. Specifically, we explore the use of visualized 3D models of the object and the robotic gripper to communicate the robot's estimation of where the object is and the pose in which the robot intends to grasp the object. We tested this design via a user study with 16 participants, in which each participant handed over a cube-shaped object to the robot 12 times. Results show that communicating robot intent via augmented reality substantially improves the perceived experience of the users for handovers. Results also indicate that the effectiveness of augmented reality is even more pronounced for the perceived safety and fluency of the interaction when the robot makes errors in localizing the object.

RO | Mar 6, 2021
Passing Through Narrow Gaps with Deep Reinforcement Learning

Brendan Tidd, Akansel Cosgun, Jurgen Leitner et al.

The U.S. Defense Advanced Research Projects Agency (DARPA) Subterranean Challenge requires teams of robots to traverse difficult and diverse underground environments. Traversing small gaps is one of the challenging scenarios that robots encounter. Imperfect sensor information makes this difficult for classical navigation methods, where behaviours require significant manual fine tuning. In this paper we present a deep reinforcement learning method for autonomously navigating through small gaps, where contact between the robot and the gap may be required. We first learn a gap behaviour policy to get through small gaps (only centimeters wider than the robot). We then learn a goal-conditioned behaviour selection policy that determines when to activate the gap behaviour policy. We train our policies in simulation and demonstrate their effectiveness with a large tracked robot in simulation and on the real platform. In simulation experiments, our approach achieves a 93% success rate when the gap behaviour is activated manually by an operator, and 63% with autonomous activation using the behaviour selection policy. In real robot experiments, our approach achieves a success rate of 73% with manual activation, and 40% with autonomous behaviour selection. While we show the feasibility of our approach in simulation, the difference in performance between simulated and real world scenarios highlights the difficulty of direct sim-to-real transfer for deep reinforcement learning policies. In both the simulated and real world environments, alternative methods were unable to traverse the gap.

RO | Jan 23, 2021
Learning Setup Policies: Reliable Transition Between Locomotion Behaviours

Brendan Tidd, Nicolas Hudson, Akansel Cosgun et al.

Dynamic platforms that operate over many unique terrain conditions typically require many behaviours. To transition safely, there must be an overlap of states between adjacent controllers. We develop a novel method for training setup policies that bridge the trajectories between pre-trained Deep Reinforcement Learning (DRL) policies. We demonstrate our method with a simulated biped traversing a difficult jump terrain, where a single policy fails to learn the task, and switching between pre-trained policies without setup policies also fails. We perform an ablation of key components of our system, and show that our method outperforms others that learn transition policies. We demonstrate our method with several difficult and diverse terrain types, and show that we can use setup policies as part of a modular control suite to successfully traverse a sequence of complex terrains. We show that using setup policies improves the success rate for traversing a single difficult jump terrain (from 51.3% success rate with the best comparative method to 82.2%), and traversing a random sequence of difficult obstacles (from 1.9% without setup policies to 71.2%).

RO | Jan 2, 2021
Semantics for Robotic Mapping, Perception and Interaction: A Survey

Sourav Garg, Niko Sünderhauf, Feras Dayoub et al.

For robots to navigate and interact more richly with the world around them, they will likely require a deeper understanding of the world in which they operate. In robotics and related research fields, the study of understanding is often referred to as semantics, which dictates what the world "means" to a robot, and is strongly tied to the question of how to represent that meaning. With humans and robots increasingly operating in the same world, the prospects of human-robot interaction also bring semantics and ontology of natural language into the picture. Driven by need, as well as by enablers like increasing availability of training data and computational resources, semantics is a rapidly growing research area in robotics. The field has received significant attention in the research literature to date, but most reviews and surveys have focused on particular aspects of the topic: the technical research issues regarding its use in specific robotic topics like mapping or segmentation, or its relevance to one particular application domain like autonomous driving. A new treatment is therefore required, and is also timely because so much relevant research has occurred since many of the key surveys were published. This survey therefore provides an overarching snapshot of where semantics in robotics stands today. We establish a taxonomy for semantics research in or relevant to robotics, split into four broad categories of activity, in which semantics are extracted, used, or both. Within these broad categories we survey dozens of major topics including fundamentals from the computer vision field and key robotics research areas utilizing semantics, including mapping, navigation and interaction with the world. The survey also covers key practical considerations, including enablers like increased data availability and improved computational hardware, and major application areas where...

RO | Nov 1, 2020
Learning When to Switch: Composing Controllers to Traverse a Sequence of Terrain Artifacts

Brendan Tidd, Nicolas Hudson, Akansel Cosgun et al.

Legged robots often use separate control policies that are highly engineered for traversing difficult terrain such as stairs, gaps, and steps, where switching between policies is only possible when the robot is in a region that is common to adjacent controllers. Deep Reinforcement Learning (DRL) is a promising alternative to hand-crafted control design, though typically requires the full set of test conditions to be known before training. DRL policies can result in complex (often unrealistic) behaviours that have few or no overlapping regions between adjacent policies, making it difficult to switch behaviours. In this work we develop multiple DRL policies with Curriculum Learning (CL), each that can traverse a single respective terrain condition, while ensuring an overlap between policies. We then train a network for each destination policy that estimates the likelihood of successfully switching from any other policy. We evaluate our switching method on a previously unseen combination of terrain artifacts and show that it performs better than heuristic methods. While our method is trained on individual terrain types, it performs comparably to a Deep Q Network trained on the full set of terrain conditions. This approach allows the development of separate policies in constrained conditions with embedded prior knowledge about each behaviour, that is scalable to any number of behaviours, and prepares DRL methods for applications in the real world.

RO | Oct 28, 2020
Joint Path and Push Planning Among Movable Obstacles

Victor Emeli, Akansel Cosgun

This paper explores the Navigation Among Movable Obstacles (NAMO) problem and proposes joint path and push planning: which path to take and in which direction the obstacles should be pushed, given a start and goal position. We present a planning algorithm for selecting a path and the obstacles to be pushed, where a Rapidly-exploring Random Tree (RRT)-based heuristic is employed to calculate a minimal collision path. When it is necessary to apply a pushing force to slide an obstacle out of the way, the planner leverages means-end analysis through a dynamic physics simulation to determine the sequence of linear pushes to clear the necessary space. Simulation experiments show that our approach finds solutions in higher clutter percentages (up to 49%) compared to the straight-line push planner (37%) and RRT without pushing (18%).

RO | Oct 8, 2020
Guided Curriculum Learning for Walking Over Complex Terrain

Brendan Tidd, Nicolas Hudson, Akansel Cosgun

Reliable bipedal walking over complex terrain is a challenging problem; using a curriculum can help learning. Curriculum learning is the idea of starting with an achievable version of a task and increasing the difficulty as a success criterion is met. We propose a 3-stage curriculum to train Deep Reinforcement Learning policies for bipedal walking over various challenging terrains. In the first stage, the agent starts on an easy terrain and the terrain difficulty is gradually increased, while forces derived from a target policy are applied to the robot joints and the base. In the second stage, the guiding forces are gradually reduced to zero. Finally, in the third stage, random perturbations with increasing magnitude are applied to the robot base, so that the robustness of the policies is improved. In simulation experiments, we show that our approach is effective in learning walking policies, separate from each other, for five terrain types: flat, hurdles, gaps, stairs, and steps. Moreover, we demonstrate that in the absence of human demonstrations, a simple hand designed walking trajectory is a sufficient prior to learn to traverse complex terrain types. In ablation studies, we show that taking out any one of the three stages of the curriculum degrades the learning performance.

RO | Oct 7, 2020
Learning Arbitrary-Goal Fabric Folding with One Hour of Real Robot Experience

Robert Lee, Daniel Ward, Akansel Cosgun et al.

Manipulating deformable objects, such as fabric, is a long-standing problem in robotics, with state estimation and control posing a significant challenge for traditional methods. In this paper, we show that it is possible to learn fabric folding skills in only an hour of self-supervised real robot experience, without human supervision or simulation. Our approach relies on fully convolutional networks and the manipulation of visual inputs to exploit learned features, allowing us to create an expressive goal-conditioned pick and place policy that can be trained efficiently with real world robot data only. Folding skills are learned with only a sparse reward function and thus do not require reward function engineering, merely an image of the goal configuration. We demonstrate our method on a set of towel-folding tasks, and show that our approach is able to discover sequential folding strategies, purely from trial-and-error. We achieve state-of-the-art results without the demonstrations or simulation used in prior approaches. Videos available at: https://sites.google.com/view/learningtofold

CV | Aug 24, 2020
Strawberry Detection using Mixed Training on Simulated and Real Data

Sunny Goondram, Akansel Cosgun, Dana Kulic

This paper demonstrates how simulated images can be useful for object detection tasks in the agricultural sector, where labeled data can be scarce and costly to collect. We consider training on mixed datasets with real and simulated data for strawberry detection in real images. Our results show that using the real dataset augmented by the simulated dataset resulted in slightly higher accuracy.

RO | Jul 25, 2020
Object Handovers: a Review for Robotics

Valerio Ortenzi, Akansel Cosgun, Tommaso Pardi et al.

This article surveys the literature on human-robot object handovers. A handover is a collaborative joint action where an agent, the giver, gives an object to another agent, the receiver. The physical exchange starts when the receiver first contacts the object held by the giver and ends when the giver fully releases the object to the receiver. However, important cognitive and physical processes begin before the physical exchange, including initiating implicit agreement with respect to the location and timing of the exchange. From this perspective, we structure our review into the two main phases delimited by the aforementioned events: 1) a pre-handover phase, and 2) the physical exchange. We focus our analysis on the two actors (giver and receiver) and report the state of the art of robotic givers (robot-to-human handovers) and the robotic receivers (human-to-robot handovers). We report a comprehensive list of qualitative and quantitative metrics commonly used to assess the interaction. While focusing our review on the cognitive level (e.g., prediction, perception, motion planning, learning) and the physical level (e.g., motion, grasping, grip release) of the handover, we also briefly discuss the concepts of safety, social context, and ergonomics. We compare the behaviours displayed during human-to-human handovers to the state of the art of robotic assistants, and identify the major areas of improvement for robotic assistants to reach performance comparable to human interactions. Finally, we propose a minimal set of metrics that should be used in order to enable a fair comparison among the approaches.

CV | Jul 20, 2020
Gesture Recognition for Initiating Human-to-Robot Handovers

Jun Kwan, Chinkye Tan, Akansel Cosgun

Human-to-Robot handovers are useful for many Human-Robot Interaction scenarios. It is important to recognize when a human intends to initiate handovers, so that the robot does not try to take objects from humans when a handover is not intended. We pose the handover gesture recognition as a binary classification problem in a single RGB image. Three separate neural network modules for detecting the object, human body key points and head orientation, are implemented to extract relevant features from the RGB images, and then the feature vectors are passed into a deep neural net to perform binary classification. Our results show that the handover gestures are correctly identified with an accuracy of over 90%. The abstraction of the features makes our approach modular and generalizable to different objects and human body types.

RO | Jun 2, 2020
Object-Independent Human-to-Robot Handovers using Real Time Robotic Vision

Patrick Rosenberger, Akansel Cosgun, Rhys Newbury et al.

We present an approach for safe and object-independent human-to-robot handovers using real time robotic vision and manipulation. We aim for general applicability with a generic object detector, a fast grasp selection algorithm and by using a single gripper-mounted RGB-D camera, hence not relying on external sensors. The robot is controlled via visual servoing towards the object of interest. Putting a high emphasis on safety, we use two perception modules: human body part segmentation and hand/finger segmentation. Pixels that are deemed to belong to the human are filtered out from candidate grasp poses, hence ensuring that the robot safely picks the object without colliding with the human partner. The grasp selection and perception modules run concurrently in real-time, which allows monitoring of the progress. In experiments with 13 objects, the robot was able to successfully take the object from the human in 81.9% of the trials.
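
The safety filter described in this abstract, where grasp candidates on pixels judged to belong to the human are discarded, could be sketched as below. The data layout (candidates as `(row, col, quality)` tuples, masks as 2D lists of booleans) and the function names are assumptions for illustration; the paper's actual grasp representation is not given here.

```python
def combine_masks(body_mask, hand_mask):
    """Pixel-wise OR of the two segmentation masks (True = human pixel)."""
    return [
        [b or h for b, h in zip(body_row, hand_row)]
        for body_row, hand_row in zip(body_mask, hand_mask)
    ]


def filter_grasps(candidates, human_mask):
    """Keep only grasp candidates whose pixel is not labelled as human."""
    return [
        (row, col, quality)
        for row, col, quality in candidates
        if not human_mask[row][col]
    ]


def best_grasp(candidates, body_mask, hand_mask):
    """Return the highest-quality safe grasp, or None if none remain."""
    safe = filter_grasps(candidates, combine_masks(body_mask, hand_mask))
    return max(safe, key=lambda g: g[2]) if safe else None
```

Returning `None` when every candidate overlaps the human is one possible design choice: the caller can then wait for the next frame rather than attempt an unsafe grasp.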

RO | May 2, 2020
Supportive Actions for Manipulation in Human-Robot Coworker Teams

Shray Bansal, Rhys Newbury, Wesley Chan et al.

The increasing presence of robots alongside humans, such as in human-robot teams in manufacturing, gives rise to research questions about the kind of behaviors people prefer in their robot counterparts. We term actions that support interaction by reducing future interference with others as supportive robot actions and investigate their utility in a co-located manipulation scenario. We compare two robot modes in a shared table pick-and-place task: (1) Task-oriented: the robot only takes actions to further its own task objective and (2) Supportive: the robot sometimes prefers supportive actions to task-oriented ones when they reduce future goal-conflicts. Our experiments in simulation, using a simplified human model, reveal that supportive actions reduce the interference between agents, especially in more difficult tasks, but also cause the robot to take longer to complete the task. We implemented these modes on a physical robot in a user study where a human and a robot perform object placement on a shared table. Our results show that a supportive robot was perceived as a more favorable coworker by the human and also reduced interference with the human in the more difficult of two scenarios. However, it also took longer to complete the task, highlighting an interesting trade-off between task efficiency and human preference that needs to be considered before designing robot behavior for close-proximity manipulation scenarios.

RO | Apr 1, 2020
Learning to Place Objects onto Flat Surfaces in Upright Orientations

Rhys Newbury, Kerry He, Akansel Cosgun et al.

We study the problem of placing a grasped object on an empty flat surface in an upright orientation, such as placing a cup on its bottom rather than on its side. We aim to find the required object rotation such that when the gripper is opened after the object makes contact with the surface, the object would be stably placed in the upright orientation. We iteratively use two neural networks. At every iteration, we use a convolutional neural network to estimate the required object rotation, which is executed by the robot, and then a separate convolutional neural network to estimate the quality of a placement in its current orientation. Our approach places previously unseen objects in upright orientations with a success rate of 98.1% in free space and 90.3% with a simulated robotic arm, using a dataset of 50 everyday objects in simulation experiments. Real-world experiments achieved an 88.0% success rate, which serves as a proof of concept for direct sim-to-real transfer.

ROJan 30, 2020
Model-free vision-based shaping of deformable plastic materials

Andrea Cherubini, Valerio Ortenzi, Akansel Cosgun et al.

We address the problem of shaping deformable plastic materials using non-prehensile actions. Shaping plastic objects is challenging, since they are difficult to model and to track visually. We study this problem using kinetic sand, a plastic toy material which mimics the physical properties of wet sand. Inspired by a pilot study where humans shape kinetic sand, we define two types of actions: pushing the material from the sides and tapping from above. The chosen actions are executed with a robotic arm using image-based visual servoing. From the current and desired view of the material, we define states based on visual features such as the outer contour shape and the pixel luminosity values. These are mapped to actions, which are repeated iteratively to reduce the image error until convergence is reached. For pushing, we propose three methods for mapping the visual state to an action. These include heuristic methods and a neural network, trained from human actions. We show that it is possible to obtain simple shapes with the kinetic sand, without explicitly modeling the material. Our approach is limited in the types of shapes it can achieve. A richer set of action types and multi-step reasoning are needed to achieve more sophisticated shapes.

ROJun 17, 2019
Embracing Contact: Pushing Multiple Objects with Robot's Forearm

Akansel Cosgun, Luke Ditria, Shayne D'Lima et al.

Grasping is the dominant approach for robot manipulation, but only a single object can be grasped at a time. Nonprehensile manipulation offers a richer set of interactions; however, the state of the art is limited to using the end-effector only. We propose using a robot link (forearm) to push multiple objects at once. In a simulated task where the robot must sort two kinds of objects into their respective goal regions, we show that a greedy strategy that uses a combination of forearm pushes and pick-and-place operations reduces task completion time by 28% compared to picking and placing each object individually.

ROMay 22, 2019
Practical Robot Learning from Demonstrations using Deep End-to-End Training

Akansel Cosgun, Thomas Rowntree, Ian Reid et al.

Robots need to learn behaviors in intuitive and practical ways for widespread deployment in human environments. To learn a robot behavior end-to-end, we train a variant of the ResNet that maps eye-in-hand camera images to end-effector velocities. In our setup, a human teacher demonstrates the task via joystick. We show that a simple servoing task can be learned in less than an hour including data collection, model training and deployment time. Moreover, 16 minutes of demonstrations were enough for the robot to learn the task.

ROApr 11, 2019
Learning to Take Good Pictures of People with a Robot Photographer

Rhys Newbury, Akansel Cosgun, Mehmet Koseoglu et al.

We present a robotic system capable of navigating autonomously by following a line and taking good quality pictures of people. When a group of people is detected, the robot rotates towards them and then back to the line while continuously taking pictures from different angles. Each picture is processed in the cloud where its quality is estimated in a two-stage algorithm. First, features such as the face orientation and likelihood of facial emotions are input to a fully connected neural network to assign a quality score to each face. Second, a representation is extracted by abstracting faces from the image and it is input to a Convolutional Neural Network (CNN) to classify the quality of the overall picture. We collected a dataset in which a picture was labeled as good quality if subjects are well-positioned in the image and oriented towards the camera with a pleasant expression. Our approach detected the quality of pictures with 78.4% accuracy in this dataset and received a better mean user rating (3.71/5) than a heuristic method that uses photographic composition procedures in a study where 97 human judges rated each picture. A statistical analysis against the state-of-the-art verified the quality of the resulting pictures.

AIAug 7, 2018
Collaborative Planning for Mixed-Autonomy Lane Merging

Shray Bansal, Akansel Cosgun, Alireza Nakhaei et al.

Driving is a social activity: drivers often indicate their intent to change lanes via motion cues. We consider mixed-autonomy traffic where a Human-driven Vehicle (HV) and an Autonomous Vehicle (AV) drive together. We propose a planning framework where the degree to which the AV considers the other agent's reward is controlled by a selfishness factor. We test our approach on a simulated two-lane highway where the AV and HV merge into each other's lanes. In a user study with 21 subjects and 6 different selfishness factors, we found that our planning approach was sound and that both agents had shorter merging times when a factor that balances the rewards for the two agents was chosen. Our results on double lane merging suggest that it is a non-zero-sum game and encourage further investigation on collaborative decision making algorithms for mixed-autonomy traffic.
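To make the role of the selfishness factor concrete, here is a minimal sketch. The abstract does not state how the two rewards are combined, so the convex combination below, the function name `blended_reward`, and the [0, 1] range are all assumptions for illustration rather than the paper's formulation.

```python
def blended_reward(av_reward: float, hv_reward: float, selfishness: float) -> float:
    """Combine the AV's reward with the other agent's reward.

    Assumed convention (not taken from the paper):
      selfishness = 1.0 -> the AV optimizes only its own reward
      selfishness = 0.0 -> the AV optimizes only the human driver's reward
    """
    if not 0.0 <= selfishness <= 1.0:
        raise ValueError("selfishness must lie in [0, 1]")
    # Convex combination of the two agents' rewards.
    return selfishness * av_reward + (1.0 - selfishness) * hv_reward


# A balanced factor weights both agents equally.
balanced = blended_reward(av_reward=1.0, hv_reward=3.0, selfishness=0.5)
```

Under this assumed form, a planner would score each candidate AV action by `blended_reward` and pick the highest-scoring one, so intermediate factors trade off the two agents' interests.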

ROJun 1, 2018
Modeling Preemptive Behaviors for Uncommon Hazardous Situations From Demonstrations

Priyam Parashar, Akansel Cosgun, Alireza Nakhaei et al.

This paper presents a learning from demonstration approach to programming safe, autonomous behaviors for uncommon driving scenarios. Simulation is used to re-create a targeted driving situation, one containing a road-side hazard creating a significant occlusion in an urban neighborhood, and collect optimal driving behaviors from 24 users. The paper employs a key-frame based approach combined with an algorithm to linearly combine models in order to extend the behavior to novel variations of the target situation. This approach is theoretically agnostic to the kind of LfD framework used for modeling data and our results suggest it generalizes well to variations containing an additional number of hazards occurring in sequence. The linear combination algorithm is informed by analysis of driving data, which also suggests that decision-making algorithms need to consider a trade-off between road-rules and immediate rewards to tackle some complex cases.

AIFeb 28, 2018
Selective Experience Replay for Lifelong Learning

David Isele, Akansel Cosgun

Deep reinforcement learning has emerged as a powerful tool for a variety of learning tasks; however, deep nets typically exhibit forgetting when learning multiple tasks in sequence. To mitigate forgetting, we propose an experience replay process that augments the standard FIFO buffer and selectively stores experiences in a long-term memory. We explore four strategies for selecting which experiences will be stored: favoring surprise, favoring reward, matching the global training distribution, and maximizing coverage of the state space. We show that distribution matching successfully prevents catastrophic forgetting, and is consistently the best approach on all domains tested. While distribution matching has better and more consistent performance, we identify one case in which coverage maximization is beneficial: when tasks that receive less training are more important. Overall, our results show that selective experience replay, when suitable selection algorithms are employed, can prevent catastrophic forgetting.
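As an illustration of what "matching the global training distribution" can look like in a fixed-size memory, the sketch below uses reservoir sampling, a standard technique that keeps a uniform random sample of everything seen so far. The abstract does not name the mechanism, so treating reservoir sampling as the selection rule is an assumption here, and the class name `ReservoirMemory` is hypothetical.

```python
import random


class ReservoirMemory:
    """Fixed-size long-term memory holding a uniform sample of a stream."""

    def __init__(self, capacity, seed=None):
        self.capacity = capacity
        self.items = []   # stored experiences
        self.seen = 0     # total number of experiences offered so far
        self.rng = random.Random(seed)

    def add(self, experience):
        self.seen += 1
        if len(self.items) < self.capacity:
            # Memory not full yet: always keep the experience.
            self.items.append(experience)
            return
        # Keep the new experience with probability capacity / seen,
        # overwriting a uniformly chosen slot when it is kept.
        j = self.rng.randrange(self.seen)
        if j < self.capacity:
            self.items[j] = experience


memory = ReservoirMemory(capacity=10, seed=0)
for step in range(1000):
    memory.add(step)
```

Because every experience ends up in memory with equal probability, later tasks cannot crowd out earlier ones the way they do in a FIFO buffer.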

LGNov 30, 2017
Transferring Autonomous Driving Knowledge on Simulated and Real Intersections

David Isele, Akansel Cosgun

We view intersection handling on autonomous vehicles as a reinforcement learning problem, and study its behavior in a transfer learning setting. We show that a network trained on one type of intersection generally is not able to generalize to other intersections. However, a network that is pre-trained on one intersection and fine-tuned on another performs better on the new task compared to training in isolation. This network also retains knowledge of the prior task, even though some forgetting occurs. Finally, we show that the benefits of fine-tuning hold when transferring simulated intersection handling knowledge to a real autonomous vehicle.

ROOct 24, 2017
Context Aware Robot Navigation using Interactively Built Semantic Maps

Akansel Cosgun, Henrik Christensen

We discuss the process of building semantic maps, how to interactively label entities in them, and how to use them to enable context-aware navigation behaviors in human environments. We utilize planar surfaces, such as walls and tables, and static objects, such as door signs, as features for our semantic mapping approach. Users can interactively annotate these features by having the robot follow them, entering the label through a mobile app, and performing a pointing gesture toward the landmark of interest. Our gesture-based approach can reliably estimate which object is being pointed at and detect ambiguous gestures with probabilistic modeling. Our person following method attempts to maximize future utility by searching over future actions, assuming a constant-velocity model for the human. We describe a method to extract metric goals from a semantic map landmark and to plan a human-aware path that takes into account the personal spaces of people. Finally, we demonstrate context-awareness for person following in two scenarios: interactive labeling and door passing. We believe that future navigation approaches and service robotics applications can be made more effective by further exploiting the structure of human environments.
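The constant-velocity assumption for the human can be sketched in a few lines. This shows only the prediction step, not the paper's search over robot actions; the function name `predict_positions` and the 2D tuple representation are assumptions for illustration.

```python
def predict_positions(position, velocity, dt, steps):
    """Predict future 2D positions under a constant-velocity model.

    position: (x, y) in meters, velocity: (vx, vy) in meters per second,
    dt: time step in seconds, steps: number of future steps to predict.
    """
    x, y = position
    vx, vy = velocity
    # The k-th prediction advances the person by k * dt seconds.
    return [(x + vx * dt * k, y + vy * dt * k) for k in range(1, steps + 1)]


# Person at the origin walking at (1.0, 0.5) m/s, predicted at 0.5 s intervals.
future = predict_positions((0.0, 0.0), (1.0, 0.5), dt=0.5, steps=2)
```

A follower could evaluate each candidate robot action against these predicted positions instead of the person's current position.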

LGMay 2, 2017
Analyzing Knowledge Transfer in Deep Q-Networks for Autonomously Handling Multiple Intersections

David Isele, Akansel Cosgun, Kikuo Fujimura

We analyze how the knowledge to autonomously handle one type of intersection, represented as a Deep Q-Network, translates to other types of intersections (tasks). We view intersection handling as a deep reinforcement learning problem, which approximates the state-action Q function as a deep neural network. Using a traffic simulator, we show that directly copying a network trained for one type of intersection to another type of intersection decreases the success rate. We also show that when a network is pre-trained on Task A and then fine-tuned on Task B, the resulting network not only performs better on Task B than a network exclusively trained on Task A, but also retains knowledge of Task A. Finally, we examine a lifelong learning setting, where we train a single network on five different types of intersections sequentially and show that the resulting network exhibits catastrophic forgetting of knowledge on previous tasks. This result suggests a need for a long-term memory component to preserve knowledge.

AIMay 2, 2017
Navigating Occluded Intersections with Autonomous Vehicles using Deep Reinforcement Learning

David Isele, Reza Rahimi, Akansel Cosgun et al.

Providing an efficient strategy to navigate safely through unsignaled intersections is a difficult task that requires determining the intent of other drivers. We explore the effectiveness of Deep Reinforcement Learning to handle intersection problems. Using recent advances in Deep RL, we are able to learn policies that surpass the performance of a commonly-used heuristic approach in several metrics, including task completion time and goal success rate, but have limited ability to generalize. We then explore a system's ability to learn active sensing behaviors to enable navigating safely in the case of occlusions. Our analysis provides insight into the intersection handling problem: the solutions learned by the network point out several shortcomings of current rule-based methods, and the failures of our current deep reinforcement learning system point to future research directions.

ROMay 2, 2017
Towards Full Automated Drive in Urban Environments: A Demonstration in GoMentum Station, California

Akansel Cosgun, Lichao Ma, Jimmy Chiu et al.

Each year, millions of motor vehicle traffic accidents all over the world cause a large number of fatalities, injuries and significant material loss. Automated Driving (AD) has the potential to drastically reduce such accidents. In this work, we focus on the technical challenges that arise from AD in urban environments. We present the overall architecture of an AD system and describe in detail the perception and planning modules. The AD system, built on a modified Acura RLX, was demonstrated in a course in GoMentum Station in California. We demonstrated autonomous handling of 4 scenarios: traffic lights, cross-traffic at intersections, construction zones and pedestrians. The AD vehicle displayed safe behavior and performed consistently in repeated demonstrations with slight variations in conditions. Overall, we completed 44 runs, encompassing 110 km of automated driving with only 3 cases where the driver intervened in the control of the vehicle, mostly due to error in GPS positioning. Our demonstration showed that robust and consistent behavior in urban scenarios is possible, yet more investigation is necessary for full scale roll-out on public roads.

ROApr 14, 2017
Belief State Planning for Autonomously Navigating Urban Intersections

Maxime Bouton, Akansel Cosgun, Mykel J. Kochenderfer

Urban intersections represent a complex environment for autonomous vehicles with many sources of uncertainty. The vehicle must plan in a stochastic environment with potentially rapid changes in driver behavior. Providing an efficient strategy to navigate through urban intersections is a difficult task. This paper frames the problem of navigating unsignalized intersections as a partially observable Markov decision process (POMDP) and solves it using a Monte Carlo sampling method. Empirical results in simulation show that the resulting policy outperforms a threshold-based heuristic strategy on several relevant metrics that measure both safety and efficiency.
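A POMDP planner reasons over a belief, i.e. a probability distribution over hidden states, rather than over a known state. The sketch below shows a generic discrete Bayes belief update as background. It is not the Monte Carlo solver from the paper, and the two-state driver-intent example ("yield" or "go"), the probabilities, and the function name `update_belief` are made up for illustration.

```python
def update_belief(belief, transition, obs_likelihood):
    """One step of a discrete Bayes filter.

    belief:         dict state -> probability
    transition:     dict state -> dict next_state -> probability
    obs_likelihood: dict next_state -> P(observation | next_state)
    """
    # Prediction: push the belief through the transition model.
    predicted = {}
    for state, prob in belief.items():
        for next_state, t_prob in transition[state].items():
            predicted[next_state] = predicted.get(next_state, 0.0) + prob * t_prob

    # Correction: weight each state by how well it explains the observation.
    unnormalized = {s: p * obs_likelihood.get(s, 0.0) for s, p in predicted.items()}
    total = sum(unnormalized.values())
    if total == 0.0:
        raise ValueError("observation has zero probability under the belief")
    return {s: p / total for s, p in unnormalized.items()}


# Hypothetical example: is the other driver going to yield or go?
prior = {"yield": 0.5, "go": 0.5}
stay = {"yield": {"yield": 1.0}, "go": {"go": 1.0}}  # intent assumed fixed
slowing_down = {"yield": 0.8, "go": 0.2}             # likelihood of seeing braking
posterior = update_belief(prior, stay, slowing_down)
```

After observing the other car braking, the belief shifts toward "yield", and a planner can act on that updated distribution rather than on a single guessed intent.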