Taşkın Padır

h-index23

22papers

287citations

Novelty48%

AI Score37

Ranked #92,169 of 194,257 authors (top 47%)#2,759 in RO (top 41%)

22 Papers

1.5CVJul 22, 2023Code

A Vision for Cleaner Rivers: Harnessing Snapshot Hyperspectral Imaging to Detect Macro-Plastic Litter

Nathaniel Hanson, Ahmet Demirkaya, Deniz Erdoğmuş et al.

Plastic waste entering the riverine harms local ecosystems leading to negative ecological and economic impacts. Large parcels of plastic waste are transported from inland to oceans leading to a global scale problem of floating debris fields. In this context, efficient and automatized monitoring of mismanaged plastic waste is paramount. To address this problem, we analyze the feasibility of macro-plastic litter detection using computational imaging approaches in river-like scenarios. We enable near-real-time tracking of partially submerged plastics by using snapshot Visible-Shortwave Infrared hyperspectral imaging. Our experiments indicate that imaging strategies associated with machine learning classification approaches can lead to high detection accuracy even in challenging scenarios, especially when leveraging hyperspectral data and nonlinear classifiers. All code, data, and models are available online: https://github.com/RIVeR-Lab/hyperspectral_macro_plastic_detection.

10.2ROSep 18, 2022

StereoVoxelNet: Real-Time Obstacle Detection Based on Occupancy Voxels from a Stereo Camera Using Deep Neural Networks

Hongyu Li, Zhengang Li, Neset Unver Akmandor et al.

Obstacle detection is a safety-critical problem in robot navigation, where stereo matching is a popular vision-based approach. While deep neural networks have shown impressive results in computer vision, most of the previous obstacle detection works only leverage traditional stereo matching techniques to meet the computational constraints for real-time feedback. This paper proposes a computationally efficient method that employs a deep neural network to detect occupancy from stereo images directly. Instead of learning the point cloud correspondence from the stereo data, our approach extracts the compact obstacle distribution based on volumetric representations. In addition, we prune the computation of safety irrelevant spaces in a coarse-to-fine manner based on octrees generated by the decoder. As a result, we achieve real-time performance on the onboard computer (NVIDIA Jetson TX2). Our approach detects obstacles accurately in the range of 32 meters and achieves better IoU (Intersection over Union) and CD (Chamfer Distance) scores with only 2% of the computation cost of the state-of-the-art stereo model. Furthermore, we validate our method's robustness and real-world feasibility through autonomous navigation experiments with a real robot. Hence, our work contributes toward closing the gap between the stereo-based system in robot perception and state-of-the-art stereo models in computer vision. To counter the scarcity of high-quality real-world indoor stereo datasets, we collect a 1.36 hours stereo dataset with a mobile robot which is used to fine-tune our model. The dataset, the code, and further details including additional visualizations are available at https://lhy.xyz/stereovoxelnet

8.5ROSep 22, 2023

E(2)-Equivariant Graph Planning for Navigation

Linfeng Zhao, Hongyu Li, Taskin Padir et al.

Learning for robot navigation presents a critical and challenging task. The scarcity and costliness of real-world datasets necessitate efficient learning approaches. In this letter, we exploit Euclidean symmetry in planning for 2D navigation, which originates from Euclidean transformations between reference frames and enables parameter sharing. To address the challenges of unstructured environments, we formulate the navigation problem as planning on a geometric graph and develop an equivariant message passing network to perform value iteration. Furthermore, to handle multi-camera input, we propose a learnable equivariant layer to lift features to a desired space. We conduct comprehensive evaluations across five diverse tasks encompassing structured and unstructured environments, along with maps of known and unknown, given point goals or semantic goals. Our experiments confirm the substantial benefits on training efficiency, stability, and generalization. More details can be found at the project website: https://lhy.xyz/e2-planning/.

22.3CVJul 7, 2025Code

VOTE: Vision-Language-Action Optimization with Trajectory Ensemble Voting

Juyi Lin, Amir Taherin, Arash Akbari et al.

Recent large-scale Vision Language Action (VLA) models have shown superior performance in robotic manipulation tasks guided by natural language. However, current VLA models suffer from two drawbacks: (i) generation of massive tokens leading to high inference latency and increased training cost, and (ii) insufficient utilization of generated actions resulting in potential performance loss. To address these issues, we develop a training framework to finetune VLA models for generating significantly fewer action tokens with high parallelism, effectively reducing inference latency and training cost. Furthermore, we introduce an inference optimization technique with a novel voting-based ensemble strategy to combine current and previous action predictions, improving the utilization of generated actions and overall performance. Our results demonstrate that we achieve superior performance compared with state-of-the-art VLA models, achieving significantly higher success rates and 39$\times$ faster inference than OpenVLA with 46 Hz throughput on edge platforms, demonstrating practical deployability. The code is available at https://github.com/LukeLIN-web/VOTE.

4.1ROFeb 14, 2024

Design and Realization of a Benchmarking Testbed for Evaluating Autonomous Platooning Algorithms

Michael Shaham, Risha Ranjan, Engin Kirda et al.

Autonomous vehicle platoons present near- and long-term opportunities to enhance operational efficiencies and save lives. The past 30 years have seen rapid development in the autonomous driving space, enabling new technologies that will alleviate the strain placed on human drivers and reduce vehicle emissions. This paper introduces a testbed for evaluating and benchmarking platooning algorithms on 1/10th scale vehicles with onboard sensors. To demonstrate the testbed's utility, we evaluate three algorithms, linear feedback and two variations of distributed model predictive control, and compare their results on a typical platooning scenario where the lead vehicle tracks a reference trajectory that changes speed multiple times. We validate our algorithms in simulation to analyze the performance as the platoon size increases, and find that the distributed model predictive control algorithms outperform linear feedback on hardware and in simulation.

9.5ROApr 17, 2025

ViTa-Zero: Zero-shot Visuotactile Object 6D Pose Estimation

Hongyu Li, James Akl, Srinath Sridhar et al.

Object 6D pose estimation is a critical challenge in robotics, particularly for manipulation tasks. While prior research combining visual and tactile (visuotactile) information has shown promise, these approaches often struggle with generalization due to the limited availability of visuotactile data. In this paper, we introduce ViTa-Zero, a zero-shot visuotactile pose estimation framework. Our key innovation lies in leveraging a visual model as its backbone and performing feasibility checking and test-time optimization based on physical constraints derived from tactile and proprioceptive observations. Specifically, we model the gripper-object interaction as a spring-mass system, where tactile sensors induce attractive forces, and proprioception generates repulsive forces. We validate our framework through experiments on a real-world robot setup, demonstrating its effectiveness across representative visual backbones and manipulation scenarios, including grasping, object picking, and bimanual handover. Compared to the visual models, our approach overcomes some drastic failure modes while tracking the in-hand object pose. In our experiments, our approach shows an average increase of 55% in AUC of ADD-S and 60% in ADD, along with an 80% lower position error compared to FoundationPose.

3.2ROJun 10, 2025Code

Re4MPC: Reactive Nonlinear MPC for Multi-model Motion Planning via Deep Reinforcement Learning

Neşet Ünver Akmandor, Sarvesh Prajapati, Mark Zolotas et al.

Traditional motion planning methods for robots with many degrees-of-freedom, such as mobile manipulators, are often computationally prohibitive for real-world settings. In this paper, we propose a novel multi-model motion planning pipeline, termed Re4MPC, which computes trajectories using Nonlinear Model Predictive Control (NMPC). Re4MPC generates trajectories in a computationally efficient manner by reactively selecting the model, cost, and constraints of the NMPC problem depending on the complexity of the task and robot state. The policy for this reactive decision-making is learned via a Deep Reinforcement Learning (DRL) framework. We introduce a mathematical formulation to integrate NMPC into this DRL framework. To validate our methodology and design choices, we evaluate DRL training and test outcomes in a physics-based simulation involving a mobile manipulator. Experimental results demonstrate that Re4MPC is more computationally efficient and achieves higher success rates in reaching end-effector goals than the NMPC baseline, which computes whole-body trajectories without our learning mechanism.

2.6LGApr 18, 2024

Learning a Stable, Safe, Distributed Feedback Controller for a Heterogeneous Platoon of Autonomous Vehicles

Michael H. Shaham, Taskin Padir

Platooning of autonomous vehicles has the potential to increase safety and fuel efficiency on highways. The goal of platooning is to have each vehicle drive at a specified speed (set by the leader) while maintaining a safe distance from its neighbors. Many prior works have analyzed various controllers for platooning, most commonly linear feedback and distributed model predictive controllers. In this work, we introduce an algorithm for learning a stable, safe, distributed controller for a heterogeneous platoon. Our algorithm relies on recent developments in learning neural network stability certificates. We train a controller for autonomous platooning in simulation and evaluate its performance on hardware with a platoon of four F1Tenth vehicles. We then perform further analysis in simulation with a platoon of 100 vehicles. Experimental results demonstrate the practicality of the algorithm and the learned controller by comparing the performance of the neural network controller to linear feedback and distributed model predictive controllers.

7.3ROApr 26, 2021Code

End-to-end grasping policies for human-in-the-loop robots via deep reinforcement learning

Mohammadreza Sharif, Deniz Erdogmus, Christopher Amato et al.

State-of-the-art human-in-the-loop robot grasping is hugely suffered by Electromyography (EMG) inference robustness issues. As a workaround, researchers have been looking into integrating EMG with other signals, often in an ad hoc manner. In this paper, we are presenting a method for end-to-end training of a policy for human-in-the-loop robot grasping on real reaching trajectories. For this purpose we use Reinforcement Learning (RL) and Imitation Learning (IL) in DEXTRON (DEXTerity enviRONment), a stochastic simulation environment with real human trajectories that are augmented and selected using a Monte Carlo (MC) simulation method. We also offer a success model which once trained on the expert policy data and the RL policy roll-out transitions, can provide transparency to how the deep policy works and when it is probably going to fail.

13.8ROApr 8, 2021

Multimodal Fusion of EMG and Vision for Human Grasp Intent Inference in Prosthetic Hand Control

Mehrshad Zandigohar, Mo Han, Mohammadreza Sharif et al.

Objective: For transradial amputees, robotic prosthetic hands promise to regain the capability to perform daily living activities. Current control methods based on physiological signals such as electromyography (EMG) are prone to yielding poor inference outcomes due to motion artifacts, muscle fatigue, and many more. Vision sensors are a major source of information about the environment state and can play a vital role in inferring feasible and intended gestures. However, visual evidence is also susceptible to its own artifacts, most often due to object occlusion, lighting changes, etc. Multimodal evidence fusion using physiological and vision sensor measurements is a natural approach due to the complementary strengths of these modalities. Methods: In this paper, we present a Bayesian evidence fusion framework for grasp intent inference using eye-view video, eye-gaze, and EMG from the forearm processed by neural network models. We analyze individual and fused performance as a function of time as the hand approaches the object to grasp it. For this purpose, we have also developed novel data processing and augmentation techniques to train neural network components. Results: Our results indicate that, on average, fusion improves the instantaneous upcoming grasp type classification accuracy while in the reaching phase by 13.66% and 14.8%, relative to EMG (81.64% non-fused) and visual evidence (80.5% non-fused) individually, resulting in an overall fusion accuracy of 95.3%. Conclusion: Our experimental data analyses demonstrate that EMG and visual evidence show complementary strengths, and as a consequence, fusion of multimodal evidence can outperform each individual evidence modality at any given time.

5.6CVMar 8, 2021

From Hand-Perspective Visual Information to Grasp Type Probabilities: Deep Learning via Ranking Labels

Mo Han, Sezen Ya{ğ}mur Günay, İlkay Yıldız et al.

Limb deficiency severely affects the daily lives of amputees and drives efforts to provide functional robotic prosthetic hands to compensate this deprivation. Convolutional neural network-based computer vision control of the prosthetic hand has received increased attention as a method to replace or complement physiological signals due to its reliability by training visual information to predict the hand gesture. Mounting a camera into the palm of a prosthetic hand is proved to be a promising approach to collect visual data. However, the grasp type labelled from the eye and hand perspective may differ as object shapes are not always symmetric. Thus, to represent this difference in a realistic way, we employed a dataset containing synchronous images from eye- and hand- view, where the hand-perspective images are used for training while the eye-view images are only for manual labelling. Electromyogram (EMG) activity and movement kinematics data from the upper arm are also collected for multi-modal information fusion in future work. Moreover, in order to include human-in-the-loop control and combine the computer vision with physiological signal inputs, instead of making absolute positive or negative predictions, we build a novel probabilistic classifier according to the Plackett-Luce model. To predict the probability distribution over grasps, we exploit the statistical model over label rankings to solve the permutation domain problems via a maximum likelihood estimation, utilizing the manually ranked lists of grasps as a new form of label. We indicate that the proposed model is applicable to the most popular and productive convolutional neural network frameworks.

7.3ROMar 8, 2021

HANDS: A Multimodal Dataset for Modeling Towards Human Grasp Intent Inference in Prosthetic Hands

Mo Han, Sezen Ya{ğ}mur Günay, Gunar Schirner et al.

Upper limb and hand functionality is critical to many activities of daily living and the amputation of one can lead to significant functionality loss for individuals. From this perspective, advanced prosthetic hands of the future are anticipated to benefit from improved shared control between a robotic hand and its human user, but more importantly from the improved capability to infer human intent from multimodal sensor data to provide the robotic hand perception abilities regarding the operational context. Such multimodal sensor data may include various environment sensors including vision, as well as human physiology and behavior sensors including electromyography and inertial measurement units. A fusion methodology for environmental state and human intent estimation can combine these sources of evidence in order to help prosthetic hand motion planning and control. In this paper, we present a dataset of this type that was gathered with the anticipation of cameras being built into prosthetic hands, and computer vision methods will need to assess this hand-view visual evidence in order to estimate human intent. Specifically, paired images from human eye-view and hand-view of various objects placed at different orientations have been captured at the initial state of grasping trials, followed by paired video, EMG and IMU from the arm of the human during a grasp, lift, put-down, and retract style trial structure. For each trial, based on eye-view images of the scene showing the hand and object on a table, multiple humans were asked to sort in decreasing order of preference, five grasp types appropriate for the object in its given configuration relative to the hand. The potential utility of paired eye-view and hand-view images was illustrated by training a convolutional neural network to process hand-view images in order to predict eye-view labels assigned by humans.

10.4ROFeb 9, 2021

Affordance-Based Mobile Robot Navigation Among Movable Obstacles

Maozhen Wang, Rui Luo, Aykut Ozgun Onol et al.

Avoiding obstacles in the perceived world has been the classical approach to autonomous mobile robot navigation. However, this usually leads to unnatural and inefficient motions that significantly differ from the way humans move in tight and dynamic spaces, as we do not refrain interacting with the environment around us when necessary. Inspired by this observation, we propose a framework for autonomous robot navigation among movable obstacles (NAMO) that is based on the theory of affordances and contact-implicit motion planning. We consider a realistic scenario in which a mobile service robot negotiates unknown obstacles in the environment while navigating to a goal state. An affordance extraction procedure is performed for novel obstacles to detect their movability, and a contact-implicit trajectory optimization method is used to enable the robot to interact with movable obstacles to improve the task performance or to complete an otherwise infeasible task. We demonstrate the performance of the proposed framework by hardware experiments with Toyota's Human Support Robot.

2.2RONov 11, 2020

Learning Bayes Filter Models for Tactile Localization

Tarik Kelestemur, Colin Keil, John P. Whitney et al.

Localizing and tracking the pose of robotic grippers are necessary skills for manipulation tasks. However, the manipulators with imprecise kinematic models (e.g. low-cost arms) or manipulators with unknown world coordinates (e.g. poor camera-arm calibration) cannot locate the gripper with respect to the world. In these circumstances, we can leverage tactile feedback between the gripper and the environment. In this paper, we present learnable Bayes filter models that can localize robotic grippers using tactile feedback. We propose a novel observation model that conditions the tactile feedback on visual maps of the environment along with a motion model to recursively estimate the gripper's location. Our models are trained in simulation with self-supervision and transferred to the real world. Our method is evaluated on a tabletop localization task in which the gripper interacts with objects. We report results in simulation and on a real robot, generalizing over different sizes, shapes, and configurations of the objects.

7.0ROJul 16, 2020

Model-Based Manipulation of Linear Flexible Objects with Visual Curvature Feedback

Peng Chang, Taskin Padir

Manipulation of deformable objects is a desired skill in making robots ubiquitous in manufacturing, service, healthcare, and security. Deformable objects are common in our daily lives, e.g., wires, clothes, bed sheets, etc., and are significantly more difficult to model than rigid objects. In this study, we investigate vision-based manipulation of linear flexible objects such as cables. We propose a geometric modeling method that is based on visual feedback to develop a general representation of the linear flexible object that is subject to gravity. The model characterizes the shape of the object by combining the curvatures on two projection planes. In this approach, we achieve tracking of the position and orientation (pose) of a cable-like object, the pose of its tip, and the pose of the selected grasp point on the object, which enables closed-loop manipulation of the object. We demonstrate the feasibility of our approach by completing the Plug Task used in the 2015 DARPA Robotics Challenge Finals, which involves unplugging a power cable from one socket and plugging it into another. Experiments show that we can successfully complete the task autonomously within 30 seconds.

9.4ROJun 11, 2020

Tuning-Free Contact-Implicit Trajectory Optimization

Aykut Ozgun Onol, Radu Corcodel, Philip Long et al.

We present a contact-implicit trajectory optimization framework that can plan contact-interaction trajectories for different robot architectures and tasks using a trivial initial guess and without requiring any parameter tuning. This is achieved by using a relaxed contact model along with an automatic penalty adjustment loop for suppressing the relaxation. Moreover, the structure of the problem enables us to exploit the contact information implied by the use of relaxation in the previous iteration, such that the solution is explicitly improved with little computational overhead. We test the proposed approach in simulation experiments for non-prehensile manipulation using a 7-DOF arm and a mobile robot and for planar locomotion using a humanoid-like robot in zero gravity. The results demonstrate that our method provides an out-of-the-box solution with good performance for a wide range of applications.

13.7ROFeb 6, 2020

Sim2Real2Sim: Bridging the Gap Between Simulation and Real-World in Flexible Object Manipulation

Peng Chang, Taskin Padir

This paper addresses a new strategy called Simulation-to-Real-to-Simulation (Sim2Real2Sim) to bridge the gap between simulation and real-world, and automate a flexible object manipulation task. This strategy consists of three steps: (1) using the rough environment with the estimated models to develop the methods to complete the manipulation task in the simulation; (2) applying the methods from simulation to real-world and comparing their performance; (3) updating the models and methods in simulation based on the differences between the real world and the simulation. The Plug Task from the 2015 DARPA Robotics Challenge Finals is chosen to evaluate our Sim2Real2Sim strategy. A new identification approach for building the model of the linear flexible objects is derived from real-world to simulation. The automation of the DRC plug task in both simulation and real-world proves the success of the Sim2Real2Sim strategy. Numerical experiments are implemented to validate the simulated model.

4.1ROJan 24, 2020Code

A 3D Reactive Navigation Algorithm for Mobile Robots by Using Tentacle-Based Sampling

Neşet Ünver Akmandor, Taşkın Padır

This paper introduces a reactive navigation framework for mobile robots in 3-dimensional (3D) space. The proposed approach does not rely on the global map information and achieves fast navigation by employing a tentacle based sampling and their heuristic evaluations on-the-fly. This reactive nature of the approach comes from the prior arrangement of navigation points on tentacles (parametric contours) to sample the navigation space. These tentacles are evaluated at each time-step, based on heuristic features such as closeness to the goal, previous tentacle preferences and nearby obstacles in a robot-centered 3D grid. Then, the navigable sampling point on the selected tentacle is passed to a controller for the motion execution. The proposed framework does not only extend its 2D tentacle-based counterparts into 3D, but also introduces offline and online parameters, whose tuning provides versatility and adaptability of the algorithm to work in unknown environments. To demonstrate the superior performance of the proposed algorithm over a state-of-art method, the statistical results from physics-based simulations on various maps are presented. The video of the work is available at https://youtu.be/rrF7wHCz-0M.

6.3RONov 23, 2018

A Blended Human-Robot Shared Control Framework to Handle Drift and Latency

Anas Abou Allaban, Velin Dimitrov, Taşkın Padır

Maximizing the utility of human-robot teams in disaster response and search and rescue (SAR) missions remains to be a challenging problem. This is due to the dynamic, uncertain nature of the environment and the variability in cognitive performance of the human operators. By having an autonomous agent share control with the operator, we can achieve near-optimal performance by augmenting the operator's input and compensate for the factors resulting in degraded performance. What this solution does not consider though is the human input latency and errors caused by potential hardware failures that can occur during task completion when operating in disaster response and SAR scenarios. In this paper, we propose the use of blended shared control (BSC) architecture to address these issues and investigate the architecture's performance in constrained, dynamic environments with a differential drive robot that has input latency and erroneous odometry feedback. We conduct a validation study (n=12) for our control architecture and then a user study (n=14) in 2 different environments that are unknown to both the human operator and the autonomous agent. The results demonstrate that the BSC architecture can prevent collisions and enhance operator performance without the need of a complete transfer of control between the human operator and autonomous agent.

14.2ROOct 24, 2018

Contact-Implicit Trajectory Optimization Based on a Variable Smooth Contact Model and Successive Convexification

Aykut Ozgun Onol, Philip Long, Taskin Padir

In this paper, we propose a contact-implicit trajectory optimization (CITO) method based on a variable smooth contact model (VSCM) and successive convexification (SCvx). The VSCM facilitates the convergence of gradient-based optimization without compromising physical fidelity. On the other hand, the proposed SCvx-based approach combines the advantages of direct and shooting methods for CITO. For evaluations, we consider non-prehensile manipulation tasks. The proposed method is compared to a version based on iterative linear quadratic regulator (iLQR) on a planar example. The results demonstrate that both methods can find physically-consistent motions that complete the tasks without a meaningful initial guess owing to the VSCM. The proposed SCvx-based method outperforms the iLQR-based method in terms of convergence, computation time, and the quality of motions found. Finally, the proposed SCvx-based method is tested on a standard robot platform and shown to perform efficiently for a real-world application.

2.9ROJul 12, 2018

Integrating Risk in Humanoid Robot Control for Applications in the Nuclear Industry

Xianchao Long, Philip Long, Aykut Onol et al.

This paper discuss the integration of risk into a robot control framework for decommissioning applications in the nuclear industry. Our overall objective is to allow the robot to evaluate a risk associated with several methods of completing the same task by combining a set of action sequences. If the environment is known and in the absence of sensing errors each set of actions would successfully complete the task. In this paper, instead of attempting to model the errors associated with each sensing system in order to compute an exact solution, a set of solutions are obtained along with a prescribed risk index. The risk associated with each set of actions can then be compared to possible payoffs or rewards, for instance task completion time or power consumption. This information is then sent to a high level decision planner, for instance a human teleoperator, who can then make a more informed decision regarding the robots actions. In order to illustrate the concept, we introduce three specific risk measures, namely, the collision risk and the risk of toppling and failure risk associated with grasping an object. We demonstrate the results from this foundational study of risk-aware compositional robot autonomy in simulation using NASA's Valkyrie humanoid robot, and the grasping simulator HAPTIX.

2.9ROJul 11, 2018

Using Contact to Increase Robot Performance for Glovebox D&D Tasks

Aykut Onol, Philip Long, Taskin Padir

Glovebox decommissioning tasks usually require manipulating relatively heavy objects in a highly constrained environment. Thus, contact with the surroundings becomes inevitable. In order to allow the robot to interact with the environment in a natural way, we present a contact-implicit motion planning framework. This framework enables the system, without the specification in advance of a contact plan, to make and break contacts to maintain stability while performing a manipulation task. In this method, we use linear complementarity constraints to model rigid body contacts and find a locally optimal solution for joint displacements and magnitudes of support forces. Then, joint torques are calculated such that the support forces have the highest priority. We evaluate our framework in a 2.5D, quasi-static simulation in which a humanoid robot with planar arms manipulates a heavy object. Our results suggest that the proposed method provides the robot with the ability to balance itself by generating support forces on the environment while simultaneously performing the manipulation task.