K Madhava Krishna

RO
h-index8
28papers
398citations
Novelty50%
AI Score46

28 Papers

CVJun 9, 2023
HyP-NeRF: Learning Improved NeRF Priors using a HyperNetwork

Bipasha Sen, Gaurav Singh, Aditya Agarwal et al. · mila, mit

Neural Radiance Fields (NeRF) have become an increasingly popular representation to capture high-quality appearance and shape of scenes and objects. However, learning generalizable NeRF priors over categories of scenes or objects has been challenging due to the high dimensionality of network weight space. To address the limitations of existing work on generalization, multi-view consistency and to improve quality, we propose HyP-NeRF, a latent conditioning method for learning generalizable category-level NeRF priors using hypernetworks. Rather than using hypernetworks to estimate only the weights of a NeRF, we estimate both the weights and the multi-resolution hash encodings resulting in significant quality gains. To improve quality even further, we incorporate a denoise and finetune strategy that denoises images rendered from NeRFs estimated by the hypernetwork and finetunes it while retaining multiview consistency. These improvements enable us to use HyP-NeRF as a generalizable prior for multiple downstream tasks including NeRF reconstruction from single-view or cluttered scenes and text-to-NeRF. We provide qualitative comparisons and evaluate HyP-NeRF on three tasks: generalization, compression, and retrieval, demonstrating our state-of-the-art results.

CVSep 29, 2022
GDIP: Gated Differentiable Image Processing for Object-Detection in Adverse Conditions

Sanket Kalwar, Dhruv Patel, Aakash Aanegola et al.

Detecting objects under adverse weather and lighting conditions is crucial for the safe and continuous operation of an autonomous vehicle, and remains an unsolved problem. We present a Gated Differentiable Image Processing (GDIP) block, a domain-agnostic network architecture, which can be plugged into existing object detection networks (e.g., Yolo) and trained end-to-end with adverse condition images such as those captured under fog and low lighting. Our proposed GDIP block learns to enhance images directly through the downstream object detection loss. This is achieved by learning parameters of multiple image pre-processing (IP) techniques that operate concurrently, with their outputs combined using weights learned through a novel gating mechanism. We further improve GDIP through a multi-stage guidance procedure for progressive image enhancement. Finally, trading off accuracy for speed, we propose a variant of GDIP that can be used as a regularizer for training Yolo, which eliminates the need for GDIP-based image enhancement during inference, resulting in higher throughput and plausible real-world deployment. We demonstrate significant improvement in detection performance over several state-of-the-art methods through quantitative and qualitative studies on synthetic datasets such as PascalVOC, and real-world foggy (RTTS) and low-lighting (ExDark) datasets.

ROMay 26
Object Pose and Shape Estimation for Grasping: Does it Work?

Pavan Karke, Kushal Shah, Gaurav Singh et al.

The problem of object pose and shape estimation has seen key advancements lately. Encoder-decoder (e.g., SAM3D, LRM, CRISP) and diffusion-based models (e.g., InstantMesh, Zero123, SceneComplete) have shown category-agnostic shape encoding capacity and open-set generalizability. In this work, we ask the question: Are the object pose and shape estimation methods mature enough, such that when used with antipodal grasp sampling, can outperform the end-to-end grasp synthesis methods? We explore this question in detail by scoping our study to parallel jaw grippers, 7-DoF grasps, and single-view RGB(-D) image as input. We implement and compare a state-of-the-art, end-to-end grasp synthesis method and three modular methods, which first estimate the object pose and shape for all objects in the scene, and generate grasps using antipodal sampling. We observe that the modular methods outperform the end-to-end method in all our experiments. The modular methods are able to synthesize plenty of grasps, even for small objects, where the end-to-end methods fail. The effectiveness of the modular methods is contingent on the accuracy of the pose and shape estimation, and suffers partial degradation in cluttered scenes - a limitation of the existing pose and shape estimation methods. We also analyze the failure modes and run-times for the three modular methods, which use two different ways of object pose and shape estimation: one based on an encoder-decoder model, while another a diffusion model. Finally, we demonstrate that the single-view object pose and shape estimation methods can be augmented with vision-language models to yield language-conditioned grasps from just single-view RGB-D image as input. We notice comparable performance to the state-of-the-art LERF-TOGO baseline.

ROSep 18, 2024Code
Towards Global Localization using Multi-Modal Object-Instance Re-Identification

Aneesh Chavan, Vaibhav Agrawal, Vineeth Bhat et al.

Re-identification (ReID) is a critical challenge in computer vision, predominantly studied in the context of pedestrians and vehicles. However, robust object-instance ReID, which has significant implications for tasks such as autonomous exploration, long-term perception, and scene understanding, remains underexplored. In this work, we address this gap by proposing a novel dual-path object-instance re-identification transformer architecture that integrates multimodal RGB and depth information. By leveraging depth data, we demonstrate improvements in ReID across scenes that are cluttered or have varying illumination conditions. Additionally, we develop a ReID-based localization framework that enables accurate camera localization and pose identification across different viewpoints. We validate our methods using two custom-built RGB-D datasets, as well as multiple sequences from the open-source TUM RGB-D datasets. Our approach demonstrates significant improvements in both object instance ReID (mAP of 75.18) and localization accuracy (success rate of 83% on TUM-RGBD), highlighting the essential role of object ReID in advancing robotic perception. Our models, frameworks, and datasets have been made publicly available.

CVOct 6, 2023
DiffPrompter: Differentiable Implicit Visual Prompts for Semantic-Segmentation in Adverse Conditions

Sanket Kalwar, Mihir Ungarala, Shruti Jain et al.

Semantic segmentation in adverse weather scenarios is a critical task for autonomous driving systems. While foundation models have shown promise, the need for specialized adaptors becomes evident for handling more challenging scenarios. We introduce DiffPrompter, a novel differentiable visual and latent prompting mechanism aimed at expanding the learning capabilities of existing adaptors in foundation models. Our proposed $\nabla$HFC image processing block excels particularly in adverse weather conditions, where conventional methods often fall short. Furthermore, we investigate the advantages of jointly training visual and latent prompts, demonstrating that this combined approach significantly enhances performance in out-of-distribution scenarios. Our differentiable visual prompts leverage parallel and series architectures to generate prompts, effectively improving object segmentation tasks in adverse conditions. Through a comprehensive series of experiments and evaluations, we provide empirical evidence to support the efficacy of our approach. Project page at https://diffprompter.github.io.

ROApr 6, 2024
Constrained 6-DoF Grasp Generation on Complex Shapes for Improved Dual-Arm Manipulation

Gaurav Singh, Sanket Kalwar, Md Faizal Karim et al. · mit

Efficiently generating grasp poses tailored to specific regions of an object is vital for various robotic manipulation tasks, especially in a dual-arm setup. This scenario presents a significant challenge due to the complex geometries involved, requiring a deep understanding of the local geometry to generate grasps efficiently on the specified constrained regions. Existing methods only explore settings involving table-top/small objects and require augmented datasets to train, limiting their performance on complex objects. We propose CGDF: Constrained Grasp Diffusion Fields, a diffusion-based grasp generative model that generalizes to objects with arbitrary geometries, as well as generates dense grasps on the target regions. CGDF uses a part-guided diffusion approach that enables it to get high sample efficiency in constrained grasping without explicitly training on massive constraint-augmented datasets. We provide qualitative and quantitative comparisons using analytical metrics and in simulation, in both unconstrained and constrained settings to show that our method can generalize to generate stable grasps on complex objects, especially useful for dual-arm manipulation settings, while existing methods struggle to do so.

CVOct 16, 2025
Leveraging Cycle-Consistent Anchor Points for Self-Supervised RGB-D Registration

Siddharth Tourani, Jayaram Reddy, Sarvesh Thakur et al. · cmu

With the rise in consumer depth cameras, a wealth of unlabeled RGB-D data has become available. This prompts the question of how to utilize this data for geometric reasoning of scenes. While many RGB-D registration meth- ods rely on geometric and feature-based similarity, we take a different approach. We use cycle-consistent keypoints as salient points to enforce spatial coherence constraints during matching, improving correspondence accuracy. Additionally, we introduce a novel pose block that combines a GRU recurrent unit with transformation synchronization, blending historical and multi-view data. Our approach surpasses previous self- supervised registration methods on ScanNet and 3DMatch, even outperforming some older supervised methods. We also integrate our components into existing methods, showing their effectiveness.

RONov 15, 2024
Imagine-2-Drive: Leveraging High-Fidelity World Models via Multi-Modal Diffusion Policies

Anant Garg, K Madhava Krishna

World Model-based Reinforcement Learning (WMRL) enables sample efficient policy learning by reducing the need for online interactions which can potentially be costly and unsafe, especially for autonomous driving. However, existing world models often suffer from low prediction fidelity and compounding one-step errors, leading to policy degradation over long horizons. Additionally, traditional RL policies, often deterministic or single Gaussian-based, fail to capture the multi-modal nature of decision-making in complex driving scenarios. To address these challenges, we propose Imagine-2-Drive, a novel WMRL framework that integrates a high-fidelity world model with a multi-modal diffusion-based policy actor. It consists of two key components: DiffDreamer, a diffusion-based world model that generates future observations simultaneously, mitigating error accumulation, and DPA (Diffusion Policy Actor), a diffusion-based policy that models diverse and multi-modal trajectory distributions. By training DPA within DiffDreamer, our method enables robust policy learning with minimal online interactions. We evaluate our method in CARLA using standard driving benchmarks and demonstrate that it outperforms prior world model baselines, improving Route Completion and Success Rate by 15% and 20% respectively.

CVNov 16, 2024
MetricGold: Leveraging Text-To-Image Latent Diffusion Models for Metric Depth Estimation

Ansh Shah, K Madhava Krishna

Recovering metric depth from a single image remains a fundamental challenge in computer vision, requiring both scene understanding and accurate scaling. While deep learning has advanced monocular depth estimation, current models often struggle with unfamiliar scenes and layouts, particularly in zero-shot scenarios and when predicting scale-ergodic metric depth. We present MetricGold, a novel approach that harnesses generative diffusion model's rich priors to improve metric depth estimation. Building upon recent advances in MariGold, DDVM and Depth Anything V2 respectively, our method combines latent diffusion, log-scaled metric depth representation, and synthetic data training. MetricGold achieves efficient training on a single RTX 3090 within two days using photo-realistic synthetic data from HyperSIM, VirtualKitti, and TartanAir. Our experiments demonstrate robust generalization across diverse datasets, producing sharper and higher quality metric depth estimates compared to existing approaches.

CVDec 24, 2021
Grounding Linguistic Commands to Navigable Regions

Nivedita Rufus, Kanishk Jain, Unni Krishnan R Nair et al.

Humans have a natural ability to effortlessly comprehend linguistic commands such as "park next to the yellow sedan" and instinctively know which region of the road the vehicle should navigate. Extending this ability to autonomous vehicles is the next step towards creating fully autonomous agents that respond and act according to human commands. To this end, we propose the novel task of Referring Navigable Regions (RNR), i.e., grounding regions of interest for navigation based on the linguistic command. RNR is different from Referring Image Segmentation (RIS), which focuses on grounding an object referred to by the natural language expression instead of grounding a navigable region. For example, for a command "park next to the yellow sedan," RIS will aim to segment the referred sedan, and RNR aims to segment the suggested parking region on the road. We introduce a new dataset, Talk2Car-RegSeg, which extends the existing Talk2car dataset with segmentation masks for the regions described by the linguistic commands. A separate test split with concise manoeuvre-oriented commands is provided to assess the practicality of our dataset. We benchmark the proposed dataset using a novel transformer-based architecture. We present extensive ablations and show superior performance over baselines on multiple evaluation metrics. A downstream path planner generating trajectories based on RNR outputs confirms the efficacy of the proposed framework.

RODec 21, 2021
Design And Analysis Of Three-Output Open Differential with 3-DOF

Rama Vadapalli, Nagamanikandan Govindan, K Madhava Krishna

This paper presents a novel passive three-output differential with three degrees of freedom (3DOF), that translates motion and torque from a single input to three outputs. The proposed Three-Output Open Differential is designed such that its functioning is analogous to the functioning of a traditional two-output open differential. That is, the differential translates equal motion and torque to all its three outputs when the outputs are unconstrained or are subjected to equivalent load conditions. The introduced design is the first differential with three outputs to realise this outcome. The differential action between the three outputs is realised passively by a symmetric arrangement of three two-output open differentials and three two-input open differentials. The resulting differential mechanism achieves the novel result of equivalent input to output angular velocity and torque relations for all its three outputs. Furthermore, Three-Output Open Differential achieves the novel result for differentials with more than two outputs where each of its outputs shares equivalent angular velocity and torque relations with all the other outputs. The kinematics and dynamics of the Three-Output Open Differential are derived using the bond graph method. In addition, the merits of the differential mechanism along with its current and potential applications are presented.

RONov 1, 2021
Modular Pipe Climber III with Three-Output Open Differential

Rama Vadapalli, Saharsh Agarwal, Vishnu Kumar et al.

The paper introduces the novel Modular Pipe Climber III with a Three-Output Open Differential (3-OOD) mechanism to eliminate slipping of the tracks due to the changing cross-sections of the pipe. This will be achieved in any orientation of the robot. Previous pipe climbers use three-wheel/track modules, each with an individual driving mechanism to achieve stable traversing. Slipping of tracks is prevalent in such robots when it encounters the pipe turns. Thus, active control of each module's speed is employed to mitigate the slip, thereby requiring substantial control effort. The proposed pipe climber implements the 3-OOD to address this issue by allowing the robot to mechanically modulate the track speeds as it encounters a turn. The proposed 3-OOD is the first three-output differential to realize the functional abilities of a traditional two-output differential.

ROOct 6, 2021
CCO-VOXEL: Chance Constrained Optimization over Uncertain Voxel-Grid Representation for Safe Trajectory Planning

Sudarshan S Harithas, Rishabh Dev Yadav, Deepak Singh et al.

We present CCO-VOXEL: the very first chance-constrained optimization (CCO) algorithm that can compute trajectory plans with probabilistic safety guarantees in real-time directly on the voxel-grid representation of the world. CCO-VOXEL maps the distribution over the distance to the closest obstacle to a distribution over collision-constraint violation and computes an optimal trajectory that minimizes the violation probability. Importantly, unlike existing works, we never assume the nature of the sensor uncertainty or the probability distribution of the resulting collision-constraint violations. We leverage the notion of Hilbert Space embedding of distributions and Maximum Mean Discrepancy (MMD) to compute a tractable surrogate for the original chance-constrained optimization problem and employ a combination of A* based graph-search and Cross-Entropy Method for obtaining its minimum. We show tangible performance gain in terms of collision avoidance and trajectory smoothness as a consequence of our probabilistic formulation vis a vis state-of-the-art planning methods that do not account for such nonparametric noise. Finally, we also show how a combination of low-dimensional feature embedding and pre-caching of Kernel Matrices of MMD allows us to achieve real-time performance in simulations as well as in implementations on on-board commodity hardware that controls the quadrotor flight

ROSep 21, 2021
Multi-Modal Model Predictive Control through Batch Non-Holonomic Trajectory Optimization: Application to Highway Driving

Vivek K. Adajania, Aditya Sharma, Anish Gupta et al.

Standard Model Predictive Control (MPC) or trajectory optimization approaches perform only a local search to solve a complex non-convex optimization problem. As a result, they cannot capture the multi-modal characteristic of human driving. A global optimizer can be a potential solution but is computationally intractable in a real-time setting. In this paper, we present a real-time MPC capable of searching over different driving modalities. Our basic idea is simple: we run several goal-directed parallel trajectory optimizations and score the resulting trajectories based on user-defined meta cost functions. This allows us to perform a global search over several locally optimal motion plans. Although conceptually straightforward, realizing this idea in real-time with existing optimizers is highly challenging from technical and computational standpoints. With this motivation, we present a novel batch non-holonomic trajectory optimization whose underlying matrix algebra is easily parallelizable across problem instances and reduces to computing large batch matrix-vector products. This structure, in turn, is achieved by deriving a linearization-free multi-convex reformulation of the non-holonomic kinematics and collision avoidance constraints. We extensively validate our approach using both synthetic and real data sets (NGSIM) of traffic scenarios. We highlight how our algorithm automatically takes lane-change and overtaking decisions based on the defined meta cost function. Our batch optimizer achieves trajectories with lower meta cost, up to 6x faster than competing baselines.

CVMay 9, 2020
Understanding Dynamic Scenes using Graph Convolution Networks

Sravan Mylavarapu, Mahtab Sandhu, Priyesh Vijayan et al.

We present a novel Multi-Relational Graph Convolutional Network (MRGCN) based framework to model on-road vehicle behaviors from a sequence of temporally ordered frames as grabbed by a moving monocular camera. The input to MRGCN is a multi-relational graph where the graph's nodes represent the active and passive agents/objects in the scene, and the bidirectional edges that connect every pair of nodes are encodings of their Spatio-temporal relations. We show that this proposed explicit encoding and usage of an intermediate spatio-temporal interaction graph to be well suited for our tasks over learning end-end directly on a set of temporally ordered spatial relations. We also propose an attention mechanism for MRGCNs that conditioned on the scene dynamically scores the importance of information from different interaction types. The proposed framework achieves significant performance gain over prior methods on vehicle-behavior classification tasks on four datasets. We also show a seamless transfer of learning to multiple datasets without resorting to fine-tuning. Such behavior prediction methods find immediate relevance in a variety of navigation tasks such as behavior planning, state estimation, and applications relating to the detection of traffic violations over videos.

ROMar 12, 2020
LiDAR guided Small obstacle Segmentation

Aasheesh Singh, Aditya Kamireddypalli, Vineet Gandhi et al.

Detecting small obstacles on the road is critical for autonomous driving. In this paper, we present a method to reliably detect such obstacles through a multi-modal framework of sparse LiDAR(VLP-16) and Monocular vision. LiDAR is employed to provide additional context in the form of confidence maps to monocular segmentation networks. We show significant performance gains when the context is fed as an additional input to monocular semantic segmentation frameworks. We further present a new semantic segmentation dataset to the community, comprising of over 3000 image frames with corresponding LiDAR observations. The images come with pixel-wise annotations of three classes off-road, road, and small obstacle. We stress that precise calibration between LiDAR and camera is crucial for this task and thus propose a novel Hausdorff distance based calibration refinement method over extrinsic parameters. As a first benchmark over this dataset, we report our results with 73% instance detection up to a distance of 50 meters on challenging scenarios. Qualitatively by showcasing accurate segmentation of obstacles less than 15 cms at 50m depth and quantitatively through favourable comparisons vis a vis prior art, we vindicate the method's efficacy. Our project-page and Dataset is hosted at https://small-obstacle-dataset.github.io/

CVFeb 3, 2020
Towards Accurate Vehicle Behaviour Classification With Multi-Relational Graph Convolutional Networks

Sravan Mylavarapu, Mahtab Sandhu, Priyesh Vijayan et al.

Understanding on-road vehicle behaviour from a temporal sequence of sensor data is gaining in popularity. In this paper, we propose a pipeline for understanding vehicle behaviour from a monocular image sequence or video. A monocular sequence along with scene semantics, optical flow and object labels are used to get spatial information about the object (vehicle) of interest and other objects (semantically contiguous set of locations) in the scene. This spatial information is encoded by a Multi-Relational Graph Convolutional Network (MR-GCN), and a temporal sequence of such encodings is fed to a recurrent network to label vehicle behaviours. The proposed framework can classify a variety of vehicle behaviours to high fidelity on datasets that are diverse and include European, Chinese and Indian on-road scenes. The framework also provides for seamless transfer of models across datasets without entailing re-annotation, retraining and even fine-tuning. We show comparative performance gain over baseline Spatio-temporal classifiers and detail a variety of ablations to showcase the efficacy of the framework.

CVOct 2, 2019
Object Parsing in Sequences Using CoordConv Gated Recurrent Networks

Ayush Gaud, Y V S Harish, K Madhava Krishna

We present a monocular object parsing framework for consistent keypoint localization by capturing temporal correlation on sequential data. In this paper, we propose a novel recurrent network based architecture to model long-range dependencies between intermediate features which are highly useful in tasks like keypoint localization and tracking. We leverage the expressiveness of the popular stacked hourglass architecture and augment it by adopting memory units between intermediate layers of the network with weights shared across stages for video frames. We observe that this weight sharing scheme not only enables us to frame hourglass architecture as a recurrent network but also prove to be highly effective in producing increasingly refined estimates for sequential tasks. Furthermore, we propose a new memory cell, we call CoordConvGRU which learns to selectively preserve spatio-temporal correlation and showcase our results on the keypoint localization task. The experiments show that our approach is able to model the motion dynamics between the frames and significantly outperforms the baseline hourglass network. Even though our network is trained on a synthetically rendered dataset, we observe that with minimal fine tuning on 300 real images we are able to achieve performance at par with various state-of-the-art methods trained with the same level of supervisory inputs. By using a simpler architecture than other methods enables us to run it in real time on a standard GPU which is desirable for such applications. Finally, we make our architectures and 524 annotated sequences of cars from KITTI dataset publicly available.

ROSep 23, 2019
Omnidirectional Tractable Three Module Robot

Kartik Suryavanshi, Rama Vadapalli, Ruchitha Vucha et al.

This paper introduces the Omnidirectional Tractable Three Module Robot for traversing inside complex pipe networks. The robot consists of three omnidirectional modules fixed 120° apart circumferentially which can rotate about their own axis allowing holonomic motion of the robot. The holonomic motion enables the robot to overcome motion singularity when negotiating T-junctions and further allows the robot to arrive in a preferred orientation while taking turns inside a pipe. We have developed a closed-form kinematic model for the robot in the paper and propose the Motion Singularity Region that the robot needs to avoid while negotiating T-junction. The design and motion capabilities of the robot are demonstrated both by conducting simulations in MSC ADAMS on a simplified lumped-model of the robot and with experiments on its physical embodiment.

ROSep 23, 2019
Modular Pipe Climber

Rama Vadapalli, Kartik Suryavanshi, Ruchita Vucha et al.

This paper discusses the design and implementation of the Modular Pipe Climber inside ASTM D1785 - 15e1 standard pipes [1]. The robot has three tracks which operate independently and are mounted on three modules which are oriented at 120° to each other. The tracks provide for greater surface traction compared to wheels [2]. The tracks are pushed onto the inner wall of the pipe by passive springs which help in maintaining the contact with the pipe during vertical climb and while turning in bends. The modules have the provision to compress asymmetrically, which helps the robot to take turns in bends in all directions. The motor torque required by the robot and the desired spring stiffness are calculated at quasistatic and static equilibriums when the pipe climber is in a vertical climb. The springs were further simulated and analyzed in ADAMS MSC. The prototype built based on these obtained values was experimented on, in complex pipe networks. Differential speed is employed when turning in bends to improve the efficiency and reduce the stresses experienced by the robot.

ROJul 2, 2019
SVM Enhanced Frenet Frame Planner For Safe Navigation Amidst Moving Agents

Unni Krishnan R Nair, Nivedita Rufus, Vashist Madiraju et al.

This paper proposes an SVM Enhanced Trajectory Planner for dynamic scenes, typically those encountered in on road settings. Frenet frame based trajectory generation is popular in the context of autonomous driving both in research and industry. We incorporate a safety based maximal margin criteria using a SVM layer that generates control points that are maximally separated from all dynamic obstacles in the scene. A kinematically consistent trajectory generator then computes a path through these waypoints. We showcase through simulations as well as real world experiments on a self driving car that the SVM enhanced planner provides for a larger offset with dynamic obstacles than the regular Frenet frame based trajectory generation. Thereby, the authors argue that such a formulation is inherently suited for navigation amongst pedestrians. We assume the availability of an intent or trajectory prediction module that predicts the future trajectories of all dynamic actors in the scene.

ROMay 12, 2019
Integrating Objects into Monocular SLAM: Line Based Category Specific Models

Nayan Joshi, Yogesh Sharma, Parv Parkhiya et al.

We propose a novel Line based parameterization for category specific CAD models. The proposed parameterization associates 3D category-specific CAD model and object under consideration using a dictionary based RANSAC method that uses object Viewpoints as prior and edges detected in the respective intensity image of the scene. The association problem is posed as a classical Geometry problem rather than being dataset driven, thus saving the time and labour that one invests in annotating dataset to train Keypoint Network for different category objects. Besides eliminating the need of dataset preparation, the approach also speeds up the entire process as this method processes the image only once for all objects, thus eliminating the need of invoking the network for every object in an image across all images. A 3D-2D edge association module followed by a resection algorithm for lines is used to recover object poses. The formulation optimizes for shape and pose of the object, thus aiding in recovering object 3D structure more accurately. Finally, a Factor Graph formulation is used to combine object poses with camera odometry to formulate a SLAM problem.

ROMay 9, 2018
Learning Coordinated Tasks using Reinforcement Learning in Humanoids

S Phaniteja, Parijat Dewangan, Pooja Guhan et al.

With the advent of artificial intelligence and machine learning, humanoid robots are made to learn a variety of skills which humans possess. One of fundamental skills which humans use in day-to-day activities is performing tasks with coordination between both the hands. In case of humanoids, learning such skills require optimal motion planning which includes avoiding collisions with the surroundings. In this paper, we propose a framework to learn coordinated tasks in cluttered environments based on DiGrad - A multi-task reinforcement learning algorithm for continuous action-spaces. Further, we propose an algorithm to smooth the joint space trajectories obtained by the proposed framework in order to reduce the noise instilled during training. The proposed framework was tested on a 27 degrees of freedom (DoF) humanoid with articulated torso for performing coordinated object-reaching task with both the hands in four different environments with varying levels of difficulty. It is observed that the humanoid is able to plan collision free trajectory in real-time. Simulation results also reveal the usefulness of the articulated torso for performing tasks which require coordination between both the arms.

LGFeb 27, 2018
DiGrad: Multi-Task Reinforcement Learning with Shared Actions

Parijat Dewangan, S Phaniteja, K Madhava Krishna et al.

Most reinforcement learning algorithms are inefficient for learning multiple tasks in complex robotic systems, where different tasks share a set of actions. In such environments a compound policy may be learnt with shared neural network parameters, which performs multiple tasks concurrently. However such compound policy may get biased towards a task or the gradients from different tasks negate each other, making the learning unstable and sometimes less data efficient. In this paper, we propose a new approach for simultaneous training of multiple tasks sharing a set of common actions in continuous action spaces, which we call as DiGrad (Differential Policy Gradient). The proposed framework is based on differential policy gradients and can accommodate multi-task learning in a single actor-critic network. We also propose a simple heuristic in the differential policy gradient update to further improve the learning. The proposed architecture was tested on 8 link planar manipulator and 27 degrees of freedom(DoF) Humanoid for learning multi-goal reachability tasks for 3 and 2 end effectors respectively. We show that our approach supports efficient multi-task learning in complex robotic systems, outperforming related methods in continuous action spaces.

ROJan 31, 2018
A Deep Reinforcement Learning Approach for Dynamically Stable Inverse Kinematics of Humanoid Robots

S Phaniteja, Parijat Dewangan, Pooja Guhan et al.

Real time calculation of inverse kinematics (IK) with dynamically stable configuration is of high necessity in humanoid robots as they are highly susceptible to lose balance. This paper proposes a methodology to generate joint-space trajectories of stable configurations for solving inverse kinematics using Deep Reinforcement Learning (RL). Our approach is based on the idea of exploring the entire configuration space of the robot and learning the best possible solutions using Deep Deterministic Policy Gradient (DDPG). The proposed strategy was evaluated on the highly articulated upper body of a humanoid model with 27 degree of freedom (DoF). The trained model was able to solve inverse kinematics for the end effectors with 90% accuracy while maintaining the balance in double support phase.

ROJul 19, 2016
FPGA based hybrid architecture for parallelizing RRT

Gurshaant Malik, Krishna Gupta, Raunak Dharani et al.

Field Programmable Gate Arrays(FPGA) exceed the computing power of software based implementations by breaking the paradigm of sequential execution and accomplishing more per clock cycle by enabling hardware level parallelization at an architectural level. Introducing parallel architectures for a computationally intensive algorithm like Rapidly Exploring Random Trees(RRT) will result in an exploration that is fast, dense and uniform. Through a cost function delineated in later sections, FPGA based combinatorial architecture delivers superlative speed-up but consumes very high power while hierarchical architecture delivers relatively lower speed-up with acceptable power consumption levels. To combine the qualities of both, a hybrid architecture, that encompasses both combinatorial and hierarchical architecture, is designed. To determine the number of RRT nodes to be allotted to the combinatorial and hierarchical blocks of the hybrid architecture, a cost function, comprised of fundamentally inversely related speed-up and power parameters, is formulated. This maximization of cost function, with its associated constraints,is then mathematically solved using a modified branch and bound, that leads to optimal allocation of RRT modules to both blocks. It is observed that this hybrid architecture delivers the highest performance-per-watt out of the three architectures for differential, quad-copter and fixed wing kinematics.

ROJul 11, 2016
Design of a Robust Stair Climbing Compliant Modular Robot to Tackle Overhang on Stairs

Ajinkya Bhole, Sri Harsha Turlapati, Rajashekhar V. S et al.

This paper discusses the concept and parameter design of a Robust Stair Climbing Compliant Modular Robot, capable of tackling stairs with overhangs. Modifying the geometry of the periphery of the wheels of our robot helps in tackling overhangs. Along with establishing a concept design, robust design parameters are set to minimize performance variation. The Grey-based Taguchi Method is adopted for providing an optimal setting for the design parameters of the robot. The robot prototype is shown to have successfully scaled stairs of varying dimensions, with overhang, thus corroborating the analysis performed.

CVDec 23, 2013
Top Down Approach to Multiple Plane Detection

Prateek Singhal, Aditya Deshpande, N Dinesh Reddy et al.

Detecting multiple planes in images is a challenging problem, but one with many applications. Recent work such as J-Linkage and Ordered Residual Kernels have focussed on developing a domain independent approach to detect multiple structures. These multiple structure detection methods are then used for estimating multiple homographies given feature matches between two images. Features participating in the multiple homographies detected, provide us the multiple scene planes. We show that these methods provide locally optimal results and fail to merge detected planar patches to the true scene planes. These methods use only residues obtained on applying homography of one plane to another as cue for merging. In this paper, we develop additional cues such as local consistency of planes, local normals, texture etc. to perform better classification and merging . We formulate the classification as an MRF problem and use TRWS message passing algorithm to solve non metric energy terms and complex sparse graph structure. We show results on challenging dataset common in robotics navigation scenarios where our method shows accuracy of more than 85 percent on average while being close or same as the actual number of scene planes.