Hanumant Singh

CV
h-index16
15papers
62citations
Novelty46%
AI Score53

15 Papers

ROMay 4Code
A Certifably Correct Algorithm for Generalized Robot-World and Hand-Eye Calibration

Emmett Wise, Pushyami Kaveti, Qilong Chen et al.

Automatic extrinsic sensor calibration is a fundamental problem for multi-sensor platforms. Reliable and general-purpose solutions should be computationally efficient, require few assumptions about the structure of the sensing environment, and demand little effort from human operators. In this work, we introduce a fast and certifiably globally optimal algorithm for solving a generalized formulation of the robot-world and hand-eye calibration (RWHEC) problem. The formulation of RWHEC presented is "generalized" in that it supports the simultaneous estimation of multiple sensor and target poses, and permits the use of monocular cameras that, alone, are unable to measure the scale of their environments. In addition to demonstrating our method's superior performance over existing solutions through extensive simulated and real experiments, we derive novel identifiability criteria and establish a priori guarantees of global optimality for problem instances with bounded measurement errors. As part of our analysis, we propose a new constraint qualification for nonlinear programs with redundant constraints; this constraint qualification is of independent interest for establishing the exactness of SDP relaxations of QCQPs that have been tightened through the addition of redundant constraints. Finally, we provide a free and open-source implementation of our algorithms and experiments.

CVJul 3, 2023Code
A Strong Baseline for Point Cloud Registration via Direct Superpoints Matching

Aniket Gupta, Yiming Xie, Hanumant Singh et al.

Deep neural networks endow the downsampled superpoints with highly discriminative feature representations. Previous dominant point cloud registration approaches match these feature representations as the first step, e.g., using the Sinkhorn algorithm. A RANSAC-like method is then usually adopted as a post-processing refinement to filter the outliers. Other dominant method is to directly predict the superpoint matchings using learned MLP layers. Both of them have drawbacks: RANSAC-based methods are computationally intensive and prediction-based methods suffer from outputing non-existing points in the point cloud. In this paper, we propose a straightforward and effective baseline to find correspondences of superpoints in a global matching manner. We employ the normalized matching scores as weights for each correspondence, allowing us to reject the outliers and further weigh the rest inliers when fitting the transformation matrix without relying on the cumbersome RANSAC. Moreover, the entire model can be trained in an end-to-end fashion, leading to better accuracy. Our simple yet effective baseline shows comparable or even better results than state-of-the-art methods on three datasets including ModelNet, 3DMatch, and KITTI. We do not advocate our approach to be \emph{the} solution for point cloud registration but use the results to emphasize the role of matching strategy for point cloud registration. The code and models are available at https://github.com/neu-vi/Superpoints_Registration.

LGApr 27, 2022
An Iterative Labeling Method for Annotating Fisheries Imagery

Zhiyong Zhang, Pushyami Kaveti, Hanumant Singh et al.

In this paper, we present a methodology for fisheries-related data that allows us to converge on a labeled image dataset by iterating over the dataset with multiple training and production loops that can exploit crowdsourcing interfaces. We present our algorithm and its results on two separate sets of image data collected using the Seabed autonomous underwater vehicle. The first dataset comprises of 2,026 completely unlabeled images, while the second consists of 21,968 images that were point annotated by experts. Our results indicate that training with a small subset and iterating on that to build a larger set of labeled data allows us to converge to a fully annotated dataset with a small number of iterations. Even in the case of a dataset labeled by experts, a single iteration of the methodology improves the labels by discovering additional complicated examples of labels associated with fish that overlap, are very small, or obscured by the contrast limitations associated with underwater imagery.

CVAug 19, 2024Code
NeuFlow v2: Push High-Efficiency Optical Flow To the Limit

Zhiyong Zhang, Aniket Gupta, Huaizu Jiang et al.

Real-time high-accuracy optical flow estimation is critical for a variety of real-world robotic applications. However, current learning-based methods often struggle to balance accuracy and computational efficiency: methods that achieve high accuracy typically demand substantial processing power, while faster approaches tend to sacrifice precision. These fast approaches specifically falter in their generalization capabilities and do not perform well across diverse real-world scenarios. In this work, we revisit the limitations of the SOTA methods and present NeuFlow-V2, a novel method that offers both - high accuracy in real-world datasets coupled with low computational overhead. In particular, we introduce a novel light-weight backbone and a fast refinement module to keep computational demands tractable while delivering accurate optical flow. Experimental results on synthetic and real-world datasets demonstrate that NeuFlow-V2 provides similar accuracy to SOTA methods while achieving 10x-70x speedups. It is capable of running at over 20 FPS on 512x384 resolution images on a Jetson Orin Nano. The full training and evaluation code is available at https://github.com/neufieldrobotics/NeuFlow_v2.

RODec 8, 2025Code
Sparse Variable Projection in Robotic Perception: Exploiting Separable Structure for Efficient Nonlinear Optimization

Alan Papalia, Nikolas Sanderson, Haoyu Han et al.

Robotic perception often requires solving large nonlinear least-squares (NLS) problems. While sparsity has been well-exploited to scale solvers, a complementary and underexploited structure is \emph{separability} -- where some variables (e.g., visual landmarks) appear linearly in the residuals and, for any estimate of the remaining variables (e.g., poses), have a closed-form solution. Variable projection (VarPro) methods are a family of techniques that exploit this structure by analytically eliminating the linear variables and presenting a reduced problem in the remaining variables that has favorable properties. However, VarPro has seen limited use in robotic perception; a major challenge arises from gauge symmetries (e.g., cost invariance to global shifts and rotations), which are common in perception and induce specific computational challenges in standard VarPro approaches. We present a VarPro scheme designed for problems with gauge symmetries that jointly exploits separability and sparsity. Our method can be applied as a one-time preprocessing step to construct a \emph{matrix-free Schur complement operator}. This operator allows efficient evaluation of costs, gradients, and Hessian-vector products of the reduced problem and readily integrates with standard iterative NLS solvers. We provide precise conditions under which our method applies, and describe extensions when these conditions are only partially met. Across synthetic and real benchmarks in SLAM, SNL, and SfM, our approach achieves up to \textbf{2$\times$--35$\times$ faster runtimes} than state-of-the-art methods while maintaining accuracy. We release an open-source C++ implementation and all datasets from our experiments.

CVMar 15, 2024Code
NeuFlow: Real-time, High-accuracy Optical Flow Estimation on Robots Using Edge Devices

Zhiyong Zhang, Huaizu Jiang, Hanumant Singh

Real-time high-accuracy optical flow estimation is a crucial component in various applications, including localization and mapping in robotics, object tracking, and activity recognition in computer vision. While recent learning-based optical flow methods have achieved high accuracy, they often come with heavy computation costs. In this paper, we propose a highly efficient optical flow architecture, called NeuFlow, that addresses both high accuracy and computational cost concerns. The architecture follows a global-to-local scheme. Given the features of the input images extracted at different spatial resolutions, global matching is employed to estimate an initial optical flow on the 1/16 resolution, capturing large displacement, which is then refined on the 1/8 resolution with lightweight CNN layers for better accuracy. We evaluate our approach on Jetson Orin Nano and RTX 2080 to demonstrate efficiency improvements across different computing platforms. We achieve a notable 10x-80x speedup compared to several state-of-the-art methods, while maintaining comparable accuracy. Our approach achieves around 30 FPS on edge computing platforms, which represents a significant breakthrough in deploying complex computer vision tasks such as SLAM on small robots like drones. The full training and evaluation code is available at https://github.com/neufieldrobotics/NeuFlow.

CVMar 28, 2024Code
Towards Long Term SLAM on Thermal Imagery

Colin Keil, Aniket Gupta, Pushyami Kaveti et al.

Visual SLAM with thermal imagery, and other low contrast visually degraded environments such as underwater, or in areas dominated by snow and ice, remain a difficult problem for many state of the art (SOTA) algorithms. In addition to challenging front-end data association, thermal imagery presents an additional difficulty for long term relocalization and map reuse. The relative temperatures of objects in thermal imagery change dramatically from day to night. Feature descriptors typically used for relocalization in SLAM are unable to maintain consistency over these diurnal changes. We show that learned feature descriptors can be used within existing Bag of Word based localization schemes to dramatically improve place recognition across large temporal gaps in thermal imagery. In order to demonstrate the effectiveness of our trained vocabulary, we have developed a baseline SLAM system, integrating learned features and matching into a classical SLAM algorithm. Our system demonstrates good local tracking on challenging thermal imagery, and relocalization that overcomes dramatic day to night thermal appearance changes. Our code and datasets are available here: https://github.com/neufieldrobotics/IRSLAM_Baseline

CVMay 28, 2023Code
Bayesian Decision Making to Localize Visual Queries in 2D

Syed Asjad, Aniket Gupta, Hanumant Singh

This report describes our approach for the EGO4D 2023 Visual Query 2D Localization Challenge. Our method aims to reduce the number of False Positives (FP) that occur because of high similarity between the visual crop and the proposed bounding boxes from the baseline's Region Proposal Network (RPN). Our method uses a transformer to determine similarity in higher dimensions which is used as our prior belief. The results are then combined together with the similarity in lower dimensions from the Siamese Head, acting as our measurement, to generate a posterior which is then used to determine the final similarity of the visual crop with the proposed bounding box. Our code is publicly available $\href{https://github.com/s-m-asjad/EGO4D_VQ2D}{here}$.

ROOct 2, 2019Code
ROS Rescue : Fault Tolerance System for Robot Operating System

Pushyami Kaveti, Hanumant Singh

In this chapter we discuss the problem of master failure in ROS1.0 and its impact on robotic deployments in the real world. We address this issue in this tutorial chapter where we outline, design and demonstrate a fault tolerant mechanism associated with ROS master failure. Unlike previous solutions which use primary backup replication and external checkpointing libraries which are process heavy, our mechanism adds a lightweight functionality to the ROS master to enable it to recover from failure. We present a modified version of ROS master which is equipped with a logging mechanism to record the meta information and network state of ROS nodes as well as a recovery mechanism to go back to the previous state without having to abort or restart all the nodes. We also implement an additional master monitor node responsible for failure detection on the master by polling it for its availability. Our code is implemented in python and preliminary tests were conducted successfully on a variety of land, aerial and underwater robots and a tele-operating computer running ROS Kinetic on Ubuntu 16.04. The code is publicly available under a creative commons license on github at https://github.com/PushyamiKaveti/fault-tolerant-ros-master

ROJul 15, 2025
A Roadmap for Climate-Relevant Robotics Research

Alan Papalia, Charles Dawson, Laurentiu L. Anton et al. · mit

Climate change is one of the defining challenges of the 21st century, and many in the robotics community are looking for ways to contribute. This paper presents a roadmap for climate-relevant robotics research, identifying high-impact opportunities for collaboration between roboticists and experts across climate domains such as energy, the built environment, transportation, industry, land use, and Earth sciences. These applications include problems such as energy systems optimization, construction, precision agriculture, building envelope retrofits, autonomous trucking, and large-scale environmental monitoring. Critically, we include opportunities to apply not only physical robots but also the broader robotics toolkit - including planning, perception, control, and estimation algorithms - to climate-relevant problems. A central goal of this roadmap is to inspire new research directions and collaboration by highlighting specific, actionable problems at the intersection of robotics and climate. This work represents a collaboration between robotics researchers and domain experts in various climate disciplines, and it serves as an invitation to the robotics community to bring their expertise to bear on urgent climate priorities.

CVOct 13, 2025
SNAP: Towards Segmenting Anything in Any Point Cloud

Aniket Gupta, Hanhui Wang, Charles Saunders et al.

Interactive 3D point cloud segmentation enables efficient annotation of complex 3D scenes through user-guided prompts. However, current approaches are typically restricted in scope to a single domain (indoor or outdoor), and to a single form of user interaction (either spatial clicks or textual prompts). Moreover, training on multiple datasets often leads to negative transfer, resulting in domain-specific tools that lack generalizability. To address these limitations, we present \textbf{SNAP} (\textbf{S}egment a\textbf{N}ything in \textbf{A}ny \textbf{P}oint cloud), a unified model for interactive 3D segmentation that supports both point-based and text-based prompts across diverse domains. Our approach achieves cross-domain generalizability by training on 7 datasets spanning indoor, outdoor, and aerial environments, while employing domain-adaptive normalization to prevent negative transfer. For text-prompted segmentation, we automatically generate mask proposals without human intervention and match them against CLIP embeddings of textual queries, enabling both panoptic and open-vocabulary segmentation. Extensive experiments demonstrate that SNAP consistently delivers high-quality segmentation results. We achieve state-of-the-art performance on 8 out of 9 zero-shot benchmarks for spatial-prompted segmentation and demonstrate competitive results on all 5 text-prompted benchmarks. These results show that a unified model can match or exceed specialized domain-specific approaches, providing a practical tool for scalable 3D annotation. Project page is at, https://neu-vi.github.io/SNAP/

RODec 19, 2020
A Light Field Front-end for Robust SLAM in Dynamic Environments

Pushyami Kaveti, Hanumant Singh

There is a general expectation that robots should operate in urban environments often consisting of potentially dynamic entities including people, furniture and automobiles. Dynamic objects pose challenges to visual SLAM algorithms by introducing errors into the front-end. This paper presents a Light Field SLAM front-end which is robust to dynamic environments. A Light Field captures a bundle of light rays emerging from a single point in space, allowing us to see through dynamic objects occluding the static background via Synthetic Aperture Imaging(SAI). We detect apriori dynamic objects using semantic segmentation and perform semantic guided SAI on the Light Field acquired from a linear camera array. We simultaneously estimate both the depth map and the refocused image of the static background in a single step eliminating the need for static scene initialization. The GPU implementation of the algorithm facilitates running at close to real time speeds of 4 fps. We demonstrate that our method results in improved robustness and accuracy of pose estimation in dynamic environments by comparing it with state of the art SLAM algorithms.

ROMar 24, 2020
Removing Dynamic Objects for Static Scene Reconstruction using Light Fields

Pushyami Kaveti, Sammie Katt, Hanumant Singh

There is a general expectation that robots should operate in environments that consist of static and dynamic entities including people, furniture and automobiles. These dynamic environments pose challenges to visual simultaneous localization and mapping (SLAM) algorithms by introducing errors into the front-end. Light fields provide one possible method for addressing such problems by capturing a more complete visual information of a scene. In contrast to a single ray from a perspective camera, Light Fields capture a bundle of light rays emerging from a single point in space, allowing us to see through dynamic objects by refocusing past them. In this paper we present a method to synthesize a refocused image of the static background in the presence of dynamic objects that uses a light-field acquired with a linear camera array. We simultaneously estimate both the depth and the refocused image of the static scene using semantic segmentation for detecting dynamic objects in a single time step. This eliminates the need for initializing a static map . The algorithm is parallelizable and is implemented on GPU allowing us execute it at close to real time speeds. We demonstrate the effectiveness of our method on real-world data acquired using a small robot with a five camera array.

ROOct 8, 2018
AutOTranS: an Autonomous Open World Transportation System

Brayan S. Zapata-Impata, Vikrant Shah, Hanumant Singh et al.

Tasks in outdoor open world environments are now ripe for automation with mobile manipulators. The dynamic, unstructured and unknown environments associated with such tasks -- a prime example would be collecting roadside trash -- makes them particularly challenging. In this paper we present an approach to solving the problem of picking up, transporting, and dropping off novel objects outdoors. Our solution integrates a navigation system, a grasp detection and planning system, and a custom task planner. We perform experiments that demonstrate that the system can be used to transport a wide class of novel objects (trash bags, general garbage, gardening tools and fruits) in unstructured settings outdoors with a relatively high end-to-end success rate of 85%. See it at work at: https://youtu.be/93nWXhaGEWA

CVSep 26, 2015
Anomaly Detection in Unstructured Environments using Bayesian Nonparametric Scene Modeling

Yogesh Girdhar, Walter Cho, Matthew Campbell et al.

This paper explores the use of a Bayesian non-parametric topic modeling technique for the purpose of anomaly detection in video data. We present results from two experiments. The first experiment shows that the proposed technique is automatically able characterize the underlying terrain, and detect anomalous flora in image data collected by an underwater robot. The second experiment shows that the same technique can be used on images from a static camera in a dynamic unstructured environment. In the second dataset, consisting of video data from a static seafloor camera capturing images of a busy coral reef, the proposed technique was able to detect all three instances of an underwater vehicle passing in front of the camera, amongst many other observations of fishes, debris, lighting changes due to surface waves, and benthic flora.