Sadman Sakib Enan

h-index4

7papers

53citations

Novelty42%

AI Score28

Ranked #150,077 of 194,257 authors (top 77%)#4,636 in RO (top 69%)

7 Papers

4.0ROJul 12, 2022

Robotic Detection of a Human-Comprehensible Gestural Language for Underwater Multi-Human-Robot Collaboration

Sadman Sakib Enan, Michael Fulton, Junaed Sattar

In this paper, we present a motion-based robotic communication framework that enables non-verbal communication among autonomous underwater vehicles (AUVs) and human divers. We design a gestural language for AUV-to-AUV communication which can be easily understood by divers observing the conversation unlike typical radio frequency, light, or audio based AUV communication. To allow AUVs to visually understand a gesture from another AUV, we propose a deep network (RRCommNet) which exploits a self-attention mechanism to learn to recognize each message by extracting maximally discriminative spatio-temporal features. We train this network on diverse simulated and real-world data. Our experimental evaluations, both in simulation and in closed-water robot trials, demonstrate that the proposed RRCommNet architecture is able to decipher gesture-based messages with an average accuracy of 88-94% on simulated data, 73-83% on real data (depending on the version of the model used). Further, by performing a message transcription study with human participants, we also show that the proposed language can be understood by humans, with an overall transcription accuracy of 88%. Finally, we discuss the inference runtime of RRCommNet on embedded GPU hardware, for real-time use on board AUVs in the field.

5.4ROJun 10

Semantically-Aware Diver Activity Recognition Framework for Effective Underwater Multi-Human-Robot Collaboration

Sadman Sakib Enan, Junaed Sattar

Effective multi-human-robot collaboration is essential for expanding human-led operations in the challenging and high-risk underwater environment. For autonomous underwater vehicles (AUVs) to become true teammates, they must be able to comprehend their surroundings and recognize a diver's activities to offer assistance and ensure safety. Towards this goal, we introduce DAR-Net, a novel transformer-based framework that analyzes complex underwater scenes to classify diver activities. Our contribution lies in a semantically guided learning formulation that couples transformer-based temporal reasoning with pixel-level scene supervision. This multi-loss training strategy explicitly aligns global activity recognition with local human-robot interaction semantics, which is particularly critical in low-visibility underwater conditions. To address the significant challenge of data scarcity in this domain, we present the first-ever Underwater Diver Activity (UDA) dataset, a foundational resource containing over 2,600 annotated images with pixel-level masks. Through rigorous experimental evaluations in a controlled environment, we demonstrate that DAR-Net achieves promising accuracy in recognizing six distinct diver activities, outperforming state-of-the-art models. While this dataset provides a crucial baseline, our work serves as a pioneering step, laying the groundwork for future research and facilitating the development of more intelligent, collaborative underwater robotic systems.

2.2ROSep 28, 2022

A Diver Attention Estimation Framework for Effective Underwater Human-Robot Interaction

Sadman Sakib Enan, Junaed Sattar

Many underwater tasks, such as cable-and-wreckage inspection and search-and-rescue, can benefit from robust Human-Robot Interaction (HRI) capabilities. With the recent advancements in vision-based underwater HRI methods, Autonomous Underwater Vehicles (AUVs) have the capability to interact with their human partners without requiring assistance from a topside operator. However, in these methods, the AUV assumes that the diver is ready for interaction, while in reality, the diver may be distracted. In this paper, we attempt to address this problem by presenting a diver attention estimation framework for AUVs to autonomously determine the attentiveness of a diver, and developing a robot controller to allow the AUV to navigate and reorient itself with respect to the diver before initiating interaction. The core element of the framework is a deep convolutional neural network called DATT-Net. It is based on a pyramid structure that can exploit the geometric relations among 10 facial keypoints of a diver to estimate their head orientation, which we use as an indicator of attentiveness. Our on-the-bench experimental evaluations and real-world experiments during both closed- and open-water robot trials confirm the efficacy of the proposed framework.

13.0ROMar 19, 2020Code

Design and Experiments with LoCO AUV: A Low Cost Open-Source Autonomous Underwater Vehicle

Chelsey Edge, Sadman Sakib Enan, Michael Fulton et al.

In this paper we present LoCO AUV, a Low-Cost, Open Autonomous Underwater Vehicle. LoCO is a general-purpose, single-person-deployable, vision-guided AUV, rated to a depth of 100 meters. We discuss the open and expandable design of this underwater robot, as well as the design of a simulator in Gazebo. Additionally, we explore the platform's preliminary local motion control and state estimation abilities, which enable it to perform maneuvers autonomously. In order to demonstrate its usefulness for a variety of tasks, we implement a variety of our previously presented human-robot interaction capabilities on LoCO, including gestural control, diver following, and robot communication via motion. Finally, we discuss the practical concerns of deployment and our experiences in using this robot in pools, lakes, and the ocean. All design details, instructions on assembly, and code will be released under a permissive, open-source license.

4.1RONov 18, 2020

Visual Diver Face Recognition for Underwater Human-Robot Interaction

Jungseok Hong, Sadman Sakib Enan, Christopher Morse et al.

This paper presents a deep-learned facial recognition method for underwater robots to identify scuba divers. Specifically, the proposed method is able to recognize divers underwater with faces heavily obscured by scuba masks and breathing apparatus. Our contribution in this research is towards robust facial identification of individuals under significant occlusion of facial features and image degradation from underwater optical distortions. With the ability to correctly recognize divers, autonomous underwater vehicles (AUV) will be able to engage in collaborative tasks with the correct person in human-robot teams and ensure that instructions are accepted from only those authorized to command the robots. We demonstrate that our proposed framework is able to learn discriminative features from real-world diver faces through different data augmentation and generation techniques. Experimental evaluations show that this framework achieves a 3-fold increase in prediction accuracy compared to the state-of-the-art (SOTA) algorithms and is well-suited for embedded inference on robotic platforms.

24.2CVApr 2, 2020

Semantic Segmentation of Underwater Imagery: Dataset and Benchmark

Md Jahidul Islam, Chelsey Edge, Yuyang Xiao et al.

In this paper, we present the first large-scale dataset for semantic Segmentation of Underwater IMagery (SUIM). It contains over 1500 images with pixel annotations for eight object categories: fish (vertebrates), reefs (invertebrates), aquatic plants, wrecks/ruins, human divers, robots, and sea-floor. The images have been rigorously collected during oceanic explorations and human-robot collaborative experiments, and annotated by human participants. We also present a benchmark evaluation of state-of-the-art semantic segmentation approaches based on standard performance metrics. In addition, we present SUIM-Net, a fully-convolutional encoder-decoder model that balances the trade-off between performance and computational efficiency. It offers competitive performance while ensuring fast end-to-end inference, which is essential for its use in the autonomy pipeline of visually-guided underwater robots. In particular, we demonstrate its usability benefits for visual servoing, saliency prediction, and detailed scene understanding. With a variety of use cases, the proposed model and benchmark dataset open up promising opportunities for future research in underwater robot vision.

13.3IVSep 20, 2019Code

Underwater Image Super-Resolution using Deep Residual Multipliers

Md Jahidul Islam, Sadman Sakib Enan, Peigen Luo et al.

We present a deep residual network-based generative model for single image super-resolution (SISR) of underwater imagery for use by autonomous underwater robots. We also provide an adversarial training pipeline for learning SISR from paired data. In order to supervise the training, we formulate an objective function that evaluates the \textit{perceptual quality} of an image based on its global content, color, and local style information. Additionally, we present USR-248, a large-scale dataset of three sets of underwater images of 'high' (640x480) and 'low' (80x60, 160x120, and 320x240) spatial resolution. USR-248 contains paired instances for supervised training of 2x, 4x, or 8x SISR models. Furthermore, we validate the effectiveness of our proposed model through qualitative and quantitative experiments and compare the results with several state-of-the-art models' performances. We also analyze its practical feasibility for applications such as scene understanding and attention modeling in noisy visual conditions.