Hanwen Kang

CV
h-index: 17
11 papers
659 citations
Novelty: 40%
AI Score: 40

11 Papers

CV · Nov 7, 2023
High-fidelity 3D Reconstruction of Plants using Neural Radiance Field

Kewei Hu, Ying Wei, Yaoqiang Pan et al.

Accurate reconstruction of plant phenotypes plays a key role in optimising sustainable farming practices in the field of Precision Agriculture (PA). Currently, optical sensor-based approaches dominate the field, but high-fidelity 3D reconstruction of crops and plants in unstructured agricultural environments remains challenging. Recently, a promising development has emerged in the form of Neural Radiance Field (NeRF), a novel method that utilises neural density fields. This technique has shown impressive performance in various novel view synthesis tasks, but has remained relatively unexplored in the agricultural context. In our study, we focus on two fundamental tasks within plant phenotyping: (1) the synthesis of 2D novel-view images and (2) the 3D reconstruction of crop and plant models. We explore neural radiance fields, in particular two SOTA methods: Instant-NGP, which excels in generating high-quality images with impressive training and inference speed, and Instant-NSR, which improves the reconstructed geometry by incorporating the Signed Distance Function (SDF) during training. In particular, we present a novel plant phenotype dataset comprising real plant images from production environments. This dataset is a first-of-its-kind initiative aimed at comprehensively exploring the advantages and limitations of NeRF in agricultural contexts. Our experimental results show that NeRF demonstrates commendable performance in the synthesis of novel-view images and is able to achieve reconstruction results that are competitive with Reality Capture, a leading commercial software for 3D Multi-View Stereo (MVS)-based reconstruction. However, our study also highlights certain drawbacks of NeRF, including relatively slow training speeds, performance limitations in cases of insufficient sampling, and challenges in obtaining high-quality geometry in complex setups.
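
For readers unfamiliar with how a radiance field turns densities into a pixel, the following is a minimal sketch of the standard NeRF volume-rendering quadrature for a single ray. It is not code from the paper; the function name `render_ray` and its argument layout are hypothetical, and the sample values are illustrative only.

```python
import math

def render_ray(densities, colours, deltas):
    """Composite samples along one ray with the standard NeRF quadrature.

    densities: volume density sigma_i at each sample (>= 0)
    colours:   (r, g, b) tuple at each sample
    deltas:    distance covered by each sample along the ray
    """
    transmittance = 1.0  # fraction of light that reaches the current sample
    pixel = [0.0, 0.0, 0.0]
    weights = []
    for sigma, rgb, delta in zip(densities, colours, deltas):
        alpha = 1.0 - math.exp(-sigma * delta)  # opacity of this segment
        weight = transmittance * alpha
        weights.append(weight)
        for c in range(3):
            pixel[c] += weight * rgb[c]
        transmittance *= 1.0 - alpha  # light remaining after this segment
    return pixel, weights
```

Samples in empty space (density 0) contribute nothing, while a dense sample absorbs the remaining transmittance, which is why sparse or poorly placed samples can degrade the recovered geometry.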

CV · Aug 4, 2022
Semantic Segmentation of Fruits on Multi-sensor Fused Data in Natural Orchards

Hanwen Kang, Xing Wang

Semantic segmentation is a fundamental task for agricultural robots to understand the surrounding environments in natural orchards. Recent developments in LiDAR techniques enable robots to acquire accurate range measurements of the view in unstructured orchards. Compared to RGB images, 3D point clouds have geometrical properties. By combining the LiDAR and camera, rich information on geometries and textures can be obtained. In this work, we propose a deep-learning-based segmentation method to perform accurate semantic segmentation on fused data from a LiDAR-Camera visual sensor. Two critical problems are explored and solved in this work. The first is how to efficiently fuse the texture and geometrical features from multi-sensor data. The second is how to efficiently train the 3D segmentation network under severely imbalanced class conditions. Moreover, an implementation of 3D segmentation in orchards, including LiDAR-Camera data fusion, data collection and labelling, network training, and model inference, is introduced in detail. In the experiment, we comprehensively analyze the network setup when dealing with highly unstructured and noisy point clouds acquired from an apple orchard. Overall, our proposed method achieves 86.2% mIoU on the segmentation of fruits on the high-resolution point cloud (100k-200k points). The experimental results show that the proposed method can perform accurate segmentation in real orchard environments.
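
The mIoU figure quoted above is a per-class average, which matters under class imbalance. A minimal sketch of the metric on per-point labels follows; it is not the authors' evaluation code, and the function name `mean_iou` and the choice to skip classes absent from both prediction and ground truth are assumptions.

```python
def mean_iou(pred, target, num_classes):
    """Mean intersection over union across classes present in pred or target."""
    ious = []
    for cls in range(num_classes):
        intersection = sum(1 for p, t in zip(pred, target) if p == cls and t == cls)
        union = sum(1 for p, t in zip(pred, target) if p == cls or t == cls)
        if union > 0:  # skip classes that appear in neither labelling
            ious.append(intersection / union)
    return sum(ious) / len(ious) if ious else 0.0
```

Because every class contributes one IoU term regardless of how many points it covers, a rare fruit class weighs as much as a dominant background class in the average.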

57.1 · RO · Apr 3
OMNI-PoseX: A Fast Vision Model for 6D Object Pose Estimation in Embodied Tasks

Michael Zhang, Wei Ying, Fangwen Chen et al.

Accurate 6D object pose estimation is a fundamental capability for embodied agents, yet remains highly challenging in open-world environments. Many existing methods rely on closed-set assumptions or geometry-agnostic regression schemes, limiting their generalization, stability, and real-time applicability in robotic systems. We present OMNI-PoseX, a vision foundation model that introduces a novel network architecture unifying open-vocabulary perception with an SO(3)-aware reflected flow matching pose predictor. The architecture decouples object-level understanding from geometry-consistent rotation inference, and employs a lightweight multi-modal fusion strategy that conditions rotation-sensitive geometric features on compact semantic embeddings, enabling efficient and stable 6D pose estimation. To enhance robustness and generalization, the model is trained on large-scale 6D pose datasets, leveraging broad object diversity, viewpoint variation, and scene complexity to build a scalable open-world pose backbone. Comprehensive evaluations across benchmark pose estimation, ablation studies, zero-shot generalization, and system-level robotic grasping integration demonstrate the effectiveness of OMNI-PoseX. OMNI-PoseX achieves SOTA pose accuracy and real-time efficiency, while delivering geometrically consistent predictions that enable reliable grasping of diverse, previously unseen objects.

CV · Mar 24, 2024
Exploring Accurate 3D Phenotyping in Greenhouse through Neural Radiance Fields

Junhong Zhao, Wei Ying, Yaoqiang Pan et al.

Accurate collection of plant phenotyping data is critical to optimising sustainable farming practices in precision agriculture. Traditional phenotyping in controlled laboratory environments, while valuable, falls short in understanding plant growth under real-world conditions. Emerging sensor and digital technologies offer a promising approach for direct phenotyping of plants in farm environments. This study investigates a learning-based phenotyping method using the Neural Radiance Field (NeRF) to achieve accurate in-situ phenotyping of pepper plants in greenhouse environments. To quantitatively evaluate the performance of this method, traditional point cloud registration on 3D scanning data is implemented for comparison. Experimental results show that NeRF achieves competitive accuracy compared to the 3D scanning methods. The mean distance error between the scanner-based method and the NeRF-based method is 0.865 mm. This study shows that the learning-based NeRF method achieves similar accuracy to 3D scanning-based methods but with improved scalability and robustness.
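
The abstract does not state how the mean distance error is computed. One common choice, assumed here purely for illustration, is the mean nearest-neighbour distance from one registered point cloud to the other. The function name `mean_nearest_distance` is hypothetical, and the brute-force search is only practical for small clouds.

```python
import math

def mean_nearest_distance(source, reference):
    """Mean distance from each source point to its nearest reference point.

    Both clouds are sequences of (x, y, z) tuples in the same units and
    are assumed to be already registered in a common frame.
    """
    total = 0.0
    for p in source:
        total += min(math.dist(p, q) for q in reference)
    return total / len(source)
```

For clouds with many thousands of points, a spatial index such as a k-d tree would replace the inner `min` to avoid quadratic cost.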

RO · Mar 30, 2024
Accurate Cutting-point Estimation for Robotic Lychee Harvesting through Geometry-aware Learning

Gengming Zhang, Hao Cao, Kewei Hu et al.

Accurately identifying lychee picking points in unstructured orchard environments and obtaining their coordinate locations is critical to the success of lychee-picking robots. However, traditional two-dimensional (2D) image-based object detection methods often struggle due to the complex geometric structures of branches, leaves and fruits, leading to incorrect determination of lychee picking points. In this study, we propose a Fcaf3d-lychee network model specifically designed for the accurate localisation of lychee picking points. Point cloud data of lychee picking points in natural environments are acquired using Microsoft's Azure Kinect DK time-of-flight (TOF) camera through multi-view stitching. We augment the Fully Convolutional Anchor-Free 3D Object Detection (Fcaf3d) model with a squeeze-and-excitation (SE) module, which exploits human visual attention mechanisms for improved feature extraction of lychee picking points. The trained network model is evaluated on a test set of lychee picking locations and achieves an impressive F1 score of 88.57%, significantly outperforming existing models. Subsequent three-dimensional (3D) position detection of picking points in real lychee orchard environments yields high accuracy, even under varying degrees of occlusion. Localisation errors of lychee picking points are within 1.5 cm in all directions, demonstrating the robustness and generality of the model.
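
As background on the squeeze-and-excitation (SE) module mentioned above, here is a minimal sketch of a generic SE forward pass in plain Python. It is not the Fcaf3d-lychee implementation: the function name `se_block`, the `[channel][position]` feature layout, and the omission of bias terms are all simplifying assumptions.

```python
import math

def se_block(features, w1, w2):
    """Squeeze-and-excitation over a feature map given as [channel][position].

    w1: reduction weights, shape [hidden][channels]
    w2: expansion weights, shape [channels][hidden]
    """
    # Squeeze: global average of each channel
    squeezed = [sum(ch) / len(ch) for ch in features]
    # Excitation: bottleneck layer with ReLU, then one sigmoid gate per channel
    hidden = [max(0.0, sum(w * s for w, s in zip(row, squeezed))) for row in w1]
    gates = [1.0 / (1.0 + math.exp(-sum(w * h for w, h in zip(row, hidden))))
             for row in w2]
    # Scale: reweight every channel by its gate
    scaled = [[g * v for v in ch] for g, ch in zip(gates, features)]
    return scaled, gates
```

Each gate lies between 0 and 1, so the block can only attenuate or preserve a channel relative to the others, which is how it emphasises the more informative features.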

RO · Dec 28, 2021
Soft Robotic Finger with Variable Effective Length enabled by an Antagonistic Constraint Mechanism

Xing Wang, Hanwen Kang, Hongyu Zhou et al.

Compared to traditional rigid robotics, soft robotics has attracted increasing attention due to advantages such as compliance, safety, and low cost. As an essential part of soft robotics, the soft robotic gripper also shows its superiority when grasping objects with irregular shapes. Recent research has sought to improve its grasping performance by adjusting the variable effective length (VEL). However, VEL achieved by multi-chamber design or tunable stiffness shape memory material requires a complex pneumatic circuit design or a time-consuming phase-changing process. This work proposes a fold-based soft robotic actuator made from 3D printed filament, NinjaFlex. It is experimentally tested and represented by a hyperelastic model. Mathematical and finite element modelling is conducted to study the bending behaviour of the proposed soft actuator. In addition, an antagonistic constraint mechanism is proposed to achieve the VEL, and the experiments demonstrate that better conformity is achieved. Finally, a two-mode gripper is designed and evaluated to demonstrate the advantages of VEL for grasping performance.

RO · Dec 8, 2021
Geometry-Aware Fruit Grasping Estimation for Robotic Harvesting in Orchards

Hanwen Kang, Xing Wang, Chao Chen

Field robotic harvesting is a promising technique in the recent development of the agricultural industry. It is vital for robots to recognise and localise fruits before harvesting in natural orchards. However, the workspace of harvesting robots in orchards is complex: many fruits are occluded by branches and leaves. It is important to estimate a proper grasping pose for each fruit before performing the manipulation. In this study, a geometry-aware network, A3N, is proposed to perform end-to-end instance segmentation and grasping estimation using both color and geometry sensory data from an RGB-D camera. In addition, workspace geometry modelling is applied to assist the robotic manipulation. Moreover, we implement a global-to-local scanning strategy, which enables robots to accurately recognise and retrieve fruits in field environments with two consumer-level RGB-D cameras. We also evaluate the accuracy and robustness of the proposed network comprehensively in experiments. The experimental results show that A3N achieves 0.873 on instance segmentation accuracy, with an average computation time of 35 ms. The average accuracy of grasping estimation is 0.61 cm and 4.8$^{\circ}$ in centre and orientation, respectively. Overall, the robotic system that utilizes the global-to-local scanning and A3N achieves a harvesting success rate of 70% to 85% in field harvesting experiments.
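
The abstract reports grasp accuracy as a centre error and an orientation error but does not define either. A minimal sketch under one plausible reading follows: Euclidean distance between grasp centres and the angle between predicted and ground-truth approach axes. The function name `grasp_pose_error` and the representation of orientation as a single 3D axis are assumptions.

```python
import math

def grasp_pose_error(centre_pred, centre_true, axis_pred, axis_true):
    """Return (centre error, orientation error in degrees) for one grasp.

    Centres are (x, y, z) tuples in the same length unit; axes are
    non-zero 3D direction vectors and need not be normalised.
    """
    centre_err = math.dist(centre_pred, centre_true)
    dot = sum(a * b for a, b in zip(axis_pred, axis_true))
    norm = math.hypot(*axis_pred) * math.hypot(*axis_true)
    cos_angle = max(-1.0, min(1.0, dot / norm))  # clamp against rounding
    return centre_err, math.degrees(math.acos(cos_angle))
```

Averaging both values over a test set would give figures comparable in form to the reported centimetre and degree errors.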

RO · Oct 18, 2021
A Tactile-enabled Grasping Method for Robotic Fruit Harvesting

Hongyu Zhou, Xing Wang, Hanwen Kang et al.

In the robotic crop harvesting environment, the intrusion of foreign objects into the gripper workspace occurs frequently and cannot be ignored, yet it is rarely addressed. This paper presents a novel intelligent robotic grasping method capable of handling obstacle interference, which is the first of its kind in the literature. The proposed method combines deep learning algorithms with low-cost tactile sensing hardware on a multi-DoF soft robotic gripper. In experimental validation, the proposed method demonstrated promising performance in distinguishing various grasping scenarios. The gripper, with four independently controlled fingers, showed outstanding adaptability in handling various picking scenarios. The overall performance of this work indicates great potential for solving robotic fruit harvesting challenges.

CV · Mar 30, 2020
Real-Time Fruit Recognition and Grasping Estimation for Autonomous Apple Harvesting

Hanwen Kang, Chao Chen

In this research, a fully neural-network-based visual perception framework for autonomous apple harvesting is proposed. The proposed framework includes a multi-function neural network for fruit recognition and a Pointnet grasp estimation to determine the proper grasp pose to guide the robotic execution. Fruit recognition takes raw RGB images from the RGB-D camera as input to perform fruit detection and instance segmentation, and Pointnet grasp estimation takes the point cloud of each fruit as input and outputs the predicted grasp pose for each fruit. The proposed framework is validated using RGB-D images collected from laboratory and orchard environments; a robotic grasping test in a controlled environment is also included in the experiments. Experimental results show that the proposed framework can accurately localise fruits and estimate the grasp pose for robotic grasping.

CV · Dec 29, 2019
Visual Perception and Modelling in Unstructured Orchard for Apple Harvesting Robots

Hanwen Kang, Chao Chen

Visual perception and modelling are essential tasks of robotic harvesting in the unstructured orchard. This paper develops a framework of visual perception and modelling for robotic harvesting of fruits in orchard environments. The developed framework includes visual perception, scenarios mapping, and fruit modelling. The visual perception module utilises a deep-learning model to perform multi-purpose visual perception tasks within the working scenarios; the scenarios mapping module applies OctoMap to represent the multiple classes of objects or elements within the environment; the fruit modelling module estimates the geometric properties of objects and the proper access pose of each fruit. The developed framework is implemented and evaluated in apple orchards. The experimental results show that the visual perception and modelling algorithm can accurately detect and localise the fruits, and model working scenarios in real orchard environments. The $F_{1}$ score and mean intersection over union of the visual perception module on fruit detection and segmentation are 0.833 and 0.852, respectively. The accuracy of the fruit modelling in terms of centre localisation and pose estimation is 0.955 and 0.923, respectively. Overall, an accurate visual perception and modelling algorithm is presented in this paper.

CV · Nov 28, 2019
Fruit Detection, Segmentation and 3D Visualisation of Environments in Apple Orchards

Hanwen Kang, Chao Chen

Robotic harvesting of fruits in orchards is a challenging task, since high density and overlapping of fruits and branches can heavily impact the success rate of robotic harvesting. Therefore, the vision system is required to provide comprehensive information about the working environment to guide the manipulator and gripping system to successfully detach the target fruits. In this study, a deep-learning-based one-stage detector, DaSNet-V2, is developed to perform multi-task vision sensing in the working environment of apple orchards. DaSNet-V2 combines the detection and instance segmentation of fruits and the semantic segmentation of branches into a single network architecture. Meanwhile, a lightweight backbone network, LW-net, is utilised in the DaSNet-V2 model to improve the computational efficiency of the model. In the experiment, DaSNet-V2 is tested and evaluated on RGB-D images of the orchard. In the experimental results, DaSNet-V2 with the lightweight backbone achieves an $F_{1}$ score of 0.844 on detection, and mean intersection over union of 0.858 and 0.795 on the instance segmentation of fruits and the semantic segmentation of branches, respectively. To provide a direct view of the working environment in orchards, the obtained sensing results are illustrated by 3D visualisation. The robustness and efficiency of DaSNet-V2 in detection and segmentation are validated by experiments in the real environment of an apple orchard.
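
Detection quality in this and the neighbouring abstracts is reported as an F1 score. A minimal sketch of that metric from raw detection counts follows. It is not the authors' evaluation code; the function name `f1_score` is hypothetical, and how detections are matched to ground truth is left outside the sketch.

```python
def f1_score(true_positives, false_positives, false_negatives):
    """F1 as the harmonic mean of precision and recall."""
    if true_positives == 0:
        return 0.0  # no correct detections, so precision and recall are zero
    precision = true_positives / (true_positives + false_positives)
    recall = true_positives / (true_positives + false_negatives)
    return 2 * precision * recall / (precision + recall)
```

The harmonic mean penalises a detector that trades many missed fruits for few false alarms, or the reverse, more heavily than a plain average would.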