RO Dec 3, 2022
Autonomous Apple Fruitlet Sizing and Growth Rate Tracking using Computer Vision
Harry Freeman, Mohamad Qadri, Abhisesh Silwal et al.
In this paper, we present a computer vision-based approach to measure the sizes and growth rates of apple fruitlets. Measuring the growth rates of apple fruitlets is important because it allows apple growers to determine when to apply chemical thinners to their crops in order to optimize yield. The current practice for obtaining growth rates involves using calipers to record the sizes of fruitlets across multiple days. Because of the number of fruitlets that must be sized, this method is laborious, time-consuming, and prone to human error. With images collected by a hand-held stereo camera, our system segments, clusters, and fits ellipses to fruitlets to measure their diameters. The growth rates are then calculated by temporally associating clustered fruitlets across days. We provide quantitative results on data collected in an apple orchard and demonstrate that our system is able to predict abscise rates within 3.5% of the current method with a 6x improvement in speed, while requiring significantly less manual effort. Moreover, we provide results on images captured by a robotic system in the field and discuss the next steps required to make the process fully autonomous.
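As a rough illustration of the sizing and growth-rate steps in the abstract above, the sketch below converts a fitted ellipse's pixel diameter into a metric diameter with the standard pinhole stereo model, then computes a per-day growth rate from two measurements of the same fruitlet. The function names, camera parameters, and sample values are hypothetical and are not taken from the paper; the authors' actual pipeline may differ.

```python
def depth_from_disparity(focal_px, baseline_m, disparity_px):
    """Pinhole stereo model: depth Z = f * B / d (metres)."""
    if disparity_px <= 0:
        raise ValueError("disparity must be positive")
    return focal_px * baseline_m / disparity_px


def metric_diameter_mm(diameter_px, depth_m, focal_px):
    """Back-project a pixel length at a known depth: size = pixels * Z / f."""
    return diameter_px * depth_m / focal_px * 1000.0


def growth_rate_mm_per_day(d_start_mm, d_end_mm, days_elapsed):
    """Average diameter change per day between two sizing dates."""
    if days_elapsed <= 0:
        raise ValueError("days_elapsed must be positive")
    return (d_end_mm - d_start_mm) / days_elapsed


# Hypothetical camera and measurements (assumed values, for illustration only).
FOCAL_PX = 1000.0    # focal length in pixels
BASELINE_M = 0.1     # stereo baseline in metres

depth = depth_from_disparity(FOCAL_PX, BASELINE_M, disparity_px=250.0)
day0_mm = metric_diameter_mm(25.0, depth, FOCAL_PX)  # ellipse diameter on day 0
day2_mm = metric_diameter_mm(30.0, depth, FOCAL_PX)  # same fruitlet, two days later
rate = growth_rate_mm_per_day(day0_mm, day2_mm, days_elapsed=2)
```

The growth rate is only meaningful if both measurements belong to the same fruitlet, which is why the temporal association step matters.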
RO Nov 14, 2022
3D Reconstruction-Based Seed Counting of Sorghum Panicles for Agricultural Inspection
Harry Freeman, Eric Schneider, Chung Hee Kim et al.
In this paper, we present a method for creating high-quality 3D models of sorghum panicles for phenotyping in breeding experiments. This is achieved with a novel reconstruction approach that uses seeds as semantic landmarks in both 2D and 3D. To evaluate performance, we develop a new metric for assessing the quality of reconstructed point clouds without requiring a ground-truth point cloud. Finally, we present a counting method in which the density of seed centers in the 3D model allows 2D counts from multiple views to be effectively combined into a whole-panicle count. We demonstrate that using this method to estimate seed count and weight for sorghum outperforms count extrapolation from 2D images, an approach used in most state-of-the-art methods for seeds and grains of comparable size.
RO Apr 12
WARPED: Wrist-Aligned Rendering for Robot Policy Learning from Egocentric Human Demonstrations
Harry Freeman, Chung Hee Kim, George Kantor
Recent advancements in learning from human demonstration have shown promising results in addressing the scalability and high cost of data collection required to train robust visuomotor policies. However, existing approaches are often constrained by a reliance on multiview camera setups, depth sensors, or custom hardware and are typically limited to policy execution from third-person or egocentric cameras. In this paper, we present WARPED, a framework designed to synthesize realistic wrist-view observations from human demonstration videos to facilitate the training of visuomotor policies using only monocular RGB data. With data collected from an egocentric RGB camera, our system leverages vision foundation models to initialize the interactive scene. A hand-object interaction pipeline is then employed to track the hand and manipulated object and retarget the trajectories to a robotic end-effector. Lastly, photo-realistic wrist-view observations are synthesized via Gaussian Splatting to directly train a robotic policy. We demonstrate that WARPED achieves success rates comparable to policies trained on teleoperated demonstration data for five tabletop manipulation tasks, while requiring 5-8x less data collection time.
CV Mar 5, 2025
Transformer-Based Spatio-Temporal Association of Apple Fruitlets
Harry Freeman, George Kantor
In this paper, we present a transformer-based method to spatio-temporally associate apple fruitlets in stereo images collected on different days and from different camera poses. State-of-the-art association methods in agriculture focus on matching larger crops using either high-resolution point clouds or temporally stable features, both of which are difficult to obtain for smaller fruit in the field. To address these challenges, we propose a transformer-based architecture that encodes the shape and position of each fruitlet, and propagates and refines these features through a series of transformer encoder layers with alternating self- and cross-attention. We demonstrate that our method achieves an F1-score of 92.4% on data collected in a commercial apple orchard and outperforms all baselines and ablations.
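To make the idea of alternating self- and cross-attention in the abstract above concrete, here is a minimal sketch in plain Python. It is a deliberately simplified stand-in, not the paper's architecture: it uses single-head scaled dot-product attention with a residual connection and omits learned projections, layer normalization, and feed-forward blocks. The names `attend` and `alternate_layers` and the toy feature vectors are hypothetical.

```python
import math


def attend(queries, keys_values):
    """Single-head scaled dot-product attention with a residual connection.

    Simplifying assumption: keys and values are the same vectors and there
    are no learned projection weights.
    """
    dim = len(queries[0])
    updated = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(dim)
                  for k in keys_values]
        top = max(scores)  # subtract the max for a numerically stable softmax
        weights = [math.exp(s - top) for s in scores]
        total = sum(weights)
        mixed = [sum(w / total * k[i] for w, k in zip(weights, keys_values))
                 for i in range(dim)]
        updated.append([qi + mi for qi, mi in zip(q, mixed)])
    return updated


def alternate_layers(feats_a, feats_b, num_layers=2):
    """Refine two feature sets with alternating self- and cross-attention."""
    for _ in range(num_layers):
        # Self-attention: each set attends to its own elements.
        feats_a, feats_b = attend(feats_a, feats_a), attend(feats_b, feats_b)
        # Cross-attention: each set attends to the other set.
        feats_a, feats_b = attend(feats_a, feats_b), attend(feats_b, feats_a)
    return feats_a, feats_b


# Toy per-fruitlet features for two images (assumed values, illustration only).
day1 = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
day2 = [[0.5, 0.5], [1.0, 0.0]]
refined_day1, refined_day2 = alternate_layers(day1, day2)
```

Self-attention lets each fruitlet's feature absorb context from its neighbours in the same image, while cross-attention exposes it to candidates in the other image. A matching step, not shown here, would then compare the refined features across the two sets.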
RO Aug 9, 2021
3D Human Reconstruction in the Wild with Collaborative Aerial Cameras
Cherie Ho, Andrew Jong, Harry Freeman et al.
Aerial vehicles are revolutionizing applications that require capturing the 3D structure of dynamic targets in the wild, such as sports, medicine, and entertainment. The core challenges in developing a motion-capture system that operates in outdoor environments are: (1) 3D inference requires multiple simultaneous viewpoints of the target, (2) occlusion caused by obstacles is frequent when tracking moving targets, and (3) camera and vehicle state estimation is noisy. We present a real-time aerial system for multi-camera control that can reconstruct human motions in natural environments without the use of special-purpose markers. We develop a multi-robot coordination scheme that maintains the optimal flight formation for target reconstruction quality among obstacles. We provide studies evaluating system performance in simulation, and validate real-world performance using two drones while a target performs activities such as jogging and playing soccer. Supplementary video: https://youtu.be/jxt91vx0cns