Chris McCool

h-index: 37

Papers: 28

Citations: 5,258

Novelty: 45%

AI Score: 32

Ranked #128,009 of 194,257 authors (top 66%); #3,804 in RO (top 56%)

28 Papers

15.5 · RO · Mar 15, 2023 · Code

Panoptic Mapping with Fruit Completion and Pose Estimation for Horticultural Robots

Yue Pan, Federico Magistri, Thomas Läbe et al.

Monitoring plants and fruits at high resolution plays a key role in the future of agriculture. Accurate 3D information can pave the way to a diverse number of robotic applications in agriculture, ranging from autonomous harvesting to precise yield estimation. Obtaining such 3D information is non-trivial, as agricultural environments are often repetitive and cluttered, and one has to account for the partial observability of fruit and plants. In this paper, we address the problem of jointly estimating complete 3D shapes of fruit and their pose in a 3D multi-resolution map built by a mobile robot. To this end, we propose an online multi-resolution panoptic mapping system where regions of interest are represented with a higher resolution. We exploit data to learn a general fruit shape representation that we use at inference time together with an occlusion-aware differentiable rendering pipeline to complete partial fruit observations and estimate the 7 DoF pose of each fruit in the map. The experiments presented in this paper, evaluated both in a controlled environment and in a commercial greenhouse, show that our novel algorithm yields higher completion and pose estimation accuracy than existing methods, with an improvement of 41% in completion accuracy and 52% in pose estimation accuracy while keeping a low inference time of 0.6s on average. Code is available at: https://github.com/PRBonn/HortiMapping.

15.5 · RO · Sep 11, 2023

PAg-NeRF: Towards fast and efficient end-to-end panoptic 3D representations for agricultural robotics

Claus Smitt, Michael Halstead, Patrick Zimmer et al.

Precise scene understanding is key for most robot monitoring and intervention tasks in agriculture. In this work we present PAg-NeRF which is a novel NeRF-based system that enables 3D panoptic scene understanding. Our representation is trained using an image sequence with noisy robot odometry poses and automatic panoptic predictions with inconsistent IDs between frames. Despite this noisy input, our system is able to output scene geometry, photo-realistic renders and 3D consistent panoptic representations with consistent instance IDs. We evaluate this novel system in a very challenging horticultural scenario and in doing so demonstrate an end-to-end trainable system that can make use of noisy robot poses rather than precise poses that have to be pre-calculated. Compared to a baseline approach the peak signal to noise ratio is improved from 21.34dB to 23.37dB while the panoptic quality improves from 56.65% to 70.08%. Furthermore, our approach is faster and can be tuned to improve inference time by more than a factor of 2 while being memory efficient with approximately 12 times fewer parameters.
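The peak signal-to-noise ratio quoted above can be computed from the mean squared error between a reference image and a render. The following is a minimal sketch, assuming flat sequences of pixel intensities and an 8-bit peak value of 255; the function name `psnr` and the `max_value` parameter are illustrative and are not taken from the paper.

```python
import math


def psnr(reference, rendered, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two equally sized pixel sequences.

    max_value is the largest possible pixel value (assumed 255 for 8-bit images).
    """
    if len(reference) != len(rendered) or not reference:
        raise ValueError("inputs must be non-empty and of equal length")
    # Mean squared error over all pixels.
    mse = sum((r - p) ** 2 for r, p in zip(reference, rendered)) / len(reference)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)
```

Because PSNR is logarithmic in the error, the reported gain of 2.03 dB (21.34dB to 23.37dB) corresponds to a mean squared error of roughly 0.63 times the baseline value, since 10^(-0.203) is about 0.63.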

5.2 · CV · Jul 18, 2024 · Code

A Dataset and Benchmark for Shape Completion of Fruits for Agricultural Robotics

Federico Magistri, Thomas Läbe, Elias Marks et al.

As the world population is expected to reach 10 billion by 2050, our agricultural production system needs to double its productivity despite a decline of human workforce in the agricultural sector. Autonomous robotic systems are one promising pathway to increase productivity by taking over labor-intensive manual tasks like fruit picking. To be effective, such systems need to monitor and interact with plants and fruits precisely, which is challenging due to the cluttered nature of agricultural environments causing, for example, strong occlusions. Thus, being able to estimate the complete 3D shapes of objects in presence of occlusions is crucial for automating operations such as fruit harvesting. In this paper, we propose the first publicly available 3D shape completion dataset for agricultural vision systems. We provide an RGB-D dataset for estimating the 3D shape of fruits. Specifically, our dataset contains RGB-D frames of single sweet peppers in lab conditions but also in a commercial greenhouse. For each fruit, we additionally collected high-precision point clouds that we use as ground truth. For acquiring the ground truth shape, we developed a measuring process that allows us to record data of real sweet pepper plants, both in the lab and in the greenhouse with high precision, and determine the shape of the sensed fruits. We release our dataset, consisting of almost 7,000 RGB-D frames belonging to more than 100 different fruits. We provide segmented RGB-D frames, with camera intrinsics to easily obtain colored point clouds, together with the corresponding high-precision, occlusion-free point clouds obtained with a high-precision laser scanner. We additionally enable evaluation of shape completion approaches on a hidden test set through a public challenge on a benchmark server.
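Obtaining a colored point cloud from a segmented RGB-D frame and camera intrinsics is typically done with pinhole back-projection. The sketch below assumes a standard pinhole model with focal lengths `fx`, `fy` and principal point `cx`, `cy`, metric depth values, and images stored as nested lists; these names and the data layout are assumptions for illustration, not the dataset's actual file format or tooling.

```python
def backproject(u, v, depth, fx, fy, cx, cy):
    """Map pixel (u, v) with metric depth to a 3D point in the camera frame."""
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    return (x, y, depth)


def colored_point_cloud(depth_image, rgb_image, mask, fx, fy, cx, cy):
    """Return a list of ((x, y, z), (r, g, b)) for masked pixels with valid depth."""
    points = []
    for v, row in enumerate(depth_image):
        for u, d in enumerate(row):
            # Skip pixels outside the fruit mask or without a depth reading.
            if mask[v][u] and d > 0:
                points.append((backproject(u, v, d, fx, fy, cx, cy), rgb_image[v][u]))
    return points
```

A pixel at the principal point maps to a point on the optical axis, so `backproject(cx, cy, d, ...)` returns `(0.0, 0.0, d)`.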

13.2 · RO · Jul 8

GemNav: Discrete-Token Visual Robot Navigation using a Multimodal Large Language Model

Peter Bohm, Saimunur Rahman, Abdelwahed Khamis et al.

Visual navigation policies built on large pretrained models have so far followed a common recipe: a dedicated visual encoder, a bespoke action head, and training on thousands of hours of cross-embodiment datasets. We ask whether this recipe is necessary. In this paper, we introduce GemNav, a visual robot navigation policy that adapts a frozen Multimodal Large Language Model (MLLM) for short-to-medium horizon waypoint navigation using Low-Rank Adaptation (LoRA) on the language tower alone, with no auxiliary visual encoder and no continuous regression head. Waypoints and categorical navigation signals share a single discrete token vocabulary generated by the language-model head, and a soft-decoded auxiliary loss recovers the metric structure that pure cross-entropy training discards. On a single 8.7-hour open corpus, roughly three orders of magnitude smaller than competing training sets, the policy transfers zero-shot to four physically distinct unseen environments and stops within 0.25-0.42m of the goal across 20 real-world trials covering an open carpark, an obstacle carpark, a long outdoor chemical yard, and an indoor warehouse. Conditioning on short image histories improves offline metrics but yields no robot benefit, pointing to a ceiling on what temporal context adds once pretrained vision features are in place. These results indicate that discrete-token adaptation of frozen MLLMs can provide a data-efficient, deployable alternative for foundation model robot navigation.
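The abstract does not spell out the soft-decoded auxiliary loss, so the following is only one plausible reading: take the softmax over the logits of the discrete waypoint bins, decode a continuous value as the probability-weighted mean of the bin centers, and penalize its distance to the metric target. The bin centers, the L1 penalty, and all function names here are assumptions made for illustration.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def soft_decode(logits, bin_centers):
    """Expected continuous value under the predicted distribution over bins."""
    probs = softmax(logits)
    return sum(p * c for p, c in zip(probs, bin_centers))


def soft_decoded_loss(logits, bin_centers, target):
    """L1 distance between the soft-decoded value and the metric target (assumed form)."""
    return abs(soft_decode(logits, bin_centers) - target)
```

Unlike plain cross-entropy, which treats every wrong bin as equally wrong, a loss of this kind is smaller when probability mass sits on bins that are metrically close to the target.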

6.9 · RO · Jun 27, 2022 · Code

Explicitly incorporating spatial information to recurrent networks for agriculture

Claus Smitt, Michael Halstead, Alireza Ahmadi et al.

In agriculture, the majority of vision systems perform still image classification. Yet, recent work has highlighted the potential of spatial and temporal cues as a rich source of information to improve the classification performance. In this paper, we propose novel approaches to explicitly capture both spatial and temporal information to improve the classification of deep convolutional neural networks. We leverage available RGB-D images and robot odometry to perform inter-frame feature map spatial registration. This information is then fused within recurrent deep learnt models, to improve their accuracy and robustness. We demonstrate that this can considerably improve the classification performance with our best performing spatial-temporal model (ST-Atte) achieving absolute performance improvements for intersection-over-union (IoU[%]) of 4.7 for crop-weed segmentation and 2.6 for fruit (sweet pepper) segmentation. Furthermore, we show that these approaches are robust to variable framerates and odometry errors, which are frequently observed in real-world applications.
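For readers unfamiliar with the metric, intersection-over-union for a single class can be sketched as follows. This assumes flattened binary masks of equal length and, as a convention chosen here, returns 1.0 when both masks are empty; it is a generic illustration and not the paper's evaluation code.

```python
def iou(pred, target):
    """Intersection-over-union of two flattened binary masks of equal length."""
    intersection = sum(1 for p, t in zip(pred, target) if p and t)
    union = sum(1 for p, t in zip(pred, target) if p or t)
    # Convention (assumed): two empty masks agree perfectly.
    return intersection / union if union else 1.0
```

An absolute improvement of 4.7 in IoU[%] means the score rises by 4.7 percentage points, for example from 60.0% to 64.7%, rather than by 4.7% of its previous value.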

1.5 · CV · Mar 15, 2023

Panoptic One-Click Segmentation: Applied to Agricultural Data

Patrick Zimmer, Michael Halstead, Chris McCool

In weed control, precision agriculture can help to greatly reduce the use of herbicides, resulting in both economical and ecological benefits. A key element is the ability to locate and segment all the plants from image data. Modern instance segmentation techniques can achieve this, however, training such systems requires large amounts of hand-labelled data which is expensive and laborious to obtain. Weakly supervised training can help to greatly reduce labelling efforts and costs. We propose panoptic one-click segmentation, an efficient and accurate offline tool to produce pseudo-labels from click inputs which reduces labelling effort. Our approach jointly estimates the pixel-wise location of all N objects in the scene, compared to traditional approaches which iterate independently through all N objects; this greatly reduces training time. Using just 10% of the data to train our panoptic one-click segmentation approach yields 68.1% and 68.8% mean object intersection over union (IoU) on challenging sugar beet and corn image data respectively, providing comparable performance to traditional one-click approaches while being approximately 12 times faster to train. We demonstrate the applicability of our system by generating pseudo-labels from clicks on the remaining 90% of the data. These pseudo-labels are then used to train Mask R-CNN, in a semi-supervised manner, improving the absolute performance (of mean foreground IoU) by 9.4 and 7.9 points for sugar beet and corn data respectively. Finally, we show that our technique can recover missed clicks during annotation outlining a further benefit over traditional approaches.

7.9 · CV · Jul 16

Still image and spatial-temporal tomato data enabling detection, segmentation, tracking, and video-instance segmentation using strong and weak labels

Michael Halstead, Esra Guclu, Mohamed Farag et al.

In this manuscript we release two datasets for visual sensing of tomato plants grown in commercial-like settings and acquired using a robot. The first is BUTom21 which consists of still images and manual annotations. The second is BUTom-ST21 which consists of video-based data and semi-automated annotations through AI-based methods, referred to as pseudo-labels. In both cases, we provide pixel-level labels for the ripeness of the fruit. The aim is to provide the research community a challenging set of real-world imagery to explore methods to sense and estimate the state of tomato plants and their fruit, which is an important horticultural crop. Importantly, the spatial-temporal dataset provides individual fruit count and ripeness information enabling researchers to push the boundaries of field-based phenotyping.

4.1 · RO · May 15, 2024

BonnBot-I Plus: A Bio-diversity Aware Precise Weed Management Robotic Platform

Alireza Ahmadi, Michael Halstead, Claus Smitt et al.

In this article, we focus on the critical tasks of plant protection in arable farms, addressing a modern challenge in agriculture: integrating ecological considerations into the operational strategy of precision weeding robots like BonnBot-I. This article presents the recent advancements in weed management algorithms and the real-world performance of BonnBot-I at the University of Bonn's Klein-Altendorf campus. We present a novel Rolling-view observation model for BonnBot-I's weed monitoring section, which leads to an average absolute weeding performance enhancement of 3.4%. Furthermore, for the first time, we show how precision weeding robots could consider bio-diversity-aware concerns in challenging weeding scenarios. We carried out comprehensive weeding experiments in sugar-beet fields, covering both weed-only and mixed crop-weed situations, and introduced a new dataset compatible with precision weeding. Our real-field experiments revealed that our weeding approach is capable of handling diverse weed distributions, with a minimal loss of only 11.66% attributable to intervention planning and 14.7% to vision system limitations, highlighting required improvements of the vision system.

10.4 · RO · Sep 24, 2021 · Code

Towards Autonomous Visual Navigation in Arable Fields

Alireza Ahmadi, Michael Halstead, Chris McCool

Autonomous navigation of a robot in agricultural fields is essential for every task from crop monitoring to weed management and fertilizer application. Many current approaches rely on accurate GPS, however, such technology is expensive and also prone to failure (e.g. through lack of coverage). As such, autonomous navigation through sensors that can interpret their environment (such as cameras) is important to achieve the goal of autonomy in agriculture. In this paper, we introduce a purely vision-based navigation scheme that is able to reliably guide the robot through row-crop fields without manual intervention. Independent of any global localization or mapping, this approach is able to accurately follow the crop-rows and switch between the rows, only using onboard cameras. With the help of a novel crop-row detection and a novel crop-row switching technique, our navigation scheme can be deployed in a wide range of fields with different canopy types in various growth stages with limited parameter tuning, creating a crop agnostic navigation approach. We have extensively evaluated our approach in three different fields under various illumination conditions using our agricultural robotic platform (BonnBot-I). For navigation, our approach is evaluated on five crop types and achieves an average navigation accuracy of 3.82cm relative to manual teleoperation.

8.9 · RO · Aug 18, 2021 · Code

Combining Local and Global Viewpoint Planning for Fruit Coverage

Tobias Zaenker, Chris Lehnert, Chris McCool et al.

Obtaining 3D sensor data of complete plants or plant parts (e.g., the crop or fruit) is difficult due to their complex structure and a high degree of occlusion. However, especially for the estimation of the position and size of fruits, it is necessary to avoid occlusions as much as possible and acquire sensor information of the relevant parts. Global viewpoint planners exist that suggest a series of viewpoints to cover the regions of interest up to a certain degree, but they usually prioritize global coverage and do not emphasize the avoidance of local occlusions. On the other hand, there are approaches that aim at avoiding local occlusions, but they cannot be used in larger environments since they only reach a local maximum of coverage. In this paper, we therefore propose to combine a local, gradient-based method with global viewpoint planning to enable local occlusion avoidance while still being able to cover large areas. Our simulated experiments with a robotic arm equipped with a camera array as well as an RGB-D camera show that this combination leads to a significantly increased coverage of the regions of interest compared to just applying global coverage planning.

5.6 · CV · Jun 18, 2021

Virtual Temporal Samples for Recurrent Neural Networks: applied to semantic segmentation in agriculture

Alireza Ahmadi, Michael Halstead, Chris McCool

This paper explores the potential for performing temporal semantic segmentation in the context of agricultural robotics without temporally labelled data. We achieve this by proposing to generate virtual temporal samples from labelled still images. By exploiting the relatively static scene and assuming that the robot (camera) moves, we are able to generate virtually labelled temporal sequences with no extra annotation effort. Normally, to train a recurrent neural network (RNN), labelled samples from a video (temporal) sequence are required, which is laborious and has stymied work in this direction. By generating virtual temporal samples, we demonstrate that it is possible to train a lightweight RNN to perform semantic segmentation on two challenging agricultural datasets. Our results show that by training a temporal semantic segmenter using virtual samples we can increase the performance by an absolute amount of 4.6 and 4.9 on sweet pepper and sugar beet datasets, respectively. This indicates that our virtual data augmentation technique is able to accurately classify agricultural images temporally without complicated synthetic data generation techniques and without the overhead of labelling large amounts of temporal sequences.
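The idea of deriving a labelled sequence from one labelled still image can be illustrated with a sliding crop. The abstract only states that the scene is static and the camera moves, so the purely horizontal translation, the fixed step, and the function name `virtual_sequence` below are assumptions for this sketch rather than the authors' procedure.

```python
def virtual_sequence(image, label, crop_width, step, num_frames):
    """Slide a crop window horizontally over a still image and its label mask.

    image and label are nested lists (rows of pixels) with identical shape.
    Returns a list of (image_crop, label_crop) pairs, one per virtual frame.
    """
    width = len(image[0])
    frames = []
    for k in range(num_frames):
        x0 = k * step
        if x0 + crop_width > width:
            break  # window would leave the image; stop the sequence early
        image_crop = [row[x0:x0 + crop_width] for row in image]
        label_crop = [row[x0:x0 + crop_width] for row in label]
        frames.append((image_crop, label_crop))
    return frames
```

Each virtual frame inherits its label by applying the same crop to the mask, which is why no additional annotation is needed.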

15.1 · RO · Oct 31, 2020 · Code

Viewpoint Planning for Fruit Size and Position Estimation

Tobias Zaenker, Claus Smitt, Chris McCool et al.

Modern agricultural applications require knowledge about the position and size of fruits on plants. However, occlusions from leaves typically make obtaining this information difficult. We present a novel viewpoint planning approach that builds up an octree of plants with labeled regions of interest (ROIs), i.e., fruits. Our method uses this octree to sample viewpoint candidates that increase the information around the fruit regions and evaluates them using a heuristic utility function that takes into account the expected information gain. Our system automatically switches between ROI targeted sampling and exploration sampling, which considers general frontier voxels, depending on the estimated utility. When the plants have been sufficiently covered with the RGB-D sensor, our system clusters the ROI voxels and estimates the position and size of the detected fruits. We evaluated our approach in simulated scenarios and compared the resulting fruit estimations with the ground truth. The results demonstrate that our combined approach outperforms a sampling method that does not explicitly consider the ROIs to generate viewpoints in terms of the number of discovered ROI cells. Furthermore, we show the real-world applicability by testing our framework on a robotic arm equipped with an RGB-D camera installed on an automated pipe-rail trolley in a capsicum glasshouse.

17.3 · RO · Oct 30, 2020

PATHoBot: A Robot for Glasshouse Crop Phenotyping and Intervention

Claus Smitt, Michael Halstead, Tobias Zaenker et al.

We present PATHoBot an autonomous crop surveying and intervention robot for glasshouse environments. The aim of this platform is to autonomously gather high quality data and also estimate key phenotypic parameters. To achieve this we retro-fit an off-the-shelf pipe-rail trolley with an array of multi-modal cameras, navigation sensors and a robotic arm for close surveying tasks and intervention. In this paper we describe PATHoBot design choices made to ensure proper operation in a commercial glasshouse environment. As a surveying platform we collect a number of datasets which include both sweet pepper and tomatoes. We show how PATHoBot enables novel surveillance approaches by first improving our previous work on fruit counting by incorporating wheel odometry and depth information. We find that by introducing re-projection and depth information we are able to achieve an absolute improvement of 20 points over the baseline technique in an "in the wild" situation. Finally, we present a 3D mapping case study, further showcasing PATHoBot's crop surveying capabilities.

7.2 · RO · Oct 29, 2018

A Sweet Pepper Harvesting Robot for Protected Cropping Environments

Chris Lehnert, Chris McCool, Inkyu Sa et al.

Using robots to harvest sweet peppers in protected cropping environments has remained unsolved despite considerable effort by the research community over several decades. In this paper, we present the robotic harvester, Harvey, designed for sweet peppers in protected cropping environments, that achieved a 76.5% success rate (within a modified scenario). This improves upon our prior work, which achieved 58%, and related sweet pepper harvesting work, which achieved 33%. This improvement was primarily achieved through the introduction of a novel peduncle segmentation system using an efficient deep convolutional neural network, in conjunction with 3D post-filtering to detect the critical cutting location. We benchmark the peduncle segmentation against prior art, demonstrating a considerable improvement in performance with an F1 score of 0.564 compared to 0.302. The robotic harvester uses a perception pipeline to detect a target sweet pepper and an appropriate grasp and cutting pose used to determine the trajectory of a multi-modal harvesting tool to grasp the sweet pepper and cut it from the plant. A novel decoupling mechanism enables the gripping and cutting operations to be performed independently. We perform an in-depth analysis of the full robotic harvesting system to highlight bottlenecks and failure points that future work could address.

8.0 · RO · Sep 21, 2018

3D Move to See: Multi-perspective visual servoing for improving object views with semantic segmentation

Chris Lehnert, Dorian Tsai, Anders Eriksson et al.

In this paper, we present a new approach to visual servoing for robotics, referred to as 3D Move to See (3DMTS), based on the principle of finding the next best view using a 3D camera array and a robotic manipulator to obtain multiple samples of the scene from different perspectives. The method uses semantic vision and an objective function applied to each perspective to sample a gradient representing the direction of the next best view. The method is demonstrated within simulation and on a real robotic platform containing a custom 3D camera array for the challenging scenario of robotic harvesting in a highly occluded and unstructured environment. It was shown on a real robotic platform that moving the end effector using the gradient of an objective function leads to a locally optimal view of the object of interest, even amongst occlusions. The overall performance of the 3DMTS method obtained a mean increase in target size of 29.3%, compared to a baseline method using a single RGB-D camera, which obtained 9.17%. The results demonstrate qualitatively and quantitatively that the 3DMTS method performed better in most scenarios, and yielded three times the target size compared to the baseline method. The increased target size in the final view will improve the detection of key features of the object of interest for further manipulation, such as grasping and harvesting.

3.9 · CV · Jan 25, 2018

A Rapidly Deployable Classification System using Visual Data for the Application of Precision Weed Management

David Hall, Feras Dayoub, Tristan Perez et al.

In this work we demonstrate a rapidly deployable weed classification system that uses visual data to enable autonomous precision weeding without making prior assumptions about which weed species are present in a given field. Previous work in this area relies on having prior knowledge of the weed species present in the field. This assumption cannot always hold true for every field, and thus limits the use of weed classification systems based on this assumption. In this work, we obviate this assumption and introduce a rapidly deployable approach able to operate on any field without any weed species assumptions prior to deployment. We present a three stage pipeline for the implementation of our weed classification system consisting of initial field surveillance, offline processing and selective labelling, and automated precision weeding. The key characteristic of our approach is the combination of plant clustering and selective labelling which is what enables our system to operate without prior weed species knowledge. Testing using field data we are able to label 12.3 times fewer images than traditional full labelling whilst reducing classification accuracy by only 14%.

3.3 · CV · Jan 17, 2018

Fruit Quantity and Quality Estimation using a Robotic Vision System

M. Halstead, C. McCool, S. Denman et al.

Accurate localisation of crop remains highly challenging in unstructured environments such as farms. Many of the developed systems still rely on the use of hand selected features for crop identification and often neglect the estimation of crop quantity and quality, which is key to assigning labor during farming processes. To alleviate these limitations we present a robotic vision system that can accurately estimate the quantity and quality of sweet pepper (Capsicum annuum L), a key horticultural crop. This system consists of three parts: detection, quality estimation, and tracking. Efficient detection is achieved using the Faster-RCNN framework. Quality is then estimated in the same framework by learning a parallel layer, which we show experimentally results in superior performance to treating quality as extra classes in the traditional Faster-RCNN framework. Evaluation of these two techniques outlines the improved performance of the parallel layer, where we achieve an F1 score of 77.3 for the parallel technique yet only 72.5 for the best scoring (red) of the multi-class implementation. To track the crop we present a tracking via detection approach, which uses the Faster-RCNN with parallel layers, that is also a vision-only solution. This approach is cheap to implement as it only requires a camera, and in experiments across 2 days we show that our proposed system can accurately estimate the number of sweet peppers present, within 4.1% of the ground truth.

1.7 · RO · Sep 29, 2017

In-Field Peduncle Detection of Sweet Peppers for Robotic Harvesting: a comparative study

Chris Lehnert, Chris McCool, Tristan Perez

Robotic harvesting of crops has the potential to disrupt current agricultural practices. A key element to enabling robotic harvesting is to safely remove the crop from the plant, which often involves locating and cutting the peduncle, the part of the crop that attaches it to the main stem of the plant. In this paper we present a comparative study of two methods for performing peduncle detection. The first method is based on classic colour and geometric features obtained from the scene with a support vector machine classifier, referred to as PFH-SVM. The second method is an efficient deep neural network approach, MiniInception, that is able to be deployed on a robotic platform. In both cases we employ a secondary filtering process that enforces reasonable assumptions about the crop structure, such as the proximity of the peduncle to the crop. Our tests are conducted on Harvey, a sweet pepper harvesting robot, and are evaluated in a greenhouse using two varieties of sweet pepper, Ducati and Mercuno. We demonstrate that the MiniInception method achieves impressive accuracy and considerably outperforms the PFH-SVM approach, with F1 scores of 0.564 and 0.302, respectively.
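The F1 score used to compare the two detectors is the harmonic mean of precision and recall. A minimal sketch from raw counts follows; the function name and the choice to return 0.0 when there are no true positives are conventions assumed here, and the counts in the test are made up for illustration rather than taken from the paper.

```python
def f1_score(true_positives, false_positives, false_negatives):
    """Harmonic mean of precision and recall, computed from detection counts."""
    if true_positives == 0:
        return 0.0  # convention (assumed): no correct detections gives F1 = 0
    precision = true_positives / (true_positives + false_positives)
    recall = true_positives / (true_positives + false_negatives)
    return 2.0 * precision * recall / (precision + recall)
```

Because it is a harmonic mean, F1 is dominated by the weaker of precision and recall, so a detector cannot reach a high score by excelling at only one of them.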

1.7 · RO · Jun 19, 2017

Lessons Learnt from Field Trials of a Robotic Sweet Pepper Harvester

Christopher Lehnert, Christopher McCool, Tristan Perez

In this paper, we present the lessons learnt during the development of a new robotic harvester (Harvey) that can autonomously harvest sweet pepper (capsicum) in protected cropping environments. Robotic harvesting offers an attractive potential solution to reducing labour costs while enabling more regular and selective harvesting, optimising crop quality, scheduling and therefore profit. Our approach combines effective vision algorithms with a novel end-effector design to enable successful harvesting of sweet peppers. We demonstrate a simple and effective vision-based algorithm for crop detection, a grasp selection method, and a novel end-effector design for harvesting. To reduce the complexity of motion planning and to minimise occlusions we focus on picking sweet peppers in a protected cropping environment where plants are grown on planar trellis structures. Initial field trials in protected cropping environments, with two cultivars, demonstrate the efficacy of this approach. The results show that the robot harvester can successfully detect, grasp, and detach crop from the plant within a real protected cropping system. The novel contributions of this work have resulted in significant and encouraging improvements in sweet pepper picking success rates compared with the state-of-the-art. Future work will look at detecting sweet pepper peduncles and improving the total harvesting cycle time for each sweet pepper. The methods presented in this paper provide steps towards the goal of fully autonomous and reliable crop picking systems that will revolutionise the horticulture industry by reducing labour costs, maximising the quality of produce, and ultimately improving the sustainability of farming enterprises.

19.2 · RO · Jun 7, 2017

Autonomous Sweet Pepper Harvesting for Protected Cropping Systems

Chris Lehnert, Andrew English, Chris McCool et al.

In this letter, we present a new robotic harvester (Harvey) that can autonomously harvest sweet pepper in protected cropping environments. Our approach combines effective vision algorithms with a novel end-effector design to enable successful harvesting of sweet peppers. Initial field trials in protected cropping environments, with two cultivars, demonstrate the efficacy of this approach, achieving a 46% success rate for unmodified crop and 58% for modified crop. Furthermore, for the more favourable cultivar we were also able to detach 90% of sweet peppers, indicating that improvements in the grasping success rate would result in greatly improved harvesting performance.

4.4 · CV · Feb 4, 2017

Towards Unsupervised Weed Scouting for Agricultural Robotics

David Hall, Feras Dayoub, Jason Kulk et al.

Weed scouting is an important part of modern integrated weed management but can be time consuming and sparse when performed manually. Automated weed scouting and weed destruction have typically been performed using classification systems able to classify a set group of species known a priori. This greatly limits deployability, as classification systems must be retrained for any field with a different set of weed species present. In order to overcome this limitation, this paper works towards developing a clustering approach to weed scouting which can be utilized in any field without the need for prior species knowledge. We demonstrate our system using challenging data collected in the field from an agricultural robotics platform. We show that considerable improvements can be made by (i) learning low-dimensional (bottleneck) features using a deep convolutional neural network to represent plants in general and (ii) tying views of the same area (plant) together. Deploying this algorithm on in-field data collected by AgBotII, we are able to successfully cluster cotton plants from grasses without prior knowledge or training for the specific plants in the field.

10.1 · RO · Jan 30, 2017

Peduncle Detection of Sweet Pepper for Autonomous Crop Harvesting - Combined Colour and 3D Information

Inkyu Sa, Chris Lehnert, Andrew English et al.

This paper presents a 3D visual detection method for the challenging task of detecting peduncles of sweet peppers (Capsicum annuum) in the field. Cutting the peduncle cleanly is one of the most difficult stages of the harvesting process, where the peduncle is the part of the crop that attaches it to the main stem of the plant. Accurate peduncle detection in 3D space is therefore a vital step in reliable autonomous harvesting of sweet peppers, as this can lead to precise cutting while avoiding damage to the surrounding plant. This paper makes use of both colour and geometry information acquired from an RGB-D sensor and utilises a supervised-learning approach for the peduncle detection task. The performance of the proposed method is demonstrated and evaluated using qualitative and quantitative results (the Area-Under-the-Curve (AUC) of the detection precision-recall curve). We are able to achieve an AUC of 0.71 for peduncle detection on field-grown sweet peppers. We release a set of manually annotated 3D sweet pepper and peduncle images to assist the research community in performing further research on this topic.
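The area under a precision-recall curve can be approximated from sampled operating points. The sketch below uses trapezoidal integration over recall, which is one common convention; the paper does not state which interpolation it used, so this choice and the function name `pr_auc` are assumptions.

```python
def pr_auc(recalls, precisions):
    """Trapezoidal area under a precision-recall curve given sampled points."""
    # Sort the operating points by recall so the integration runs left to right.
    points = sorted(zip(recalls, precisions))
    area = 0.0
    for (r0, p0), (r1, p1) in zip(points, points[1:]):
        area += (r1 - r0) * (p0 + p1) / 2.0
    return area
```

A perfect detector holds precision at 1.0 for every recall level and therefore has an area of 1.0; the reported value of 0.71 sits between that and a weak detector whose precision collapses as recall grows.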

11.7 · RO · Sep 17, 2016 · Code

The ACRV Picking Benchmark (APB): A Robotic Shelf Picking Benchmark to Foster Reproducible Research

Jürgen Leitner, Adam W. Tow, Jake E. Dean et al.

Robotic challenges like the Amazon Picking Challenge (APC) or the DARPA Challenges are an established and important way to drive scientific progress. They make research comparable on a well-defined benchmark with equal test conditions for all participants. However, such challenge events occur only occasionally, are limited to a small number of contestants, and the test conditions are very difficult to replicate after the main event. We present a new physical benchmark challenge for robotic picking: the ACRV Picking Benchmark (APB). Designed to be reproducible, it consists of a set of 42 common objects, a widely available shelf, and exact guidelines for object arrangement using stencils. A well-defined evaluation protocol enables the comparison of complete robotic systems (including perception and manipulation) instead of sub-systems only. Our paper also describes and reports results achieved by an open baseline system based on a Baxter robot.

3.0CVFeb 4, 2016

Joint Recognition and Segmentation of Actions via Probabilistic Integration of Spatio-Temporal Fisher Vectors

Johanna Carvajal, Chris McCool, Brian Lovell et al.

We propose a hierarchical approach to multi-action recognition that performs joint classification and segmentation. A given video (containing several consecutive actions) is processed via a sequence of overlapping temporal windows. Each frame in a temporal window is represented through selective low-level spatio-temporal features which efficiently capture relevant local dynamics. Features from each window are represented as a Fisher vector, which captures first and second order statistics. Instead of directly classifying each Fisher vector, it is converted into a vector of class probabilities. The final classification decision for each frame is then obtained by integrating the class probabilities at the frame level, which exploits the overlapping of the temporal windows. Experiments were performed on two datasets: s-KTH (a stitched version of the KTH dataset to simulate multi-actions), and the challenging CMU-MMAC dataset. On s-KTH, the proposed approach achieves an accuracy of 85.0%, significantly outperforming two recent approaches based on GMMs and HMMs which obtained 78.3% and 71.2%, respectively. On CMU-MMAC, the proposed approach achieves an accuracy of 40.9%, outperforming the GMM and HMM approaches which obtained 33.7% and 38.4%, respectively. Furthermore, the proposed system is on average 40 times faster than the GMM based approach.
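The frame-level integration step can be sketched as follows. The abstract says only that class probabilities are integrated at the frame level across overlapping windows; averaging them, the fixed window length and stride, and the function name `integrate_window_probabilities` are assumptions made for this illustration.

```python
def integrate_window_probabilities(window_probs, window_len, stride, num_frames):
    """Assign each frame the class with the highest averaged probability.

    window_probs: one list of class probabilities per temporal window.
    Window w is assumed to cover frames [w * stride, w * stride + window_len).
    """
    num_classes = len(window_probs[0])
    sums = [[0.0] * num_classes for _ in range(num_frames)]
    counts = [0] * num_frames

    # Accumulate each window's probabilities on every frame it covers.
    for w, probs in enumerate(window_probs):
        start = w * stride
        for t in range(start, min(start + window_len, num_frames)):
            counts[t] += 1
            for c, p in enumerate(probs):
                sums[t][c] += p

    labels = []
    for t in range(num_frames):
        if counts[t] == 0:
            labels.append(None)  # frame not covered by any window
            continue
        averaged = [s / counts[t] for s in sums[t]]
        labels.append(max(range(num_classes), key=averaged.__getitem__))
    return labels
```

Because neighbouring windows overlap, a frame near an action boundary receives votes from windows on both sides, which is what allows segmentation and classification to be decided jointly.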

13.2CVNov 30, 2015

Fine-Grained Classification via Mixture of Deep Convolutional Neural Networks

ZongYuan Ge, Alex Bewley, Christopher McCool et al.

We present a novel deep convolutional neural network (DCNN) system for fine-grained image classification, called a mixture of DCNNs (MixDCNN). The fine-grained image classification problem is characterised by large intra-class variations and small inter-class variations. To overcome these problems our proposed MixDCNN system partitions images into K subsets of similar images and learns an expert DCNN for each subset. The output from each of the K DCNNs is combined to form a single classification decision. In contrast to previous techniques, we provide a formulation to perform joint end-to-end training of the K DCNNs simultaneously. Extensive experiments, on three datasets using two network structures (AlexNet and GoogLeNet), show that the proposed MixDCNN system consistently outperforms other methods. It provides a relative improvement of 12.7% and achieves state-of-the-art results on two datasets.
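Combining the outputs of K expert networks into one decision might look like the sketch below. The abstract does not give the combination rule, so the weighting used here (a softmax over each expert's highest class score) is an assumption for illustration, as is the function name `combine_experts`. It is not presented as the paper's formulation.

```python
import math


def combine_experts(expert_scores):
    """Weight each expert by its confidence and sum the weighted class scores.

    expert_scores: K lists of raw class scores, one list per expert network.
    Returns (weights, combined_scores).
    """
    # Confidence of each expert, taken here as its highest class score (assumption).
    best = [max(scores) for scores in expert_scores]

    # Numerically stable softmax over the K confidences.
    peak = max(best)
    exps = [math.exp(b - peak) for b in best]
    total = sum(exps)
    weights = [e / total for e in exps]

    num_classes = len(expert_scores[0])
    combined = [
        sum(w * scores[c] for w, scores in zip(weights, expert_scores))
        for c in range(num_classes)
    ]
    return weights, combined
```

An expert that is confident about an image dominates the combined score, while experts trained on dissimilar subsets contribute little.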

10.8CVMay 9, 2015

Subset Feature Learning for Fine-Grained Category Classification

Zongyuan Ge, Christopher McCool, Conrad Sanderson et al.

Fine-grained categorisation has been a challenging problem due to small inter-class variation, large intra-class variation and a low number of training images. We propose a learning system which first clusters visually similar classes and then learns deep convolutional neural network features specific to each subset. Experiments on the popular fine-grained Caltech-UCSD bird dataset show that the proposed method outperforms recent fine-grained categorisation methods under the most difficult setting: no bounding boxes are presented at test time. It achieves a mean accuracy of 77.5%, compared to the previous best performance of 73.2%. We also show that progressive transfer learning allows us to first learn domain-generic features (for bird classification) which can then be adapted to a specific set of bird classes, yielding improvements in accuracy.

9.7CVFeb 27, 2015

Modelling Local Deep Convolutional Neural Network Features to Improve Fine-Grained Image Classification

ZongYuan Ge, Chris McCool, Conrad Sanderson et al.

We propose a local modelling approach using deep convolutional neural networks (CNNs) for fine-grained image classification. Recently, deep CNNs trained from large datasets have considerably improved the performance of object recognition. However, to date there has been limited work using these deep CNNs as local feature extractors. This partly stems from CNNs having internal representations which are high dimensional, thereby making such representations difficult to model using stochastic models. To overcome this issue, we propose to reduce the dimensionality of one of the internal fully connected layers, in conjunction with layer-restricted retraining to avoid retraining the entire network. The distribution of low-dimensional features obtained from the modified layer is then modelled using a Gaussian mixture model. Comparative experiments show that considerable performance improvements can be achieved on the challenging Fish and UEC FOOD-100 datasets.
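Modelling low-dimensional features with a Gaussian mixture model can be sketched by evaluating the log-likelihood of one feature vector under a fitted mixture. Diagonal covariances and the function name `diag_gmm_log_likelihood` are assumptions made here; the abstract does not state the covariance structure or how the mixture is trained.

```python
import math


def diag_gmm_log_likelihood(x, weights, means, variances):
    """Log-likelihood of feature vector x under a diagonal-covariance GMM.

    weights: mixture weights summing to 1.
    means, variances: one list per component, same length as x.
    """
    log_terms = []
    for w, mu, var in zip(weights, means, variances):
        # Log density of a Gaussian with independent dimensions.
        log_pdf = -0.5 * sum(
            math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v
            for xi, m, v in zip(x, mu, var)
        )
        log_terms.append(math.log(w) + log_pdf)

    # Log-sum-exp over components for numerical stability.
    peak = max(log_terms)
    return peak + math.log(sum(math.exp(t - peak) for t in log_terms))
```

One way to classify with such models is to fit a mixture per class and pick the class whose mixture gives the highest likelihood for the local features of an image. Reducing the feature dimensionality first keeps the number of mixture parameters manageable.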

1.9CVMar 3, 2014

Summarisation of Short-Term and Long-Term Videos using Texture and Colour

Johanna Carvajal, Chris McCool, Conrad Sanderson

We present a novel approach to video summarisation that makes use of a Bag-of-visual-Textures (BoT) approach. Two systems are proposed, one based solely on the BoT approach and another which exploits both colour information and BoT features. On 50 short-term videos from the Open Video Project we show that our BoT and fusion systems both achieve state-of-the-art performance, obtaining an average F-measure of 0.83 and 0.86 respectively, a relative improvement of 9% and 13% when compared to the previous state-of-the-art. When applied to a new underwater surveillance dataset containing 33 long-term videos, the proposed system reduces the amount of footage by a factor of 27, with only minor degradation in the information content. This order of magnitude reduction in video data represents significant savings in terms of time and potential labour cost when manually reviewing such footage.