Daniel Morris

h-index 33

16 papers

1,069 citations

Novelty 40%

AI Score 35

Ranked #104,191 of 194,257 authors (top 54%) · #34,917 in CV (top 59%)

16 Papers

10.4 · CV · Apr 30, 2023

TransCAR: Transformer-based Camera-And-Radar Fusion for 3D Object Detection

Su Pang, Daniel Morris, Hayder Radha

Despite radar's popularity in the automotive industry, most existing work on fusion-based 3D object detection focuses on LiDAR-camera fusion. In this paper, we propose TransCAR, a Transformer-based Camera-And-Radar fusion solution for 3D object detection. TransCAR consists of two modules. The first module learns 2D features from surround-view camera images and then uses a sparse set of 3D object queries to index into these 2D features. The vision-updated queries then interact with each other via a transformer self-attention layer. The second module learns radar features from multiple radar scans and then applies a transformer decoder to learn the interactions between radar features and vision-updated queries. The cross-attention layer within the transformer decoder adaptively learns a soft association between the radar features and vision-updated queries, instead of a hard association based on sensor calibration alone. Finally, our model estimates a bounding box per query using a set-to-set Hungarian loss, which allows the method to avoid non-maximum suppression. TransCAR uses radar scans to improve velocity estimation without relying on temporal information. Experimental results on the challenging nuScenes dataset show that TransCAR outperforms state-of-the-art camera-radar fusion-based 3D object detection approaches.
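The soft association performed by the cross-attention layer can be illustrated with plain scaled dot-product attention. The sketch below is a single-head, pure-Python illustration with no learned projections; the query and radar-feature vectors are made-up toy values, and nothing here reproduces TransCAR's actual architecture. Each query receives a weight for every radar feature (the weights sum to 1), rather than being tied only to the features that fall where calibration projects it.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, keys, values):
    """Single-head scaled dot-product cross-attention (illustrative, no learned weights).

    queries: vision-updated query vectors; keys/values: radar feature vectors.
    Returns (outputs, weights), where weights[i][j] is the soft association
    of query i with radar feature j.
    """
    d = len(keys[0])
    outputs, weights = [], []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
        w = softmax(scores)
        # Weighted sum of radar values, one output channel at a time.
        out = [sum(wj * v[c] for wj, v in zip(w, values)) for c in range(len(values[0]))]
        outputs.append(out)
        weights.append(w)
    return outputs, weights
```

For example, `cross_attention([[1.0, 0.0], [0.0, 1.0]], [[2.0, 0.0], [0.0, 2.0], [0.1, 0.1]], [[10.0], [20.0], [30.0]])` lets the first query attend mostly to the first radar feature and the second query mostly to the second, while still assigning nonzero weight to every feature.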

5.9 · CV · Nov 15, 2023

Self-Annotated 3D Geometric Learning for Smeared Points Removal

Miaowei Wang, Daniel Morris

There has been significant progress in improving the accuracy and quality of consumer-level dense depth sensors. Nevertheless, a common depth-pixel artifact remains, which we call smeared points. These are points not on any 3D surface; they typically occur as interpolations between foreground and background objects. Because they create fictitious surfaces, these points can harm applications that depend on the depth maps. Statistical outlier removal methods perform poorly on these points because they also tend to remove actual surface points. Training a network to remove them, in turn, is hampered by the difficulty of obtaining sufficient annotated data. To address this, we propose a fully self-annotated method to train a smeared-point removal classifier. Our approach gathers 3D geometric evidence from multiple perspectives to automatically detect and annotate smeared points and valid points. To validate the effectiveness of our method, we present a new benchmark dataset: the Real Azure-Kinect dataset. Experimental results and ablation studies show that our method outperforms traditional filters and other self-annotated methods. Our work is publicly available at https://github.com/wangmiaowei/wacv2024_smearedremover.git.
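For context, the statistical outlier removal baseline mentioned above can be sketched as a generic k-nearest-neighbor filter. This is a pure-Python illustration (quadratic time, toy scale), not the authors' classifier; `k` and `std_ratio` are illustrative parameter names. Points whose mean distance to their nearest neighbors is unusually large relative to the whole cloud are dropped.

```python
import math


def statistical_outlier_removal(points, k=3, std_ratio=1.0):
    """Generic SOR filter: keep points whose mean k-NN distance is at most
    mean + std_ratio * std of that statistic over the cloud (O(n^2) sketch)."""
    mean_dists = []
    for i, p in enumerate(points):
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        nearest = dists[:k]
        mean_dists.append(sum(nearest) / len(nearest))
    mu = sum(mean_dists) / len(mean_dists)
    sigma = math.sqrt(sum((d - mu) ** 2 for d in mean_dists) / len(mean_dists))
    threshold = mu + std_ratio * sigma
    return [p for p, d in zip(points, mean_dists) if d <= threshold]
```

An isolated point far from a flat patch is removed easily. Smeared points, however, often lie in dense streaks between surfaces, so their neighbor distances can look much like those of genuine surface points; a plausible reading of the abstract is that this is why such a filter either misses them or, when tightened, starts removing real surface points.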

14.4 · CV · Apr 12, 2025 · Code

RICCARDO: Radar Hit Prediction and Convolution for Camera-Radar 3D Object Detection

Yunfei Long, Abhinav Kumar, Xiaoming Liu et al.

Radar hits reflect from points both on object boundaries and inside object outlines. This results in a complex distribution of radar hits that depends on factors including object category, size, and orientation. Current radar-camera fusion methods account for this implicitly with a black-box neural network. In this paper, we explicitly use a radar hit distribution model to assist fusion. First, we build a model to predict radar hit distributions conditioned on object properties obtained from a monocular detector. Second, we use the predicted distribution as a kernel to match actual measured radar points in the neighborhood of the monocular detections, generating matching scores at nearby positions. Finally, a fusion stage combines context with the kernel detector to refine the matching scores. Our method achieves state-of-the-art radar-camera detection performance on nuScenes. Our source code is available at https://github.com/longyunf/riccardo.
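The kernel-matching step can be pictured as sliding a predicted hit distribution over candidate object positions and summing its response at the measured radar points. In the sketch below, a fixed axis-aligned Gaussian in bird's-eye-view coordinates stands in for the learned, property-conditioned distribution; the kernel shape, the `sx`/`sy` widths, and the candidate positions are all assumptions for illustration.

```python
import math


def gaussian_kernel(dx, dy, sx, sy):
    """Hypothetical stand-in for a predicted radar-hit distribution (axis-aligned 2D Gaussian)."""
    return math.exp(-0.5 * ((dx / sx) ** 2 + (dy / sy) ** 2))


def matching_scores(radar_points, candidates, sx=1.0, sy=0.5):
    """Score each candidate object center by summing kernel responses at the measured radar hits."""
    return [
        sum(gaussian_kernel(px - cx, py - cy, sx, sy) for px, py in radar_points)
        for cx, cy in candidates
    ]
```

For instance, with radar hits clustered around x = 11 m and candidates placed at 1 m steps along the range direction of a monocular detection, the candidate at (11.0, 0.0) receives the highest score; a later fusion stage, as described above, would refine such scores rather than taking the maximum directly.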

3.7 · CV · Jun 15, 2024 · Code

Public Computer Vision Datasets for Precision Livestock Farming: A Systematic Survey

Anil Bhujel, Yibin Wang, Yuzhen Lu et al.

Technology-driven precision livestock farming (PLF) empowers practitioners to monitor and analyze animal growth and health conditions for improved productivity and welfare. Computer vision (CV) is indispensable in PLF, using cameras and computer algorithms to supplement or supersede manual efforts for livestock data acquisition. Data availability is crucial for developing innovative monitoring and analysis systems through artificial intelligence-based techniques. However, data curation processes are tedious, time-consuming, and resource-intensive. This study presents the first systematic survey of publicly available livestock CV datasets (https://github.com/Anil-Bhujel/Public-Computer-Vision-Dataset-A-Systematic-Survey). Of the 58 public datasets identified and analyzed, which span different livestock species, almost half are for cattle, followed by swine, poultry, and other animals. Individual animal detection and color imaging are the dominant application and imaging modality, respectively. The characteristics and baseline applications of the datasets are discussed, emphasizing the implications for animal welfare advocates. Challenges and opportunities are also discussed to inspire further efforts in developing livestock CV datasets. This study highlights that the limited quantity of high-quality annotated datasets collected from diverse environments, animals, and applications, together with the absence of contextual metadata, is a real bottleneck in PLF.

12.6 · CV · May 24, 2023 · Code

Label-Efficient Learning in Agriculture: A Comprehensive Review

Jiajia Li, Dong Chen, Xinda Qi et al.

The past decade has witnessed many great successes of machine learning (ML) and deep learning (DL) applications in agricultural systems, including weed control, plant disease diagnosis, agricultural robotics, and precision livestock management. Despite tremendous progress, one downside of such ML/DL models is that they generally rely on large-scale labeled datasets for training, and their performance is strongly influenced by the size and quality of the available labeled data. In addition, collecting, processing, and labeling such large-scale datasets is extremely costly and time-consuming, partially due to the rising cost of human labor. Therefore, developing label-efficient ML/DL methods for agricultural applications has received significant interest among researchers and practitioners. In fact, more than 50 papers have been published since 2016 on developing and applying deep-learning-based label-efficient techniques to various agricultural problems, which motivates the authors to provide a timely and comprehensive review of recent label-efficient ML/DL methods in agricultural applications. To this end, we first develop a principled taxonomy that organizes these methods according to the degree of supervision, including weak supervision (i.e., active learning and semi-/weakly-supervised learning) and no supervision (i.e., un-/self-supervised learning), supplemented by representative state-of-the-art label-efficient ML/DL methods. In addition, we present a systematic review of various agricultural applications that exploit these label-efficient algorithms, such as precision agriculture, plant phenotyping, and postharvest quality assessment. Finally, we discuss current problems and challenges, as well as future research directions. A well-classified paper list can be accessed at https://github.com/DongChen06/Label-efficient-in-Agriculture.

18.7 · CV · Jun 5, 2021 · Code

Radar-Camera Pixel Depth Association for Depth Completion

Yunfei Long, Daniel Morris, Xiaoming Liu et al.

While radar and video data can be readily fused at the detection level, fusing them at the pixel level is potentially more beneficial. It is also more challenging, partly because radar is sparse, but also because automotive radar beams are much wider than a typical pixel; combined with a large baseline between camera and radar, this results in poor association between radar pixels and color pixels. A consequence is that depth completion methods designed for LiDAR and video fare poorly for radar and video. Here we propose a radar-to-pixel association stage that learns a mapping from radar returns to pixels. This mapping also serves to densify radar returns. Using this as a first stage, followed by a more traditional depth completion method, we achieve image-guided depth completion with radar and video. We demonstrate performance superior to camera and radar alone on the nuScenes dataset. Our source code is available at https://github.com/longyunf/rc-pda.
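Once some model has scored how likely each radar return is to belong to each nearby pixel, densification reduces to copying the return's depth to the confidently associated pixels. The sketch assumes those scores are given (in the paper they come from a learned association stage that is not reproduced here); the function name `densify`, the neighborhood `offsets`, the 0.5 threshold, and the nearest-depth tie-break are hypothetical choices.

```python
def densify(radar_uv_depth, assoc_scores, offsets, height, width, threshold=0.5):
    """Spread each radar return's depth to neighboring pixels with a high association score.

    radar_uv_depth: list of (u, v, depth) radar returns projected into the image.
    assoc_scores:   per return, one score per entry in `offsets` (assumed given,
                    e.g. by a hypothetical association network).
    offsets:        list of (du, dv) pixel offsets defining the neighborhood.
    Returns a sparse {(u, v): depth} map; when returns conflict, the nearer depth wins.
    """
    depth_map = {}
    for (u, v, d), scores in zip(radar_uv_depth, assoc_scores):
        for (du, dv), s in zip(offsets, scores):
            pu, pv = u + du, v + dv
            if s < threshold or not (0 <= pu < width and 0 <= pv < height):
                continue
            if (pu, pv) not in depth_map or d < depth_map[(pu, pv)]:
                depth_map[(pu, pv)] = d
    return depth_map
```

The resulting sparse-but-denser map would then feed a conventional depth completion network, matching the two-stage structure described above.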

13.0 · AI · Sep 3, 2021

Multi-modal Program Inference: a Marriage of Pre-trained Language Models and Component-based Synthesis

Kia Rahmani, Mohammad Raza, Sumit Gulwani et al.

Multi-modal program synthesis refers to the task of synthesizing programs (code) from their specification given in different forms, such as a combination of natural language and examples. Examples provide a precise but incomplete specification, and natural language provides an ambiguous but more "complete" task description. Machine-learned pre-trained models (PTMs) are adept at handling ambiguous natural language, but struggle with generating syntactically and semantically precise code. Program synthesis techniques can generate correct code, often even from incomplete but precise specifications such as examples, but they are unable to work with the ambiguity of natural language. We present an approach that combines PTMs with component-based synthesis (CBS): PTMs are used to generate candidate programs from the natural language description of the task, which are then used to guide the CBS procedure to find the program that matches the precise example-based specification. We use this combined approach to instantiate multi-modal synthesis systems for two programming domains: regular expressions and CSS selectors. Our evaluation demonstrates the effectiveness of our domain-agnostic approach in comparison to a state-of-the-art specialized system, and the generality of our approach in providing multi-modal program synthesis from natural language and examples in different programming domains.
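A toy version of the PTM-plus-CBS pipeline in the regular-expression domain is sketched below. The candidate regexes stand in for pre-trained-model output, and component-based synthesis is reduced to a brute-force enumeration of concatenations of tokens mined from those candidates; the tokenizer, `max_parts`, and the function names are assumptions, not the paper's procedure.

```python
import itertools
import re

# Hypothetical tokenizer: a bracketed character class, an escape, or a single
# character, each with an optional trailing quantifier.
TOKEN = re.compile(r"\[[^\]]*\][*+?]?|\\.[*+?]?|[^*+?][*+?]?")


def consistent(rx, positives, negatives):
    """True if regex rx fully matches every positive and no negative example."""
    try:
        pattern = re.compile(rx)
    except re.error:
        return False
    return (all(pattern.fullmatch(s) for s in positives)
            and not any(pattern.fullmatch(s) for s in negatives))


def synthesize(candidates, positives, negatives, max_parts=3):
    """Return a regex consistent with the examples, or None.

    Step 1: accept a candidate (standing in for PTM output) if it already fits.
    Step 2: otherwise enumerate concatenations of components mined from the candidates.
    """
    for rx in candidates:
        if consistent(rx, positives, negatives):
            return rx
    components = sorted({tok for rx in candidates for tok in TOKEN.findall(rx)})
    for n in range(1, max_parts + 1):
        for combo in itertools.product(components, repeat=n):
            rx = "".join(combo)
            if consistent(rx, positives, negatives):
                return rx
    return None
```

With candidates `[a-z]+_[0-9]+` and `[0-9]+-[a-z]+`, neither of which fits the examples `ab-12` and `x-7`, the search recombines their pieces into `[a-z]+-[0-9]+`. The ambiguous natural language narrows the search space, and the precise examples pick the final program.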

12.1 · CV · Aug 24, 2021

Full-Velocity Radar Returns by Radar-Camera Fusion

Yunfei Long, Daniel Morris, Xiaoming Liu et al.

A distinctive feature of Doppler radar is the measurement of velocity in the radial direction for radar points. However, the missing tangential velocity component hampers object velocity estimation as well as temporal integration of radar sweeps in dynamic scenes. Recognizing that cameras provide information complementary to radar, in this paper we present a closed-form solution for the point-wise, full-velocity estimate of Doppler returns using the corresponding optical flow from camera images. Additionally, we address the association problem between radar returns and camera images with a neural network that is trained to estimate radar-camera correspondences. Experimental results on the nuScenes dataset verify the validity of the method and show significant improvements over the state of the art in velocity estimation and accumulation of radar points.
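Under simplifying assumptions, the closed-form idea can be written as a 3×3 linear system. Assume the radar and camera share an origin, the ego vehicle is stationary, the point's position P = (X, Y, Z) in camera coordinates is known, and optical flow (du/dt, dv/dt) is given in pixels per second for a pinhole camera with focal lengths fx and fy. Differentiating u = fx·X/Z + cx and v = fy·Y/Z + cy gives two equations that are linear in the velocity V = (Vx, Vy, Vz), and the Doppler reading adds a third, vr = (X·Vx + Y·Vy + Z·Vz)/|P|. The determinant of this system is fx·fy·|P|/Z³, so it is solvable whenever Z > 0. The sketch solves it with Cramer's rule; it is not necessarily the paper's exact formulation, which also handles radar-camera association.

```python
import math


def det3(m):
    """Determinant of a 3x3 matrix given as nested lists."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def full_velocity(point, flow, radial_speed, fx, fy):
    """Solve for the 3D velocity of a point from optical flow plus Doppler radial speed.

    Assumes co-located radar and camera, a static ego vehicle, and a pinhole camera.
    point: (X, Y, Z) in camera coordinates; flow: (du/dt, dv/dt) in pixels per second.
    """
    X, Y, Z = point
    r = math.sqrt(X * X + Y * Y + Z * Z)
    A = [[fx / Z, 0.0, -fx * X / Z ** 2],   # du/dt row
         [0.0, fy / Z, -fy * Y / Z ** 2],   # dv/dt row
         [X / r, Y / r, Z / r]]             # radial (Doppler) row
    b = [flow[0], flow[1], radial_speed]
    d = det3(A)
    velocity = []
    for col in range(3):
        # Cramer's rule: replace column `col` with the right-hand side.
        Ai = [row[:] for row in A]
        for i in range(3):
            Ai[i][col] = b[i]
        velocity.append(det3(Ai) / d)
    return tuple(velocity)
```

Radar alone fixes only the component of V along the line of sight; the two flow equations supply the tangential components, which is why the combination yields a full-velocity estimate per point.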

18.1 · CV · Apr 6, 2021 · Code

Depth Completion with Twin Surface Extrapolation at Occlusion Boundaries

Saif Imran, Xiaoming Liu, Daniel Morris

Depth completion starts from a sparse set of known depth values and estimates the unknown depths for the remaining image pixels. Most methods model this as depth interpolation and erroneously interpolate depth pixels into the empty space between spatially distinct objects, resulting in depth smearing across occlusion boundaries. Here we propose a multi-hypothesis depth representation that explicitly models both foreground and background depths in the difficult occlusion-boundary regions. Our method can be thought of as performing twin-surface extrapolation, rather than interpolation, in these regions. Next, our method fuses these extrapolated surfaces into a single depth image, leveraging the image data. Key to our method is the use of an asymmetric loss function that operates on a novel twin-surface representation. This enables us to train a network to perform surface extrapolation and surface fusion simultaneously. We characterize our loss function and compare it with other common losses. Finally, we validate our method on three datasets: KITTI, an outdoor real-world dataset; NYU2, an indoor real-world depth dataset; and Virtual KITTI, a photo-realistic synthetic dataset with dense ground truth, and demonstrate improvement over the state of the art.
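Two ingredients from the abstract can be sketched in isolation: fusing a foreground and a background depth hypothesis with a per-pixel weight, and an asymmetric loss that pushes one hypothesis toward nearer and the other toward farther depths. The asymmetric loss shown is the standard quantile (pinball) loss, swapped in for illustration because the abstract does not give the paper's loss; the sigmoid blending weight and the function names are likewise assumptions.

```python
import math


def pinball(pred, target, tau):
    """Quantile (pinball) loss: per unit of error, under-estimates cost tau
    and over-estimates cost (1 - tau). Illustrative stand-in, not the paper's loss."""
    e = target - pred
    return tau * e if e >= 0 else (tau - 1.0) * e


def fuse(d_fg, d_bg, logit):
    """Blend foreground and background depth hypotheses with a weight in (0, 1)."""
    w = 1.0 / (1.0 + math.exp(-logit))
    return w * d_fg + (1.0 - w) * d_bg
```

With `tau` below 0.5, over-estimates are penalized more than under-estimates, so an output head trained this way is biased toward the nearer (foreground) surface; `tau` above 0.5 biases it toward the farther (background) surface. A large positive `logit` in `fuse` selects the foreground hypothesis almost exclusively, which is the behavior one would want on the near side of an occlusion boundary.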

31.1 · CV · Sep 2, 2020 · Code

CLOCs: Camera-LiDAR Object Candidates Fusion for 3D Object Detection

Su Pang, Daniel Morris, Hayder Radha

There have been significant advances in neural networks for both 3D object detection using LiDAR and 2D object detection using video. However, it has been surprisingly difficult to train networks to effectively use both modalities in a way that demonstrates gain over single-modality networks. In this paper, we propose a novel Camera-LiDAR Object Candidates (CLOCs) fusion network. CLOCs fusion provides a low-complexity multi-modal fusion framework that significantly improves the performance of single-modality detectors. CLOCs operates on the combined output candidates of any 2D and any 3D detector before Non-Maximum Suppression (NMS), and is trained to leverage their geometric and semantic consistencies to produce more accurate final 3D and 2D detection results. Our experimental evaluation on the challenging KITTI object detection benchmark, including 3D and bird's eye view metrics, shows significant improvements, especially at long distance, over state-of-the-art fusion-based methods. At the time of submission, CLOCs ranked highest among all fusion-based methods on the official KITTI leaderboard. We will release our code upon acceptance.
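Because CLOCs works on candidates before NMS, its input can be thought of as a sparse set of 3D/2D candidate pairs annotated with consistency cues. The sketch below builds such pairs using the image-plane IoU between a projected 3D box and a 2D box (geometric consistency), both detector scores (semantic consistency), and the object's distance. This feature tuple, the assumption that 3D boxes are already projected into the image, and the function names are illustrative; the learned fusion network that consumes the pairs is not shown.

```python
def iou(a, b):
    """IoU of two axis-aligned image boxes given as (x1, y1, x2, y2)."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy

    def area(r):
        return (r[2] - r[0]) * (r[3] - r[1])

    union = area(a) + area(b) - inter
    return inter / union if union > 0 else 0.0


def candidate_pairs(dets3d, dets2d):
    """Return (i, j, features) for every 3D/2D candidate pair whose image boxes overlap.

    dets3d: list of (projected_image_box, score_3d, distance).
    dets2d: list of (image_box, score_2d).
    features = (iou, score_3d, score_2d, distance), an assumed feature set.
    """
    pairs = []
    for i, (box3, s3, dist) in enumerate(dets3d):
        for j, (box2, s2) in enumerate(dets2d):
            overlap = iou(box3, box2)
            if overlap > 0.0:
                pairs.append((i, j, (overlap, s3, s2, dist)))
    return pairs
```

Keeping only overlapping pairs is what makes the representation sparse: most 3D/2D combinations are geometrically inconsistent and never reach the fusion network.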

1.8 · CV · May 1, 2019

Bean Split Ratio for Dry Bean Canning Quality and Variety Analysis

Yunfei Long, Amber Bassett, Karen Cichy et al.

Splits in canned beans appear during preparation and canning. Researchers are studying how splitting is influenced by the cooking environment and genotype. However, there is no existing method to automatically quantify splits or characterize their severity. To solve this, we propose two measures: the Bean Split Ratio (BSR), which quantifies the overall severity of splits, and the Bean Split Histogram (BSH), which characterizes the size distribution of splits. We create a pixel-wise segmentation method to automatically estimate these measures from images. We also present a bean dataset of recombinant inbred lines of two genotypes, use the BSR and BSH to assess canning quality, and explore the heritability of these properties.
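The abstract does not define BSR and BSH precisely, so the sketch below assumes simple definitions on a segmented label mask: BSR as the fraction of bean pixels labeled as split, and BSH as a histogram of the areas of connected split regions. The label codes (0 background, 1 intact bean, 2 split), the 4-connectivity, and `bin_size` are hypothetical.

```python
from collections import Counter, deque


def split_regions(mask):
    """Areas of 4-connected regions of split pixels (hypothetical label 2)."""
    h, w = len(mask), len(mask[0])
    seen = [[False] * w for _ in range(h)]
    areas = []
    for y in range(h):
        for x in range(w):
            if mask[y][x] != 2 or seen[y][x]:
                continue
            seen[y][x] = True
            queue, area = deque([(y, x)]), 0
            while queue:  # breadth-first flood fill of one split region
                cy, cx = queue.popleft()
                area += 1
                for ny, nx in ((cy + 1, cx), (cy - 1, cx), (cy, cx + 1), (cy, cx - 1)):
                    if 0 <= ny < h and 0 <= nx < w and mask[ny][nx] == 2 and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            areas.append(area)
    return areas


def bean_split_ratio(mask):
    """Assumed BSR: split pixels / all bean pixels (labels 1 and 2)."""
    split = sum(row.count(2) for row in mask)
    bean = split + sum(row.count(1) for row in mask)
    return split / bean if bean else 0.0


def bean_split_histogram(mask, bin_size=2):
    """Assumed BSH: number of split regions per area bin."""
    return Counter(area // bin_size for area in split_regions(mask))
```

Separating a single severity number (BSR) from a size distribution (BSH) lets two samples with the same split fraction be told apart, for example many small splits versus one large one.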

20.2 · CV · Mar 13, 2019

Depth Coefficients for Depth Completion

Saif Imran, Yunfei Long, Xiaoming Liu et al.

Depth completion involves estimating a dense depth image from sparse depth measurements, often guided by a color image. While linear upsampling is straightforward, it results in artifacts, including depth pixels being interpolated into empty space across discontinuities between objects. Current methods use deep networks to upsample and "complete" the missing depth pixels. Nevertheless, depth smearing between objects remains a challenge. We propose a new representation for depth called Depth Coefficients (DC) to address this problem. It enables convolutions to more easily avoid inter-object depth mixing. We also show that the standard Mean Squared Error (MSE) loss function can promote depth mixing, and thus propose using cross-entropy loss for DC instead. With quantitative and qualitative evaluation on benchmarks, we show that replacing the sparse depth input and MSE loss with our DC representation and cross-entropy loss is a simple way to improve depth completion performance and reduce pixel depth mixing, which leads to improved depth-based object detection.
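One way to picture a depth-coefficient representation is as a per-pixel vector over depth bins whose entries interpolate between the two nearest bin centers, trained with cross-entropy against a softmax over predicted logits. The sketch assumes uniform bins, two-bin linear interpolation, and a simple weighted-mean decoder; the paper's exact DC construction and decoding may differ, and the function names are hypothetical.

```python
import math


def encode_dc(depth, num_bins, bin_size):
    """Depth -> coefficient vector: linear-interpolation weights on the two nearest
    bin centers (bin k is centered at k * bin_size). Requires num_bins >= 2."""
    x = min(max(depth / bin_size, 0.0), num_bins - 1.0)
    k = min(int(x), num_bins - 2)
    frac = x - k
    coeffs = [0.0] * num_bins
    coeffs[k] = 1.0 - frac
    coeffs[k + 1] = frac
    return coeffs


def decode_dc(coeffs, bin_size):
    """Coefficient vector -> depth, as the coefficient-weighted mean of bin centers."""
    return sum(c * k * bin_size for k, c in enumerate(coeffs)) / sum(coeffs)


def cross_entropy(target_coeffs, logits):
    """Cross-entropy between target coefficients and a softmax over predicted logits."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(z - m) for z in logits))
    return -sum(t * (z - log_z) for t, z in zip(target_coeffs, logits))
```

For a depth of 3.7 with 1 m bins, `encode_dc` puts weight 0.3 on bin 3 and 0.7 on bin 4, and `decode_dc` recovers 3.7; logits concentrated on those two bins give a lower cross-entropy than uniform logits.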

7.8 · CV · Apr 5, 2018

A Pyramid CNN for Dense-Leaves Segmentation

Daniel D. Morris

Automatic detection and segmentation of overlapping leaves in dense foliage can be a difficult task, particularly for leaves with strong textures and heavy occlusion. We present Dense-Leaves, an image dataset with ground-truth segmentation labels that can be used to train and quantify algorithms for leaf segmentation in the wild. We also propose a pyramid convolutional neural network with multi-scale predictions that detects leaf boundaries and discriminates them from interior textures. Using these detected boundaries, closed-contour boundaries around individual leaves are estimated with a watershed-based algorithm. The result is an instance segmenter for dense leaves. We obtain promising segmentation results for leaves in dense foliage.
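The final step, turning detected boundaries into closed leaf regions, can be illustrated with a generic marker-controlled watershed implemented by priority flooding. The sketch assumes a boundary-strength map (such as a network's boundary output) and seed labels are already available; how the paper picks seeds and any post-processing are not reproduced, and no explicit watershed lines are kept.

```python
import heapq


def watershed(boundary, markers):
    """Marker-controlled watershed by priority flooding.

    boundary: 2D list of boundary strengths (higher = more likely a leaf edge).
    markers:  2D list of ints, > 0 for seed labels and 0 for unlabeled pixels.
    Returns a label image in which every pixel connected to a seed receives a label;
    regions grow through low-boundary pixels first, so they meet along strong edges.
    """
    h, w = len(boundary), len(boundary[0])
    labels = [row[:] for row in markers]
    heap = []
    for y in range(h):
        for x in range(w):
            if labels[y][x] > 0:
                heapq.heappush(heap, (boundary[y][x], y, x))
    while heap:
        _, y, x = heapq.heappop(heap)
        for ny, nx in ((y + 1, x), (y - 1, x), (y, x + 1), (y, x - 1)):
            if 0 <= ny < h and 0 <= nx < w and labels[ny][nx] == 0:
                labels[ny][nx] = labels[y][x]
                heapq.heappush(heap, (boundary[ny][nx], ny, nx))
    return labels
```

Flooding in order of boundary strength is what makes the resulting regions closed: two leaves' labels can only meet at pixels that are popped last, which are the strongest detected boundaries.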

1.7 · RO · Sep 25, 2017

A View-Dependent Adaptive Matched Filter for LADAR-Based Vehicle Tracking

Daniel D. Morris, Regis Hoffman, Paul Haley

LADARs mounted on mobile platforms produce a wealth of precise range data on the surrounding objects and vehicles. The challenge we address is to infer from these raw LADAR data the location and orientation of nearby vehicles. We propose a novel view-dependent adaptive matched filter for obtaining fast and precise measurements of target vehicle pose. We derive an analytic expression for the matching function which we optimize to obtain target pose and size. Our algorithm is fast, robust and simple to implement compared to other methods. When used as the measurement component of a tracker on an autonomous ground vehicle, we are able to track in excess of 50 targets at 10 Hz. Once targets are aligned using our matched filter, we use a support vector-based discriminator to distinguish vehicles from other objects. This tracker provides a key sensing component for our autonomous ground vehicles which have accumulated hundreds of miles of on-road and off-road autonomous driving.
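The notion of a view-dependent match can be illustrated by scoring a rectangular vehicle model only along the sides that face the sensor, since those are the sides a LADAR can hit. The sketch below is a numerical stand-in rather than the paper's analytic matching function: it uses a Gaussian score on point-to-edge distance, a sensor at the origin, fixed vehicle dimensions, and a grid search over heading only; `sigma`, the pose tuple layout, and all function names are assumptions.

```python
import math


def seg_dist(p, a, b):
    """Distance from point p to segment ab."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    t = ((px - ax) * dx + (py - ay) * dy) / (dx * dx + dy * dy)
    t = max(0.0, min(1.0, t))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def visible_edges(cx, cy, heading, length, width, sensor=(0.0, 0.0)):
    """Edges of an oriented rectangle whose outward normals face the sensor."""
    c, s = math.cos(heading), math.sin(heading)
    local = [(length / 2, width / 2), (-length / 2, width / 2),
             (-length / 2, -width / 2), (length / 2, -width / 2)]
    corners = [(cx + c * x - s * y, cy + s * x + c * y) for x, y in local]
    edges = []
    for i in range(4):
        a, b = corners[i], corners[(i + 1) % 4]
        mx, my = (a[0] + b[0]) / 2, (a[1] + b[1]) / 2
        nx, ny = mx - cx, my - cy  # outward direction for a rectangle centered at (cx, cy)
        if nx * (sensor[0] - mx) + ny * (sensor[1] - my) > 0:
            edges.append((a, b))
    return edges


def match_score(points, pose, length, width, sigma=0.1):
    """Sum of Gaussian responses of each point to its nearest sensor-facing edge."""
    edges = visible_edges(*pose, length, width)
    return sum(
        math.exp(-min(seg_dist(p, a, b) for a, b in edges) ** 2 / (2 * sigma ** 2))
        for p in points
    )
```

A heading estimate can then be taken as the grid value that maximizes `match_score` with the center held fixed; a real tracker would also refine the center and size, as the abstract's pose-and-size optimization implies.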

5.6 · RO · Sep 25, 2017

LADAR-Based Vehicle Tracking and Trajectory Estimation for Urban Driving

Daniel Morris, Paul Haley, William Zachar et al.

Safe mobility for unmanned ground vehicles requires reliable detection of other vehicles, along with precise estimates of their locations and trajectories. Here we describe the algorithms and system we have developed for accurate trajectory estimation of nearby vehicles using an onboard scanning LADAR. We introduce a variable-axis Ackermann steering model and compare it to an independent steering model. Then, for robust tracking, we propose a multi-hypothesis tracker that combines these kinematic models to leverage the strengths of each. When trajectories estimated with our techniques are input into a planner, they enable an unmanned vehicle to negotiate traffic in urban environments. Results have been evaluated in real time on a moving vehicle with a scanning LADAR.
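For reference, the standard kinematic bicycle model, a common way to express Ackermann steering in a tracker's prediction step, is sketched below with forward-Euler integration. The paper's variable-axis Ackermann model and its independent steering model are not reproduced; the state layout and parameter names are assumptions.

```python
import math


def bicycle_step(state, speed, steer, wheelbase, dt):
    """One forward-Euler step of the kinematic bicycle model.

    state: (x, y, heading) of the rear-axle reference point; steer is the
    front-wheel angle in radians; the yaw rate is speed * tan(steer) / wheelbase.
    """
    x, y, heading = state
    x += speed * math.cos(heading) * dt
    y += speed * math.sin(heading) * dt
    heading += speed * math.tan(steer) / wheelbase * dt
    return (x, y, heading)
```

A multi-hypothesis tracker of the kind described would plausibly run prediction steps from several such motion models in parallel and weight each by how well it explains the incoming LADAR measurements.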

4.5 · RO · Sep 25, 2017

LADAR-Based Mover Detection from Moving Vehicles

Daniel D. Morris, Brian Colonna, Paul Haley

Detecting moving vehicles and people is crucial for safe operation of UGVs but is challenging in cluttered, real-world environments. We propose a registration technique that enables objects to be robustly matched and tracked, and hence movers to be detected even in high clutter. Range data are acquired using a 2D scanning LADAR from a moving platform. These are automatically clustered into objects and modeled using a surface density function. A Bhattacharyya similarity is optimized to register subsequent views of each object, enabling good discrimination and tracking, and hence mover detection.
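Registration by Bhattacharyya similarity can be sketched with a crude density: a normalized occupancy histogram of each object's points, compared with the Bhattacharyya coefficient and maximized over translations. The histogram density, the cell size, and the translation-only grid search are simplifying assumptions standing in for the paper's surface density function and optimizer.

```python
import math
from collections import Counter


def density(points, cell=0.5, offset=(0.0, 0.0)):
    """Normalized 2D histogram of (shifted) points on a square grid."""
    counts = Counter(
        (math.floor((x + offset[0]) / cell), math.floor((y + offset[1]) / cell))
        for x, y in points
    )
    n = sum(counts.values())
    return {k: v / n for k, v in counts.items()}


def bhattacharyya(p, q):
    """Bhattacharyya coefficient sum(sqrt(p * q)); 1.0 means identical distributions."""
    return sum(math.sqrt(p[k] * q[k]) for k in p.keys() & q.keys())


def register(prev_points, curr_points, cell=0.5, search=4):
    """Grid-search the whole-cell translation of curr_points relative to prev_points."""
    ref = density(prev_points, cell)
    shifts = [(dx, dy) for dx in range(-search, search + 1) for dy in range(-search, search + 1)]
    best = max(
        shifts,
        key=lambda s: bhattacharyya(ref, density(curr_points, cell, (-s[0] * cell, -s[1] * cell))),
    )
    return best[0] * cell, best[1] * cell
```

Comparing densities rather than individual points makes the match tolerant of the fact that successive LADAR scans rarely hit exactly the same surface points; the recovered translation, once ego-motion is accounted for, is what would distinguish a mover from a static object.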