Ankit Laddha

CV
7 papers
690 citations
Novelty: 54%
AI Score: 27

7 Papers

CV · Apr 21, 2021
MVFuseNet: Improving End-to-End Object Detection and Motion Forecasting through Multi-View Fusion of LiDAR Data

Ankit Laddha, Shivam Gautam, Stefan Palombo et al.

In this work, we propose MVFuseNet, a novel end-to-end method for joint object detection and motion forecasting from a temporal sequence of LiDAR data. Most existing methods operate in a single view by projecting data in either range view (RV) or bird's eye view (BEV). In contrast, we propose a method that effectively utilizes both RV and BEV for spatio-temporal feature learning as part of a temporal fusion network as well as for multi-scale feature learning in the backbone network. Further, we propose a novel sequential fusion approach that effectively utilizes multiple views in the temporal fusion network. We show the benefits of our multi-view approach for the tasks of detection and motion forecasting on two large-scale self-driving data sets, achieving state-of-the-art results. Furthermore, we show that MVFuseNet scales well to large operating ranges while maintaining real-time performance.

CV · Oct 2, 2020
LiRaNet: End-to-End Trajectory Prediction using Spatio-Temporal Radar Fusion

Meet Shah, Zhiling Huang, Ankit Laddha et al.

In this paper, we present LiRaNet, a novel end-to-end trajectory prediction method which utilizes radar sensor information along with widely used lidar and high definition (HD) maps. Automotive radar provides rich, complementary information, allowing for longer range vehicle detection as well as instantaneous radial velocity measurements. However, there are factors that make the fusion of lidar and radar information challenging, such as the relatively low angular resolution of radar measurements, their sparsity and the lack of exact time synchronization with lidar. To overcome these challenges, we propose an efficient spatio-temporal radar feature extraction scheme which achieves state-of-the-art performance on multiple large-scale datasets. Further, by incorporating radar information, we show a 52% reduction in prediction error for objects with high acceleration and a 16% reduction in prediction error for objects at longer range.

CV · May 21, 2020
RV-FuseNet: Range View Based Fusion of Time-Series LiDAR Data for Joint 3D Object Detection and Motion Forecasting

Ankit Laddha, Shivam Gautam, Gregory P. Meyer et al.

Robust real-time detection and motion forecasting of traffic participants is necessary for autonomous vehicles to safely navigate urban environments. In this paper, we present RV-FuseNet, a novel end-to-end approach for joint detection and trajectory estimation directly from time-series LiDAR data. Instead of the widely used bird's eye view (BEV) representation, we utilize the native range view (RV) representation of LiDAR data. The RV preserves the full resolution of the sensor by avoiding the voxelization used in the BEV. Furthermore, RV can be processed efficiently due to its compactness. Previous approaches project time-series data to a common viewpoint for temporal fusion, and often this viewpoint is different from where it was captured. This is sufficient for BEV methods, but for RV methods, this can lead to loss of information and data distortion which has an adverse impact on performance. To address this challenge, we propose a simple yet effective novel architecture, Incremental Fusion, that minimizes the information loss by sequentially projecting each RV sweep into the viewpoint of the next sweep in time. We show that our approach significantly improves motion forecasting performance over the existing state-of-the-art. Furthermore, we demonstrate that our sequential fusion approach is superior to alternative RV based fusion methods on multiple datasets.

CV · Mar 12, 2020
LaserFlow: Efficient and Probabilistic Object Detection and Motion Forecasting

Gregory P. Meyer, Jake Charland, Shreyash Pandey et al.

In this work, we present LaserFlow, an efficient method for 3D object detection and motion forecasting from LiDAR. Unlike previous work, our approach utilizes the native range view representation of the LiDAR, which enables our method to operate at the full range of the sensor in real-time without voxelization or compression of the data. We propose a new multi-sweep fusion architecture, which extracts and merges temporal features directly from the range images. Furthermore, we propose a novel technique for learning a probability distribution over future trajectories inspired by curriculum learning. We evaluate LaserFlow on two autonomous driving datasets and demonstrate competitive results when compared to the existing state-of-the-art methods.

CV · Apr 25, 2019
Sensor Fusion for Joint 3D Object Detection and Semantic Segmentation

Gregory P. Meyer, Jake Charland, Darshan Hegde et al.

In this paper, we present an extension to LaserNet, an efficient and state-of-the-art LiDAR based 3D object detector. We propose a method for fusing image data with the LiDAR data and show that this sensor fusion method improves the detection performance of the model especially at long ranges. The addition of image data is straightforward and does not require image labels. Furthermore, we expand the capabilities of the model to perform 3D semantic segmentation in addition to 3D object detection. On a large benchmark dataset, we demonstrate our approach achieves state-of-the-art performance on both object detection and semantic segmentation while maintaining a low runtime.

CV · Mar 20, 2019
LaserNet: An Efficient Probabilistic 3D Object Detector for Autonomous Driving

Gregory P. Meyer, Ankit Laddha, Eric Kee et al.

In this paper, we present LaserNet, a computationally efficient method for 3D object detection from LiDAR data for autonomous driving. The efficiency results from processing LiDAR data in the native range view of the sensor, where the input data is naturally compact. Operating in the range view involves well-known challenges for learning, including occlusion and scale variation, but it also provides contextual information based on how the sensor data was captured. Our approach uses a fully convolutional network to predict a multimodal distribution over 3D boxes for each point and then efficiently fuses these distributions to generate a prediction for each object. Experiments show that modeling each detection as a distribution rather than a single deterministic box leads to better overall detection performance. Benchmark results show that this approach has significantly lower runtime than other recent detectors and that it achieves state-of-the-art performance when compared on a large dataset that has enough data to overcome the challenges of training on the range view.
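
For readers unfamiliar with the range view mentioned in several abstracts above, the following is a minimal sketch of a generic spherical projection of LiDAR points into a range image. It is not taken from any of the listed papers: the function name `project_to_range_view`, the image size, and the vertical field of view are all illustrative assumptions.

```python
import math


def project_to_range_view(points, height=4, width=8,
                          fov_up_deg=15.0, fov_down_deg=-15.0):
    """Project (x, y, z) points into a height x width range image.

    Hypothetical helper: each cell keeps the range of the closest point
    that lands in it, or None if no point does. Image size and field of
    view are assumed values, not parameters of any real sensor.
    """
    fov_up = math.radians(fov_up_deg)
    fov_down = math.radians(fov_down_deg)
    image = [[None] * width for _ in range(height)]

    for x, y, z in points:
        r = math.sqrt(x * x + y * y + z * z)
        if r == 0.0:
            continue  # a point at the sensor origin has no direction
        azimuth = math.atan2(y, x)        # horizontal angle in [-pi, pi]
        elevation = math.asin(z / r)      # vertical angle
        if not (fov_down <= elevation <= fov_up):
            continue  # outside the assumed vertical field of view

        # Map azimuth to a column and elevation to a row.
        col = int(0.5 * (1.0 - azimuth / math.pi) * width) % width
        row = int((fov_up - elevation) / (fov_up - fov_down) * height)
        row = min(row, height - 1)

        # Keep the closest return, which mimics occlusion in the range view.
        if image[row][col] is None or r < image[row][col]:
            image[row][col] = r
    return image
```

The image is a dense 2D grid indexed by angle rather than by metric position, which is the compactness the abstracts refer to. Nearby and distant objects share the same grid, which is one source of the scale variation noted above.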

CV · Apr 7, 2016
Resolving Language and Vision Ambiguities Together: Joint Segmentation & Prepositional Attachment Resolution in Captioned Scenes

Gordon Christie, Ankit Laddha, Aishwarya Agrawal et al.

We present an approach to simultaneously perform semantic segmentation and prepositional phrase attachment resolution for captioned images. Some ambiguities in language cannot be resolved without simultaneously reasoning about an associated image. If we consider the sentence "I shot an elephant in my pajamas", looking at language alone (and not using common sense), it is unclear if it is the person or the elephant wearing the pajamas or both. Our approach produces a diverse set of plausible hypotheses for both semantic segmentation and prepositional phrase attachment resolution that are then jointly reranked to select the most consistent pair. We show that our semantic segmentation and prepositional phrase attachment resolution modules have complementary strengths, and that joint reasoning produces more accurate results than any module operating in isolation. Multiple hypotheses are also shown to be crucial to improved multiple-module reasoning. Our vision and language approach significantly outperforms the Stanford Parser (De Marneffe et al., 2006) by 17.91% (28.69% relative) and 12.83% (25.28% relative) in two different experiments. We also make small improvements over DeepLab-CRF (Chen et al., 2015).
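
The joint reranking idea in this abstract can be illustrated with a small sketch. The additive scoring rule, the function name `rerank_jointly`, and the numbers below are assumptions made for illustration only; the abstract does not specify how the paper scores hypothesis pairs.

```python
def rerank_jointly(seg_scores, ppar_scores, consistency):
    """Pick the (segmentation, attachment) hypothesis pair with the best joint score.

    Assumed scoring rule: module score + module score + pairwise consistency.
    seg_scores[i]     : score of the i-th segmentation hypothesis
    ppar_scores[j]    : score of the j-th prepositional attachment hypothesis
    consistency[i][j] : how well hypotheses i and j agree with each other
    """
    best_pair, best_score = None, float("-inf")
    for i, seg in enumerate(seg_scores):
        for j, ppar in enumerate(ppar_scores):
            total = seg + ppar + consistency[i][j]
            if total > best_score:
                best_pair, best_score = (i, j), total
    return best_pair, best_score


# Hypothetical scores: each module alone prefers hypothesis 0,
# but the pair (1, 0) is far more mutually consistent.
pair, score = rerank_jointly(
    seg_scores=[0.9, 0.8],
    ppar_scores=[0.7, 0.6],
    consistency=[[0.0, 0.1], [0.5, 0.0]],
)
```

In this toy case the selected pair differs from each module's individual top choice, which is the situation where keeping multiple hypotheses per module matters.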