ROApr 5, 2024
Scaling Motion Forecasting Models with Ensemble DistillationScott Ettinger, Kratarth Goel, Avikalp Srivastava et al.
Motion forecasting has become an increasingly critical component of autonomous robotic systems. Onboard compute budgets typically limit the accuracy of real-time systems. In this work we propose methods of improving motion forecasting systems subject to limited compute budgets by combining model ensemble and distillation techniques. The use of ensembles of deep neural networks has been shown to improve generalization accuracy in many application domains. We first demonstrate significant performance gains by creating a large ensemble of optimized single models. We then develop a generalized framework to distill motion forecasting model ensembles into small student models which retain high performance with a fraction of the computing cost. For this study we focus on the task of motion forecasting using real world data from autonomous driving systems. We develop ensemble models that are very competitive on the Waymo Open Motion Dataset (WOMD) and Argoverse leaderboards. From these ensembles, we train distilled student models which have high performance at a fraction of the compute costs. These experiments demonstrate distillation from ensembles as an effective method for improving accuracy of predictive models for robotic systems with limited compute budgets.
CVNov 29, 2021
MultiPath++: Efficient Information Fusion and Trajectory Aggregation for Behavior PredictionBalakrishnan Varadarajan, Ahmed Hefny, Avikalp Srivastava et al.
Predicting the future behavior of road users is one of the most challenging and important problems in autonomous driving. Applying deep learning to this problem requires fusing heterogeneous world state in the form of rich perception signals and map information, and inferring highly multi-modal distributions over possible futures. In this paper, we present MultiPath++, a future prediction model that achieves state-of-the-art performance on popular benchmarks. MultiPath++ improves the MultiPath architecture by revisiting many design choices. The first key design difference is a departure from dense image-based encoding of the input world state in favor of a sparse encoding of heterogeneous scene elements: MultiPath++ consumes compact and efficient polylines to describe road features, and raw agent state information directly (e.g., position, velocity, acceleration). We propose a context-aware fusion of these elements and develop a reusable multi-context gating fusion component. Second, we reconsider the choice of pre-defined, static anchors, and develop a way to learn latent anchor embeddings end-to-end in the model. Lastly, we explore ensembling and output aggregation techniques -- common in other ML domains -- and find effective variants for our probabilistic multimodal output representation. We perform an extensive ablation on these design choices, and show that our proposed model achieves state-of-the-art performance on the Argoverse Motion Forecasting Competition and the Waymo Open Dataset Motion Prediction Challenge.
CLAug 29, 2018
Adapting Visual Question Answering Models for Enhancing Multimodal Community Q&A PlatformsAvikalp Srivastava, Hsin Wen Liu, Sumio Fujita
Question categorization and expert retrieval methods have been crucial for information organization and accessibility in community question & answering (CQA) platforms. Research in this area, however, has dealt with only the text modality. With the increasing multimodal nature of web content, we focus on extending these methods for CQA questions accompanied by images. Specifically, we leverage the success of representation learning for text and images in the visual question answering (VQA) domain, and adapt the underlying concept and architecture for automated category classification and expert retrieval on image-based questions posted on Yahoo! Chiebukuro, the Japanese counterpart of Yahoo! Answers. To the best of our knowledge, this is the first work to tackle the multimodality challenge in CQA, and to adapt VQA models for tasks on a more ecologically valid source of visual questions. Our analysis of the differences between visual QA and community QA data drives our proposal of novel augmentations of an attention method tailored for CQA, and use of auxiliary tasks for learning better grounding features. Our final model markedly outperforms the text-only and VQA model baselines for both tasks of classification and expert retrieval on real-world multimodal CQA data.
IRDec 15, 2017
Soft Seeded SSL Graphs for Unsupervised Semantic Similarity-based RetrievalAvikalp Srivastava, Madhav Datt
Semantic similarity based retrieval is playing an increasingly important role in many IR systems such as modern web search, question-answering, similar document retrieval etc. Improvements in retrieval of semantically similar content are very significant to applications like Quora, Stack Overflow, Siri etc. We propose a novel unsupervised model for semantic similarity based content retrieval, where we construct semantic flow graphs for each query, and introduce the concept of "soft seeding" in graph based semi-supervised learning (SSL) to convert this into an unsupervised model. We demonstrate the effectiveness of our model on an equivalent question retrieval problem on the Stack Exchange QA dataset, where our unsupervised approach significantly outperforms the state-of-the-art unsupervised models, and produces comparable results to the best supervised models. Our research provides a method to tackle semantic similarity based retrieval without any training data, and allows seamless extension to different domain QA communities, as well as to other semantic equivalence tasks.
IRMay 5, 2017
Social Media Advertisement Outreach: Learning the Role of AestheticsAvikalp Srivastava, Madhav Datt, Jaikrishna Chaparala et al.
Corporations spend millions of dollars on developing creative image-based promotional content to advertise to their user-base on platforms like Twitter. Our paper is an initial study, where we propose a novel method to evaluate and improve outreach of promotional images from corporations on Twitter, based purely on their describable aesthetic attributes. Existing works in aesthetic based image analysis exclusively focus on the attributes of digital photographs, and are not applicable to advertisements due to the influences of inherent content and context based biases on outreach. Our paper identifies broad categories of biases affecting such images, describes a method for normalization to eliminate effects of those biases and score images based on their outreach, and examines the effects of certain handcrafted describable aesthetic features on image outreach. Optimizing on the describable aesthetic features resulting from this research is a simple method for corporations to complement their existing marketing strategy to gain significant improvement in user engagement on social media for promotional images.