LG · May 18, 2022
Neighborhood Mixup Experience Replay: Local Convex Interpolation for Improved Sample Efficiency in Continuous Control Tasks · Ryan Sander, Wilko Schwarting, Tim Seyde et al. · mit
Experience replay plays a crucial role in improving the sample efficiency of deep reinforcement learning agents. Recent advances in experience replay propose using Mixup (Zhang et al., 2018) to further improve sample efficiency via synthetic sample generation. We build upon this technique with Neighborhood Mixup Experience Replay (NMER), a geometrically-grounded replay buffer that interpolates transitions with their closest neighbors in state-action space. NMER preserves a locally linear approximation of the transition manifold by only applying Mixup between transitions with vicinal state-action features. Under NMER, a given transition's set of state-action neighbors is dynamic and episode agnostic, in turn encouraging greater policy generalizability via inter-episode interpolation. We combine our approach with recent off-policy deep reinforcement learning algorithms and evaluate on continuous control environments. We observe that NMER improves sample efficiency by an average of 94% (TD3) and 29% (SAC) over baseline replay buffers, enabling agents to effectively recombine previous experiences and learn from limited data.
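The interpolation step described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the tuple layout `(state, action, reward, next_state)`, the function names, the use of a single nearest neighbor under Euclidean distance, and the `Beta(alpha, alpha)` mixing coefficient (borrowed from standard Mixup) are all assumptions made for illustration.

```python
import math
import random

# Hypothetical transition layout: (state, action, reward, next_state),
# with state, action and next_state given as lists of floats.

def nearest_neighbor(buffer, idx):
    """Return the index of the transition closest to buffer[idx] in
    concatenated state-action space (brute-force Euclidean search).
    Returns None if the buffer holds fewer than two transitions."""
    query = buffer[idx][0] + buffer[idx][1]
    best, best_dist = None, math.inf
    for j, (state, action, _, _) in enumerate(buffer):
        if j == idx:
            continue
        dist = math.dist(query, state + action)
        if dist < best_dist:
            best, best_dist = j, dist
    return best

def mixup(t1, t2, lam):
    """Convex combination lam * t1 + (1 - lam) * t2 of every field."""
    def mix(x, y):
        return [lam * u + (1.0 - lam) * v for u, v in zip(x, y)]
    reward = lam * t1[2] + (1.0 - lam) * t2[2]
    return (mix(t1[0], t2[0]), mix(t1[1], t2[1]), reward, mix(t1[3], t2[3]))

def sample_nmer(buffer, rng, alpha=1.0):
    """Draw one synthetic transition: pick a stored transition uniformly,
    find its nearest state-action neighbor, and interpolate the pair.
    Assumes the buffer holds at least two transitions."""
    i = rng.randrange(len(buffer))
    j = nearest_neighbor(buffer, i)
    lam = rng.betavariate(alpha, alpha)  # assumed Mixup-style coefficient
    return mixup(buffer[i], buffer[j], lam)
```

Because neighbors are searched over the whole buffer rather than within an episode, the interpolated pair can come from different episodes, which matches the "episode agnostic" property the abstract mentions. A practical buffer would likely replace the brute-force search with a spatial index.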
AI · Oct 31, 2025 · Code
Advancing AI Challenges for the United States Department of the Air Force · Christian Prothmann, Vijay Gadepally, Jeremy Kepner et al.
The DAF-MIT AI Accelerator is a collaboration between the United States Department of the Air Force (DAF) and the Massachusetts Institute of Technology (MIT). This program pioneers fundamental advances in artificial intelligence (AI) to expand the competitive advantage of the United States in the defense and civilian sectors. In recent years, AI Accelerator projects have developed and launched public challenge problems aimed at advancing AI research in priority areas. Hallmarks of AI Accelerator challenges include large, publicly available, and AI-ready datasets to stimulate open-source solutions and engage the wider academic and private sector AI ecosystem. This article supplements our previous publication, which introduced AI Accelerator challenges. We provide an update on how ongoing and new challenges have successfully contributed to AI research and applications of AI technologies.
NA · Dec 11, 2018
A continuous analogue of the tensor-train decomposition · Alex A. Gorodetsky, Sertac Karaman, Youssef M. Marzouk
We develop new approximation algorithms and data structures for representing and computing with multivariate functions using the functional tensor-train (FT), a continuous extension of the tensor-train (TT) decomposition. The FT represents functions using a tensor-train ansatz by replacing the three-dimensional TT cores with univariate matrix-valued functions. The main contribution of this paper is a framework to compute the FT that employs adaptive approximations of univariate fibers, and that is not tied to any tensorized discretization. The algorithm can be coupled with any univariate linear or nonlinear approximation procedure. We demonstrate that this approach can generate multivariate function approximations that are several orders of magnitude more accurate, for the same cost, than those based on the conventional approach of compressing the coefficient tensor of a tensor-product basis. Our approach is in the spirit of other continuous computation packages such as Chebfun, and yields an algorithm which requires the computation of "continuous" matrix factorizations such as the LU and QR decompositions of vector-valued functions. To support these developments, we describe continuous versions of an approximate maximum-volume cross approximation algorithm and of a rounding algorithm that re-approximates an FT by one of lower ranks. We demonstrate that our technique improves accuracy and robustness, compared to TT and quantics-TT approaches with fixed parameterizations, of high-dimensional integration, differentiation, and approximation of functions with local features such as discontinuities and other nonlinearities.
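To make the "univariate matrix-valued functions" idea concrete, here is a small sketch of how a function in functional tensor-train form can be evaluated: the value is the product of one matrix per input coordinate, with the first core a row vector and the last a column vector. The helper names and the hand-built rank-2 cores for f(x, y, z) = x + y + z are illustrative assumptions; the adaptive construction, cross approximation and rounding algorithms from the paper are not shown.

```python
def matmul(A, B):
    """Multiply two matrices given as nested lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))]
            for i in range(len(A))]

def ft_eval(cores, x):
    """Evaluate F_1(x_1) F_2(x_2) ... F_d(x_d).

    cores[k] maps a scalar to an r_{k-1} x r_k matrix, with r_0 = r_d = 1,
    so the final product is a 1 x 1 matrix."""
    result = cores[0](x[0])
    for core, xk in zip(cores[1:], x[1:]):
        result = matmul(result, core(xk))
    return result[0][0]

# Hand-built rank-2 cores representing f(x, y, z) = x + y + z.
sum_cores = [
    lambda x: [[x, 1.0]],                # 1 x 2
    lambda y: [[1.0, 0.0], [y, 1.0]],    # 2 x 2
    lambda z: [[1.0], [z]],              # 2 x 1
]
```

Here `ft_eval(sum_cores, [1.0, 2.0, 3.0])` multiplies `[1, 1]` by `[[1, 0], [2, 1]]` to get `[3, 1]`, then by `[[1], [3]]` to get 6. Each core entry is an ordinary univariate function, which is what lets the representation avoid a fixed tensorized discretization.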
RO · Oct 26, 2023
Drive Anywhere: Generalizable End-to-end Autonomous Driving with Multi-modal Foundation Models · Tsun-Hsuan Wang, Alaa Maalouf, Wei Xiao et al.
As autonomous driving technology matures, end-to-end methodologies have emerged as a leading strategy, promising seamless integration from perception to control via deep learning. However, existing systems grapple with challenges such as unexpected open set environments and the complexity of black-box models. At the same time, the evolution of deep learning introduces larger, multimodal foundational models, offering multi-modal visual and textual understanding. In this paper, we harness these multimodal foundation models to enhance the robustness and adaptability of autonomous driving systems, enabling out-of-distribution, end-to-end, multimodal, and more explainable autonomy. Specifically, we present an approach to apply end-to-end open-set (any environment/scene) autonomous driving that is capable of providing driving decisions from representations queryable by image and text. To do so, we introduce a method to extract nuanced spatial (pixel/patch-aligned) features from transformers to enable the encapsulation of both spatial and semantic features. Our approach (i) demonstrates unparalleled results in diverse tests while achieving significantly greater robustness in out-of-distribution situations, and (ii) allows the incorporation of latent space simulation (via text) for improved training (data augmentation via text) and policy debugging. We encourage the reader to check our explainer video at https://www.youtube.com/watch?v=4n-DJf8vXxo&feature=youtu.be and to view the code and demos on our project webpage at https://drive-anywhere.github.io/.
SY · Sep 20, 2019
Shared Linear Quadratic Regulation Control: A Reinforcement Learning Approach · Murad Abu-Khalaf, Sertac Karaman, Daniela Rus
We propose controller synthesis for state regulation problems in which a human operator shares control with an autonomy system, running in parallel. The autonomy system continuously improves over human action, with minimal intervention, and can take over full control. It additively combines user input with an adaptive optimal corrective signal. It is adaptive in that it neither estimates nor requires a model of the human's action policy, or the internal dynamics of the plant, and can adjust to changes in both. Our contribution is twofold. First, we present a new synthesis for shared control, which we formulate as an adaptive optimal control problem for continuous-time linear systems and solve online as a human-in-the-loop reinforcement learning problem. The result is an architecture that we call shared linear quadratic regulator (sLQR). Second, we provide new analysis of reinforcement learning for continuous-time linear systems in two parts. In the first analysis part, we avoid learning along a single state-space trajectory, which we show leads to data collinearity under certain conditions. We make a clear separation between exploitation of learned policies and exploration of the state-space, and propose an exploration scheme that requires switching to new state-space trajectories rather than injecting noise continuously while learning. This avoidance of continuous noise injection minimizes interference with human action, and avoids bias in the convergence to the stabilizing solution of the underlying algebraic Riccati equation. We show that exploring a minimum number of pairwise distinct state-space trajectories is necessary to avoid collinearity in the learning data. In the second analysis part, we show conditions under which existence and uniqueness of solutions can be established for off-policy reinforcement learning in continuous-time linear systems; namely, prior knowledge of the input matrix.
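For readers unfamiliar with the "stabilizing solution of the underlying algebraic Riccati equation", the scalar case can be worked in closed form. The sketch below is a model-based illustration only and assumes the plant parameters `a` and `b` are known; sLQR itself, per the abstract, learns online without a plant model. The function name and the scalar setting are assumptions for illustration.

```python
import math

def scalar_lqr(a, b, q, r):
    """Stabilizing solution of the scalar continuous-time Riccati equation.

    Plant:  x' = a*x + b*u
    Cost:   integral of q*x^2 + r*u^2  (q > 0, r > 0, b != 0)
    ARE:    2*a*p - (b^2 / r) * p^2 + q = 0

    The positive root p gives the feedback gain k = b*p/r for u = -k*x,
    and the closed-loop pole a - b*k = -sqrt(a^2 + b^2*q/r) is negative."""
    p = r * (a + math.sqrt(a * a + b * b * q / r)) / (b * b)
    k = b * p / r
    return p, k
```

For example, with a = 1, b = 1, q = 3, r = 1 the positive root is p = 1 + sqrt(1 + 3) = 3, so k = 3 and the closed-loop pole is 1 - 3 = -2, even though the open-loop plant is unstable.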
CL · Apr 23, 2023
Studying the Impact of Semi-Cooperative Drivers on Overall Highway Flow · Noam Buckman, Sertac Karaman, Daniela Rus
Semi-cooperative behaviors are intrinsic properties of human drivers and should be considered for autonomous driving. In addition, new autonomous planners can consider the social value orientation (SVO) of human drivers to generate socially-compliant trajectories. Yet the overall impact on traffic flow for this new class of planners remains to be understood. In this work, we present a study of implicit semi-cooperative driving where agents deploy a game-theoretic version of iterative best response assuming knowledge of the SVOs of other agents. We simulate nominal traffic flow and investigate whether the proportion of prosocial agents on the road impacts individual or system-wide driving performance. Experiments show that the proportion of prosocial agents has a minor impact on overall traffic flow and that benefits of semi-cooperation disproportionately affect egoistic and high-speed drivers.
RO · Mar 21, 2023
Infrastructure-based End-to-End Learning and Prevention of Driver Failure · Noam Buckman, Shiva Sreeram, Mathias Lechner et al.
Intelligent intersection managers can improve safety by detecting dangerous drivers or failure modes in autonomous vehicles, warning oncoming vehicles as they approach an intersection. In this work, we present FailureNet, a recurrent neural network trained end-to-end on trajectories of both nominal and reckless drivers in a scaled miniature city. FailureNet observes the poses of vehicles as they approach an intersection and detects whether a failure is present in the autonomy stack, warning cross-traffic of potentially dangerous drivers. FailureNet can accurately identify control failures, upstream perception errors, and speeding drivers, distinguishing them from nominal driving. The network is trained and deployed with autonomous vehicles in the MiniCity. Compared to speed or frequency-based predictors, FailureNet's recurrent neural network structure provides improved predictive power, yielding upwards of 84% accuracy when deployed on hardware.
RO · Dec 14, 2022
Learning and Predicting Multimodal Vehicle Action Distributions in a Unified Probabilistic Model Without Labels · Charles Richter, Patrick R. Barragán, Sertac Karaman
We present a unified probabilistic model that learns a representative set of discrete vehicle actions and predicts the probability of each action given a particular scenario. Our model also enables us to estimate the distribution over continuous trajectories conditioned on a scenario, representing what each discrete action would look like if executed in that scenario. While our primary objective is to learn representative action sets, these capabilities combine to produce accurate multimodal trajectory predictions as a byproduct. Although our learned action representations closely resemble semantically meaningful categories (e.g., "go straight", "turn left", etc.), our method is entirely self-supervised and does not utilize any manually generated labels or categories. Our method builds upon recent advances in variational inference and deep unsupervised clustering, resulting in full distribution estimates based on deterministic model evaluations.
82.7 · RO · Mar 30
Gleanmer: A 6 mW SoC for Real-Time 3D Gaussian Occupancy Mapping · Zih-Sing Fu, Peter Zhi Xuan Li, Sertac Karaman et al.
High-fidelity 3D occupancy mapping is essential for many edge-based applications (such as AR/VR and autonomous navigation) but is limited by power constraints. We present Gleanmer, a system on chip (SoC) with an accelerator for GMMap, a 3D occupancy map using Gaussians. Through algorithm-hardware co-optimizations for direct computation and efficient reuse of these compact Gaussians, Gleanmer reduces construction and query energy by up to 63% and 81%, respectively. Approximate computation on Gaussians reduces accelerator area by 38%. Using 16nm CMOS, Gleanmer processes 640x480 images in real time beyond 88 fps during map construction and processes over 540K coordinates per second during map query. To our knowledge, Gleanmer is the first fabricated SoC to achieve real-time 3D occupancy mapping under 6 mW for edge-based applications.
62.5 · RO · May 21
UfM*: Uncertainty from Motion* for DNN Depth Estimation Using Gaussians · Soumya Sudhakar, Sertac Karaman, Vivienne Sze
Reliable uncertainty estimation is critical for deploying monocular depth deep neural networks (DNNs) in safety-critical robotic systems. Conventional uncertainty methods such as ensembles and sampling-based approaches require multiple inferences per image, incurring substantial compute and memory overhead. Moreover, uncertainty predicted from a single image misses out on measuring disagreement between predictions across views of the same region. We propose Uncertainty from Motion* (UfM*), an uncertainty estimation algorithm that measures multiview disagreement efficiently by comparing previous and current views using a compact Gaussian mixture, requiring only a single DNN inference per image. Using Gaussians to compute multiview disagreement is not only more compute- and memory-efficient than a prior approach using a point cloud, but also improves uncertainty by measuring disagreement across regions of 3D space. UfM* paired with aleatoric uncertainty improves expected calibration error by 24-28% compared to an ensemble, while requiring only 3% of the energy and 0.02% of the memory on 100 out-of-distribution ScanNet sequences. We demonstrate UfM* consumes only 63 mJ per 224x224 image while running real-time at 30 FPS on an Arm Cortex-A76 CPU onboard a miniature energy-constrained robot, highlighting that measuring multiview disagreement using Gaussians enables efficient uncertainty for resource-constrained robotic systems.
RO · Nov 23, 2021 · Code
VISTA 2.0: An Open, Data-driven Simulator for Multimodal Sensing and Policy Learning for Autonomous Vehicles · Alexander Amini, Tsun-Hsuan Wang, Igor Gilitschenski et al.
Simulation has the potential to transform the development of robust algorithms for mobile agents deployed in safety-critical scenarios. However, the poor photorealism and lack of diverse sensor modalities of existing simulation engines remain key hurdles towards realizing this potential. Here, we present VISTA, an open source, data-driven simulator that integrates multiple types of sensors for autonomous vehicles. Using high fidelity, real-world datasets, VISTA represents and simulates RGB cameras, 3D LiDAR, and event-based cameras, enabling the rapid generation of novel viewpoints in simulation and thereby enriching the data available for policy learning with corner cases that are difficult to capture in the physical world. Using VISTA, we demonstrate the ability to train and test perception-to-control policies across each of the sensor types and showcase the power of this approach via deployment on a full scale autonomous vehicle. The policies learned in VISTA exhibit sim-to-real transfer without modification and greater robustness than those trained exclusively on real-world data.
CV · Sep 1, 2021 · Code
Searching for Efficient Multi-Stage Vision Transformers · Yi-Lun Liao, Sertac Karaman, Vivienne Sze
Vision Transformer (ViT) demonstrates that Transformer for natural language processing can be applied to computer vision tasks and result in comparable performance to convolutional neural networks (CNN), which have been studied and adopted in computer vision for years. This naturally raises the question of how the performance of ViT can be advanced with design techniques of CNN. To this end, we propose to incorporate two techniques and present ViT-ResNAS, an efficient multi-stage ViT architecture designed with neural architecture search (NAS). First, we propose residual spatial reduction to decrease sequence lengths for deeper layers and utilize a multi-stage architecture. When reducing lengths, we add skip connections to improve performance and stabilize the training of deeper networks. Second, we propose weight-sharing NAS with multi-architectural sampling. We enlarge a network and utilize its sub-networks to define a search space. A super-network covering all sub-networks is then trained for fast evaluation of their performance. To efficiently train the super-network, we propose to sample and train multiple sub-networks with one forward-backward pass. After that, evolutionary search is performed to discover high-performance network architectures. Experiments on ImageNet demonstrate that ViT-ResNAS achieves better accuracy-MACs and accuracy-throughput trade-offs than the original DeiT and other strong baselines of ViT. Code is available at https://github.com/yilunliao/vit-search.
LG · Nov 25, 2024
Generating Out-Of-Distribution Scenarios Using Language Models · Erfan Aasi, Phat Nguyen, Shiva Sreeram et al.
The deployment of autonomous vehicles controlled by machine learning techniques requires extensive testing in diverse real-world environments, robust handling of edge cases and out-of-distribution scenarios, and comprehensive safety validation to ensure that these systems can navigate safely and effectively under unpredictable conditions. Addressing Out-Of-Distribution (OOD) driving scenarios is essential for enhancing safety, as OOD scenarios help validate the reliability of the models within the vehicle's autonomy stack. However, generating OOD scenarios is challenging due to their long-tailed distribution and rarity in urban driving datasets. Recently, Large Language Models (LLMs) have shown promise in autonomous driving, particularly for their zero-shot generalization and common-sense reasoning capabilities. In this paper, we leverage these LLM strengths to introduce a framework for generating diverse OOD driving scenarios. Our approach uses LLMs to construct a branching tree, where each branch represents a unique OOD scenario. These scenarios are then simulated in the CARLA simulator using an automated framework that aligns scene augmentation with the corresponding textual descriptions. We evaluate our framework through extensive simulations, and assess its performance via a diversity metric that measures the richness of the scenarios. Additionally, we introduce a new "OOD-ness" metric, which quantifies how much the generated scenarios deviate from typical urban driving conditions. Furthermore, we explore the capacity of modern Vision-Language Models (VLMs) to interpret and safely navigate through the simulated OOD scenarios. Our findings offer valuable insights into the reliability of language models in addressing OOD scenarios within the context of urban driving.
RO · Oct 18, 2024
Learning autonomous driving from aerial imagery · Varun Murali, Guy Rosman, Sertac Karaman et al.
In this work, we consider the problem of learning end to end perception to control for ground vehicles solely from aerial imagery. Photogrammetric simulators allow the synthesis of novel views through the transformation of pre-generated assets into novel views. However, they have a large setup cost, require careful collection of data and often human effort to create usable simulators. We use a Neural Radiance Field (NeRF) as an intermediate representation to synthesize novel views from the point of view of a ground vehicle. These novel viewpoints can then be used for several downstream autonomous navigation applications. In this work, we demonstrate the utility of novel view synthesis through the application of training a policy for end to end learning from images and depth data. In a traditional real to sim to real framework, the collected data would be transformed into a visual simulator which could then be used to generate novel views. In contrast, using a NeRF allows a compact representation and the ability to optimize over the parameters of the visual simulator as more data is gathered in the environment. We demonstrate the efficacy of our method in a custom built mini-city environment through the deployment of imitation policies on robotic cars. We additionally consider the task of place localization and demonstrate that our method is able to relocalize the car in the real world.
CV · Feb 10
Flow Matching with Uncertainty Quantification and Guidance · Juyeop Han, Lukas Lao Beyer, Sertac Karaman
Despite the remarkable success of sampling-based generative models such as flow matching, they can still produce samples of inconsistent or degraded quality. To assess sample reliability and generate higher-quality outputs, we propose uncertainty-aware flow matching (UA-Flow), a lightweight extension of flow matching that predicts the velocity field together with heteroscedastic uncertainty. UA-Flow estimates per-sample uncertainty by propagating velocity uncertainty through the flow dynamics. These uncertainty estimates act as a reliability signal for individual samples, and we further use them to steer generation via uncertainty-aware classifier guidance and classifier-free guidance. Experiments on image generation show that UA-Flow produces uncertainty signals more highly correlated with sample fidelity than baseline methods, and that uncertainty-guided sampling further improves generation quality.
CV · Aug 2, 2025
Construction of Digital Terrain Maps from Multi-view Satellite Imagery using Neural Volume Rendering · Josef X. Biberstein, Guilherme Cavalheiro, Juyeop Han et al.
Digital terrain maps (DTMs) are an important part of planetary exploration, enabling operations such as terrain relative navigation during entry, descent, and landing for spacecraft and aiding in navigation on the ground. As robotic exploration missions become more ambitious, the need for high quality DTMs will only increase. However, producing DTMs via multi-view stereo pipelines for satellite imagery, the current state-of-the-art, can be cumbersome and require significant manual image preprocessing to produce satisfactory results. In this work, we seek to address these shortcomings by adapting neural volume rendering techniques to learn textured digital terrain maps directly from satellite imagery. Our method, neural terrain maps (NTM), only requires the locus for each image pixel and does not rely on depth or any other structural priors. We demonstrate our method on both synthetic and real satellite data from Earth and Mars encompassing scenes on the order of $100 \textrm{km}^2$. We evaluate the accuracy of our output terrain maps by comparing with existing high-quality DTMs produced using traditional multi-view stereo pipelines. Our method shows promising results, with the precision of terrain prediction almost equal to the resolution of the satellite images even in the presence of imperfect camera intrinsics and extrinsics.
RO · Feb 20, 2025
Real-Time Sampling-based Online Planning for Drone Interception · Gilhyun Ryou, Lukas Lao Beyer, Sertac Karaman
This paper studies high-speed online planning in dynamic environments. The problem requires finding time-optimal trajectories that conform to system dynamics, meeting computational constraints for real-time adaptation, and accounting for uncertainty from environmental changes. To address these challenges, we propose a sampling-based online planning algorithm that leverages neural network inference to replace time-consuming nonlinear trajectory optimization, enabling rapid exploration of multiple trajectory options under uncertainty. The proposed method is applied to the drone interception problem, where a defense drone must intercept a target while avoiding collisions and handling imperfect target predictions. The algorithm efficiently generates trajectories toward multiple potential target drone positions in parallel. It then assesses trajectory reachability by comparing traversal times with the target drone's predicted arrival time, ultimately selecting the minimum-time reachable trajectory. Through extensive validation in both simulated and real-world environments, we demonstrate our method's capability for high-rate online planning and its adaptability to unpredictable movements in unstructured settings.
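The final selection step in this abstract (compare traversal times with the target's predicted arrival time, then take the minimum-time reachable trajectory) can be sketched as follows. The `Candidate` record, its field names, and the reachability test "interceptor arrives no later than the target" are assumptions for illustration; in the paper the traversal times come from neural network inference, which is not modeled here.

```python
from dataclasses import dataclass
from typing import Optional, Sequence

@dataclass
class Candidate:
    """Hypothetical record for one candidate interception trajectory."""
    label: str
    traversal_time: float       # seconds for the interceptor to fly it
    target_arrival_time: float  # predicted seconds until the target gets there

def select_trajectory(candidates: Sequence[Candidate]) -> Optional[Candidate]:
    """Return the fastest candidate the interceptor can complete before
    (or exactly when) the target arrives, or None if none is reachable."""
    reachable = [c for c in candidates
                 if c.traversal_time <= c.target_arrival_time]
    if not reachable:
        return None
    return min(reachable, key=lambda c: c.traversal_time)
```

With candidates A (2.0 s to fly, target arrives at 1.5 s), B (2.5 s, 3.0 s) and C (4.0 s, 5.0 s), A is rejected as unreachable and B is chosen over C because it is faster.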
RO · May 9, 2024
Probing Multimodal LLMs as World Models for Driving · Shiva Sreeram, Tsun-Hsuan Wang, Alaa Maalouf et al.
We provide a sober look at the application of Multimodal Large Language Models (MLLMs) in autonomous driving, challenging common assumptions about their ability to interpret dynamic driving scenarios. Despite advances in models like GPT-4o, their performance in complex driving environments remains largely unexplored. Our experimental study assesses various MLLMs as world models using in-car camera perspectives and reveals that while these models excel at interpreting individual images, they struggle to synthesize coherent narratives across frames, leading to considerable inaccuracies in understanding (i) ego vehicle dynamics, (ii) interactions with other road actors, (iii) trajectory planning, and (iv) open-set scene reasoning. We introduce the Eval-LLM-Drive dataset and DriveSim simulator to enhance our evaluation, highlighting gaps in current MLLM capabilities and the need for improved models in dynamic real-world environments.
RO · Nov 23, 2021
Learning Interactive Driving Policies via Data-driven Simulation · Tsun-Hsuan Wang, Alexander Amini, Wilko Schwarting et al.
Data-driven simulators promise high data-efficiency for driving policy learning. When used for modelling interactions, this data-efficiency becomes a bottleneck: Small underlying datasets often lack interesting and challenging edge cases for learning interactive driving. We address this challenge by proposing a simulation method that uses in-painted ado vehicles for learning robust driving policies. Thus, our approach can be used to learn policies that involve multi-agent interactions and allows for training via state-of-the-art policy learning methods. We evaluate the approach for learning standard interaction scenarios in driving. In extensive experiments, our work demonstrates that the resulting policies can be directly transferred to a full-scale autonomous vehicle without making use of any traditional sim-to-real transfer techniques such as domain randomization.
RO · May 20, 2021
Efficient and Robust LiDAR-Based End-to-End Navigation · Zhijian Liu, Alexander Amini, Sibo Zhu et al.
Deep learning has been used to demonstrate end-to-end neural network learning for autonomous vehicle control from raw sensory input. While LiDAR sensors provide reliably accurate information, existing end-to-end driving solutions are mainly based on cameras since processing 3D data requires a large memory footprint and computation cost. On the other hand, increasing the robustness of these systems is also critical; however, even estimating the model's uncertainty is very challenging due to the cost of sampling-based methods. In this paper, we present an efficient and robust LiDAR-based end-to-end navigation framework. We first introduce Fast-LiDARNet that is based on sparse convolution kernel optimization and hardware-aware model design. We then propose Hybrid Evidential Fusion that directly estimates the uncertainty of the prediction from only a single forward pass and then fuses the control predictions intelligently. We evaluate our system on a full-scale vehicle and demonstrate lane-stable as well as navigation capabilities. In the presence of out-of-distribution events (e.g., sensor failures), our system significantly improves robustness and reduces the number of takeovers in the real world.
LG · Feb 19, 2021
Deep Latent Competition: Learning to Race Using Visual Control Policies in Latent Space · Wilko Schwarting, Tim Seyde, Igor Gilitschenski et al.
Learning competitive behaviors in multi-agent settings such as racing requires long-term reasoning about potential adversarial interactions. This paper presents Deep Latent Competition (DLC), a novel reinforcement learning algorithm that learns competitive visual control policies through self-play in imagination. The DLC agent imagines multi-agent interaction sequences in the compact latent space of a learned world model that combines a joint transition function with opponent viewpoint prediction. Imagined self-play reduces costly sample generation in the real world, while the latent representation enables planning to scale gracefully with observation dimensionality. We demonstrate the effectiveness of our algorithm in learning competitive behaviors on a novel multi-agent racing benchmark that requires planning from image observations. Code and videos available at https://sites.google.com/view/deep-latent-competition.
LG · Oct 27, 2020
Learning to Plan Optimistically: Uncertainty-Guided Deep Exploration via Latent Model Ensembles · Tim Seyde, Wilko Schwarting, Sertac Karaman et al.
Learning complex robot behaviors through interaction requires structured exploration. Planning should target interactions with the potential to optimize long-term performance, while only reducing uncertainty where conducive to this objective. This paper presents Latent Optimistic Value Exploration (LOVE), a strategy that enables deep exploration through optimism in the face of uncertain long-term rewards. We combine latent world models with value function estimation to predict infinite-horizon returns and recover associated uncertainty via ensembling. The policy is then trained on an upper confidence bound (UCB) objective to identify and select the interactions most promising to improve long-term performance. We apply LOVE to visual robot control tasks in continuous action spaces and demonstrate on average more than 20% improved sample efficiency in comparison to state-of-the-art and other exploration objectives. In sparse and hard to explore environments we achieve an average improvement of over 30%.
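The upper confidence bound idea in this abstract can be illustrated with a small sketch: an ensemble produces several return estimates for the same candidate, their spread is treated as uncertainty, and the score is the mean plus a multiple of that spread. This is a simplified illustration rather than the LOVE algorithm itself: the paper trains a policy on a UCB objective inside a latent world model, whereas the sketch only ranks a handful of candidates. The function names, the coefficient `beta`, and the use of the population standard deviation are assumptions.

```python
import statistics

def ucb_score(ensemble_returns, beta=1.0):
    """Mean of the ensemble's return estimates plus beta times their
    standard deviation (the ensemble disagreement)."""
    mean = statistics.fmean(ensemble_returns)
    spread = statistics.pstdev(ensemble_returns)
    return mean + beta * spread

def most_promising(candidates, beta=1.0):
    """Pick the candidate name with the highest UCB score.

    candidates maps a name to the list of return estimates that the
    ensemble members assign to it."""
    return max(candidates, key=lambda name: ucb_score(candidates[name], beta))
```

With `beta = 0` the choice is purely greedy on the mean estimate; a larger `beta` favors candidates whose long-term return the ensemble disagrees about, which is the "optimism in the face of uncertainty" the abstract describes.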
RO · Jun 3, 2020
Multi-Fidelity Black-Box Optimization for Time-Optimal Quadrotor Maneuvers · Gilhyun Ryou, Ezra Tal, Sertac Karaman
We consider the problem of generating a time-optimal quadrotor trajectory that attains a set of prescribed waypoints. This problem is challenging since the optimal trajectory is located on the boundary of the set of dynamically feasible trajectories. This boundary is hard to model as it involves limitations of the entire system, including hardware and software, in agile high-speed flight. In this work, we propose a multi-fidelity Bayesian optimization framework that models the feasibility constraints based on analytical approximation, numerical simulation, and real-world flight experiments. By combining evaluations at different fidelities, trajectory time is optimized while keeping the number of required costly flight experiments to a minimum. The algorithm is thoroughly evaluated in both simulation and real-world flight experiments at speeds up to 11 m/s. Resulting trajectories were found to be significantly faster than those obtained through minimum-snap trajectory planning.
RO · May 28, 2020
Perception-aware time optimal path parameterization for quadrotors · Igor Spasojevic, Varun Murali, Sertac Karaman
The increasing popularity of quadrotors has given rise to a class of predominantly vision-driven vehicles. This paper addresses the problem of perception-aware time optimal path parametrization for quadrotors. Although many different choices of perceptual modalities are available, the low weight and power budgets of quadrotor systems make a camera ideal for on-board navigation and estimation algorithms. However, this does come with a set of challenges. The limited field of view of the camera can restrict the visibility of salient regions in the environment, which dictates the necessity to consider perception and planning jointly. The main contribution of this paper is an efficient time optimal path parametrization algorithm for quadrotors with limited field of view constraints. We show in a simulation study that a state-of-the-art controller can track planned trajectories, and we validate the proposed algorithm on a quadrotor platform in experiments.
RO · Dec 14, 2019
Deep Context Maps: Agent Trajectory Prediction using Location-specific Latent Maps · Igor Gilitschenski, Guy Rosman, Arjun Gupta et al.
In this paper, we propose a novel approach for agent motion prediction in cluttered environments. One of the main challenges in predicting agent motion is accounting for location and context-specific information. Our main contribution is the concept of learning context maps to improve the prediction task. Context maps are a set of location-specific latent maps that are trained alongside the predictor. Thus, the proposed maps are capable of capturing location context beyond visual context cues (e.g. usual average speeds and typical trajectories) or predefined map primitives (such as lanes and stop lines). We pose context map learning as a multi-task training problem and describe our map model and its incorporation into a state-of-the-art trajectory predictor. In extensive experiments, it is shown that use of learned maps can significantly improve predictor accuracy. Furthermore, the performance can be additionally boosted by providing partial knowledge of map semantics.
ML · Sep 24, 2019
A Theory of Uncertainty Variables for State Estimation and Inference
Rajat Talak, Sertac Karaman, Eytan Modiano
We develop a new framework of uncertainty variables to model uncertainty. An uncertainty variable is characterized by an uncertainty set, in which its realization is bound to lie, while the conditional uncertainty is characterized by a set map, from a given realization of a variable to a set of possible realizations of another variable. We prove Bayes' law and the law of total probability equivalents for uncertainty variables. We define a notion of independence, conditional independence, and pairwise independence for a collection of uncertainty variables, and show that this new notion of independence preserves the properties of independence defined over random variables. We then develop a graphical model, namely Bayesian uncertainty network, a Bayesian network equivalent defined over a collection of uncertainty variables, and show that all the natural conditional independence properties, expected out of a Bayesian network, hold for the Bayesian uncertainty network. We also define the notion of point estimate, and show its relation with the maximum a posteriori estimate. Probability theory starts with a distribution function (equivalently a probability measure) as a primitive and builds all other useful concepts, such as law of total probability, Bayes' law, independence, graphical models, point estimate, on it. Our work shows that it is perfectly possible to start with a set, instead of a distribution function, and retain all the useful ideas needed for state estimation and inference.
RO · Sep 16, 2019
Stochastic Dynamic Games in Belief Space
Wilko Schwarting, Alyssa Pierson, Sertac Karaman et al.
Information gathering while interacting with other agents under sensing and motion uncertainty is critical in domains such as driving, service robots, racing, or surveillance. The interests of agents may be at odds with others, resulting in a stochastic non-cooperative dynamic game. Agents must predict others' future actions without communication, incorporate their actions into these predictions, account for uncertainty and noise in information gathering, and consider what information their actions reveal. Our solution uses local iterative dynamic programming in Gaussian belief space to solve a game-theoretic continuous POMDP. Solving a quadratic game in the backward pass of a game-theoretic belief-space variant of iLQG achieves a runtime polynomial in the number of agents and linear in the planning horizon. Our algorithm yields linear feedback policies for our robot, and predicted feedback policies for other agents. We present three applications: active surveillance, guiding eyes for a blind agent, and autonomous racing. Agents with game-theoretic belief-space planning win 44% more races than without game theory and 34% more than without belief-space planning.
RO · May 27, 2019
FlightGoggles: A Modular Framework for Photorealistic Camera, Exteroceptive Sensor, and Dynamics Simulation
Winter Guerra, Ezra Tal, Varun Murali et al.
FlightGoggles is a photorealistic sensor simulator for perception-driven robotic vehicles. The key contributions of FlightGoggles are twofold. First, FlightGoggles provides photorealistic exteroceptive sensor simulation using graphics assets generated with photogrammetry. Second, it provides the ability to combine (i) synthetic exteroceptive measurements generated in silico in real time and (ii) vehicle dynamics and proprioceptive measurements generated in motio by vehicle(s) in a motion-capture facility. FlightGoggles is capable of simulating a virtual-reality environment around autonomous vehicle(s). While a vehicle is in flight in the FlightGoggles virtual reality environment, exteroceptive sensors are rendered synthetically in real time while all complex extrinsic dynamics are generated organically through the natural interactions of the vehicle. The FlightGoggles framework allows for researchers to accelerate development by circumventing the need to estimate complex and hard-to-model interactions such as aerodynamics, motor mechanics, battery electrochemistry, and behavior of other agents. The ability to perform vehicle-in-the-loop experiments with photorealistic exteroceptive sensor simulation facilitates novel research directions involving, e.g., fast and agile autonomous flight in obstacle-rich environments, safe human interaction, and flexible sensor selection. FlightGoggles has been utilized as the main test for selecting nine teams that will advance in the AlphaPilot autonomous drone racing challenge. We survey approaches and results from the top AlphaPilot teams, which may be of independent interest.
RO · May 6, 2019
FSMI: Fast computation of Shannon Mutual Information for information-theoretic mapping
Zhengdong Zhang, Trevor Henderson, Sertac Karaman et al.
Exploration tasks are embedded in many robotics applications, such as search and rescue and space exploration. Information-based exploration algorithms aim to find the most informative trajectories by maximizing an information-theoretic metric, such as the mutual information between the map and potential future measurements. Unfortunately, most existing information-based exploration algorithms are plagued by the computational difficulty of evaluating the Shannon mutual information metric. In this paper, we consider the fundamental problem of evaluating Shannon mutual information between the map and a range measurement. First, we consider 2D environments. We propose a novel algorithm, called the Fast Shannon Mutual Information (FSMI). The key insight behind the algorithm is that a certain integral can be computed analytically, leading to substantial computational savings. Second, we consider 3D environments, represented by efficient data structures, e.g., an OctoMap, such that the measurements are compressed by Run-Length Encoding (RLE). We propose a novel algorithm, called FSMI-RLE, that efficiently evaluates the Shannon mutual information when the measurements are compressed using RLE. For both the FSMI and the FSMI-RLE, we also propose variants that make different assumptions on the sensor noise distribution for the purpose of further computational savings. We evaluate the proposed algorithms in extensive experiments. In particular, we show that the proposed algorithms outperform existing algorithms that compute Shannon mutual information as well as other algorithms that compute the Cauchy-Schwarz Quadratic mutual information (CSQMI). In addition, we demonstrate the computation of Shannon mutual information on a 3D map for the first time.
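As a rough illustration of the run-length encoding mentioned in the abstract above, the sketch below compresses the occupancy values of the cells that a single range beam passes through. It shows only the compression step, not the FSMI-RLE mutual information computation itself; the function names and the sample beam values are hypothetical.

```python
from itertools import groupby


def run_length_encode(cells):
    """Collapse consecutive equal values into (value, run_length) pairs."""
    return [(value, sum(1 for _ in run)) for value, run in groupby(cells)]


def run_length_decode(runs):
    """Expand (value, run_length) pairs back into the flat cell sequence."""
    cells = []
    for value, count in runs:
        cells.extend([value] * count)
    return cells


# Hypothetical occupancy probabilities of the cells along one beam:
# unknown space, then free space, then an obstacle.
beam = [0.5, 0.5, 0.5, 0.1, 0.1, 0.9, 0.9, 0.9, 0.9]
runs = run_length_encode(beam)
print(runs)
```

Large uniform regions of a map, such as those stored in coarse tree nodes, collapse into a handful of runs, which is why an algorithm that works directly on the runs can avoid visiting every cell.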
RO · Apr 10, 2019
Asymptotic Optimality of a Time Optimal Path Parametrization Algorithm
Igor Spasojevic, Varun Murali, Sertac Karaman
Time Optimal Path Parametrization is the problem of minimizing the time interval during which an actuation constrained agent can traverse a given path. Recently, an efficient linear-time algorithm for solving this problem was proposed. However, its optimality was proved for only a strict subclass of problems solved optimally by more computationally intensive approaches based on convex programming. In this paper, we prove that the same linear-time algorithm is asymptotically optimal for all problems solved optimally by convex optimization approaches. We also characterize the optimum of the Time Optimal Path Parametrization Problem, which may be of independent interest.
CV · Mar 8, 2019
FastDepth: Fast Monocular Depth Estimation on Embedded Systems
Diana Wofk, Fangchang Ma, Tien-Ju Yang et al.
Depth sensing is a critical function for robotic tasks such as localization, mapping and obstacle detection. There has been a significant and growing interest in depth estimation from a single RGB image, due to the relatively low cost and size of monocular cameras. However, state-of-the-art single-view depth estimation algorithms are based on fairly complex deep neural networks that are too slow for real-time inference on an embedded platform, for instance, mounted on a micro aerial vehicle. In this paper, we address the problem of fast depth estimation on embedded systems. We propose an efficient and lightweight encoder-decoder network architecture and apply network pruning to further reduce computational complexity and latency. In particular, we focus on the design of a low-latency decoder. Our methodology demonstrates that it is possible to achieve similar accuracy as prior work on depth estimation, but at inference speeds that are an order of magnitude faster. Our proposed network, FastDepth, runs at 178 fps on an NVIDIA Jetson TX2 GPU and at 27 fps when using only the TX2 CPU, with active power consumption under 10 W. FastDepth achieves close to state-of-the-art accuracy on the NYU Depth v2 dataset. To the best of the authors' knowledge, this paper demonstrates real-time monocular depth estimation using a deep neural network with the lowest latency and highest throughput on an embedded platform that can be carried by a micro aerial vehicle.
LG · Nov 25, 2018
Variational End-to-End Navigation and Localization
Alexander Amini, Guy Rosman, Sertac Karaman et al.
Deep learning has revolutionized the ability to learn "end-to-end" autonomous vehicle control directly from raw sensory data. While there have been recent extensions to handle forms of navigation instruction, these works are unable to capture the full distribution of possible actions that could be taken and to reason about localization of the robot within the environment. In this paper, we extend end-to-end driving networks with the ability to perform point-to-point navigation as well as probabilistic localization using only noisy GPS data. We define a novel variational network capable of learning from raw camera data of the environment as well as higher level roadmaps to predict (1) a full probability distribution over the possible control commands; and (2) a deterministic control command capable of navigating on the route specified within the map. Additionally, we formulate how our model can be used to localize the robot according to correspondences between the map and the observed visual road topology, inspired by the rough localization that human drivers can perform. We test our algorithms on real-world driving data that the vehicle has never driven through before, and integrate our point-to-point navigation algorithms onboard a full-scale autonomous vehicle for real-time performance. Our localization algorithm is also evaluated over a new set of roads and intersections to demonstrate rough pose localization even in situations without any GPS prior.
CV · Oct 3, 2018
The Blackbird Dataset: A large-scale dataset for UAV perception in aggressive flight
Amado Antonini, Winter Guerra, Varun Murali et al.
The Blackbird unmanned aerial vehicle (UAV) dataset is a large-scale, aggressive indoor flight dataset collected using a custom-built quadrotor platform for use in evaluation of agile perception. Inspired by the potential of future high-speed fully-autonomous drone racing, the Blackbird dataset contains over 10 hours of flight data from 168 flights over 17 flight trajectories and 5 environments at velocities up to $7.0\,\mathrm{m\,s^{-1}}$. Each flight includes sensor data from 120 Hz stereo and downward-facing photorealistic virtual cameras, 100 Hz IMU, $\sim 190$ Hz motor speed sensors, and 360 Hz millimeter-accurate motion capture ground truth. Camera images for each flight were photorealistically rendered using FlightGoggles across a variety of environments to facilitate easy experimentation with high-performance perception algorithms. The dataset is available for download at http://blackbird-dataset.mit.edu/
RO · Sep 15, 2018
Navion: A 2mW Fully Integrated Real-Time Visual-Inertial Odometry Accelerator for Autonomous Navigation of Nano Drones
Amr Suleiman, Zhengdong Zhang, Luca Carlone et al.
This paper presents Navion, an energy-efficient accelerator for visual-inertial odometry (VIO) that enables autonomous navigation of miniaturized robots (e.g., nano drones), and virtual/augmented reality on portable devices. The chip uses inertial measurements and mono/stereo images to estimate the drone's trajectory and a 3D map of the environment. This estimate is obtained by running a state-of-the-art VIO algorithm based on non-linear factor graph optimization, which requires large irregularly structured memories and heterogeneous computation flow. To reduce the energy consumption and footprint, the entire VIO system is fully integrated on chip to eliminate costly off-chip processing and storage. This work uses compression and exploits both structured and unstructured sparsity to reduce on-chip memory size by 4.1$\times$. Parallelism is used under tight area constraints to increase throughput by 43%. The chip is fabricated in 65 nm CMOS, and can process 752$\times$480 stereo images from the EuRoC dataset in real time at 20 frames per second (fps) while consuming an average power of only 2 mW. At its peak performance, Navion can process stereo images at up to 171 fps and inertial measurements at up to 52 kHz, while consuming an average of 24 mW. The chip is configurable to maximize accuracy, throughput and energy-efficiency trade-offs and to adapt to different environments. To the best of our knowledge, this is the first fully integrated VIO system in an ASIC.
RO · Sep 11, 2018
Accurate Tracking of Aggressive Quadrotor Trajectories using Incremental Nonlinear Dynamic Inversion and Differential Flatness
Ezra Tal, Sertac Karaman
Autonomous unmanned aerial vehicles (UAVs) that can execute aggressive (i.e., high-speed and high-acceleration) maneuvers have attracted significant attention in the past few years. This paper focuses on accurate tracking of aggressive quadcopter trajectories. We propose a novel control law for tracking of position and yaw angle and their derivatives of up to fourth order, specifically, velocity, acceleration, jerk, and snap along with yaw rate and yaw acceleration. Jerk and snap are tracked using feedforward inputs for angular rate and angular acceleration based on the differential flatness of the quadcopter dynamics. Snap tracking requires direct control of body torque, which we achieve using closed-loop motor speed control based on measurements from optical encoders attached to the motors. The controller utilizes incremental nonlinear dynamic inversion (INDI) for robust tracking of linear and angular accelerations despite external disturbances, such as aerodynamic drag forces. Hence, prior modeling of aerodynamic effects is not required. We rigorously analyze the proposed control law through response analysis, and we demonstrate it in experiments. The controller enables a quadcopter UAV to track complex 3D trajectories, reaching speeds up to 12.9 m/s and accelerations up to 2.1g, while keeping the root-mean-square tracking error down to 6.6 cm, in a flight volume that is roughly 18 m by 7 m and 3 m tall. We also demonstrate the robustness of the controller by attaching a drag plate to the UAV in flight tests and by pulling on the UAV with a rope during hover.
RO · Aug 1, 2018
Perception-driven sparse graphs for optimal motion planning
Thomas Sayre-McCord, Sertac Karaman
Most existing motion planning algorithms assume that a map (of some quality) is fully determined prior to generating a motion plan. In many emerging applications of robotics, e.g., fast-moving agile aerial robots with constrained embedded computational platforms and visual sensors, dense maps of the world are not immediately available, and they are computationally expensive to construct. We propose a new algorithm for generating plan graphs which couples the perception and motion planning processes for computational efficiency. In a nutshell, the proposed algorithm iteratively switches between the planning sub-problem and the mapping sub-problem, each updating based on the other until a valid trajectory is found. The resulting trajectory retains a provable property of providing an optimal trajectory with respect to the full (unmapped) environment, while utilizing only a fraction of the sensing data in computational experiments.
CV · Jul 1, 2018
Self-supervised Sparse-to-Dense: Self-supervised Depth Completion from LiDAR and Monocular Camera
Fangchang Ma, Guilherme Venturelli Cavalheiro, Sertac Karaman
Depth completion, the technique of estimating a dense depth image from sparse depth measurements, has a variety of applications in robotics and autonomous driving. However, depth completion faces 3 main challenges: the irregularly spaced pattern in the sparse depth input, the difficulty in handling multiple sensor modalities (when color images are available), as well as the lack of dense, pixel-level ground truth depth labels. In this work, we address all these challenges. Specifically, we develop a deep regression model to learn a direct mapping from sparse depth (and color images) to dense depth. We also propose a self-supervised training framework that requires only sequences of color and sparse depth images, without the need for dense depth labels. Our experiments demonstrate that our network, when trained with semi-dense annotations, attains state-of-the-art accuracy and is the winning approach on the KITTI depth completion benchmark at the time of submission. Furthermore, the self-supervised framework outperforms a number of existing solutions trained with semi-dense annotations.
AI · May 13, 2018
Spatial Uncertainty Sampling for End-to-End Control
Alexander Amini, Ava Soleimany, Sertac Karaman et al.
End-to-end trained neural networks (NNs) are a compelling approach to autonomous vehicle control because of their ability to learn complex tasks without manual engineering of rule-based decisions. However, challenging road conditions, ambiguous navigation situations, and safety considerations require reliable uncertainty estimation for the eventual adoption of full-scale autonomous vehicles. Bayesian deep learning approaches provide a way to estimate uncertainty by approximating the posterior distribution of weights given a set of training data. Dropout training in deep NNs approximates Bayesian inference in a deep Gaussian process and can thus be used to estimate model uncertainty. In this paper, we propose a Bayesian NN for end-to-end control that estimates uncertainty by exploiting feature map correlation during training. This approach achieves improved model fits, as well as tighter uncertainty estimates, than traditional element-wise dropout. We evaluate our algorithms on a challenging dataset collected over many different road types, times of day, and weather conditions, and demonstrate how uncertainties can be used in conjunction with a human controller in a parallel autonomous setting.
CV · Feb 12, 2018
A General Pipeline for 3D Detection of Vehicles
Xinxin Du, Marcelo H. Ang, Sertac Karaman et al.
Autonomous driving requires 3D perception of vehicles and other objects in the environment. Most current methods support only 2D vehicle detection. This paper proposes a flexible pipeline to adopt any 2D detection network and fuse it with a 3D point cloud to generate 3D information with minimum changes to the 2D detection networks. To identify the 3D box, an effective model fitting algorithm is developed based on generalised car models and score maps. A two-stage convolutional neural network (CNN) is proposed to refine the detected 3D box. This pipeline is tested on the KITTI dataset using two different 2D detection networks. The 3D detection results based on these two networks are similar, demonstrating the flexibility of the proposed pipeline. The results rank second among the 3D detection algorithms, indicating its competencies in 3D detection.
RO · Sep 21, 2017
Sparse-to-Dense: Depth Prediction from Sparse Depth Samples and a Single Image
Fangchang Ma, Sertac Karaman
We consider the problem of dense depth prediction from a sparse set of depth measurements and a single RGB image. Since depth estimation from monocular images alone is inherently ambiguous and unreliable, to attain a higher level of robustness and accuracy, we introduce additional sparse depth samples, which are either acquired with a low-resolution depth sensor or computed via visual Simultaneous Localization and Mapping (SLAM) algorithms. We propose the use of a single deep regression network to learn directly from the RGB-D raw data, and explore the impact of number of depth samples on prediction accuracy. Our experiments show that, compared to using only RGB images, the addition of 100 spatially random depth samples reduces the prediction root-mean-square error by 50% on the NYU-Depth-v2 indoor dataset. It also boosts the percentage of reliable prediction from 59% to 92% on the KITTI dataset. We demonstrate two applications of the proposed algorithm: a plug-in module in SLAM to convert sparse maps to dense maps, and super-resolution for LiDARs. Software and video demonstration are publicly available.
DS · May 2, 2017
CDDT: Fast Approximate 2D Ray Casting for Accelerated Localization
Corey Walsh, Sertac Karaman
Localization is an essential component for autonomous robots. A well-established localization approach combines ray casting with a particle filter, leading to a computationally expensive algorithm that is difficult to run on resource-constrained mobile robots. We present a novel data structure called the Compressed Directional Distance Transform for accelerating ray casting in two dimensional occupancy grid maps. Our approach allows online map updates, and near constant time ray casting performance for a fixed size map, in contrast with other methods which exhibit poor worst case performance. Our experimental results show that the proposed algorithm approximates the performance characteristics of reading from a three dimensional lookup table of ray cast solutions while requiring two orders of magnitude less memory and precomputation. This results in a particle filter algorithm which can maintain 2500 particles with 61 ray casts per particle at 40Hz, using a single CPU thread onboard a mobile robot.
RO · Apr 7, 2017
On Sensing, Agility, and Computation Requirements for a Data-gathering Agile Robotic Vehicle
Fangchang Ma, Sertac Karaman
We consider a robotic vehicle tasked with gathering information by visiting a set of spatially-distributed data sources, the locations of which are not known a priori, but are discovered on the fly. We assume first-order robot dynamics involving drift and that the locations of the data sources are Poisson-distributed. In this setting, we characterize the performance of the robot in terms of its sensing, agility, and computation capabilities. More specifically, the robot's performance is characterized in terms of its ability to sense the target locations from a distance, to maneuver quickly, and to perform computations for inference and planning. We also characterize the performance of the robot in terms of the amount and distribution of information that can be acquired at each data source. The following are among our theoretical results: the distribution of the amount of information among the target locations immensely impacts the requirements for sensing targets from a distance; performance increases with increasing maneuvering capability, but with diminishing returns; and the computation requirements increase more rapidly for planning as opposed to inference, with both increasing sensing range and maneuvering ability. We provide computational experiments to validate our theoretical results. Finally, we demonstrate that these results can be utilized in the co-design of sensing, actuation, and computation capabilities of mobile robotic systems for an information-gathering mission. Our proof techniques establish novel connections between the fundamental problems of robotic information-gathering and the last-passage percolation problem of statistical mechanics, which may be of interest in their own right.
RO · Mar 4, 2017
Sparse Depth Sensing for Resource-Constrained Robots
Fangchang Ma, Luca Carlone, Ulas Ayaz et al.
We consider the case in which a robot has to navigate in an unknown environment but does not have enough on-board power or payload to carry a traditional depth sensor (e.g., a 3D lidar) and thus can only acquire a few (point-wise) depth measurements. We address the following question: is it possible to reconstruct the geometry of an unknown environment using sparse and incomplete depth measurements? Reconstruction from incomplete data is not possible in general, but when the robot operates in man-made environments, the depth exhibits some regularity (e.g., many planar surfaces with only a few edges); we leverage this regularity to infer depth from a small number of measurements. Our first contribution is a formulation of the depth reconstruction problem that bridges robot perception with the compressive sensing literature in signal processing. The second contribution includes a set of formal results that ascertain the exactness and stability of the depth reconstruction in 2D and 3D problems, and completely characterize the geometry of the profiles that we can reconstruct. Our third contribution is a set of practical algorithms for depth reconstruction: our formulation directly translates into algorithms for depth estimation based on convex programming. In real-world problems, these convex programs are very large and general-purpose solvers are relatively slow. For this reason, we discuss ad-hoc solvers that enable fast depth reconstruction in real problems. The last contribution is an extensive experimental evaluation in 2D and 3D problems, including Monte Carlo runs on simulated instances and testing on multiple real datasets. Empirical results confirm that the proposed approach ensures accurate depth reconstruction, outperforms interpolation-based strategies, and performs well even when the assumption of structured environment is violated.
RO · Nov 15, 2016
High-Dimensional Stochastic Optimal Control using Continuous Tensor Decompositions
Alex A. Gorodetsky, Sertac Karaman, Youssef M. Marzouk
Motion planning and control problems are embedded and essential in almost all robotics applications. These problems are often formulated as stochastic optimal control problems and solved using dynamic programming algorithms. Unfortunately, most existing algorithms that guarantee convergence to optimal solutions suffer from the curse of dimensionality: the run time of the algorithm grows exponentially with the dimension of the state space of the system. We propose novel dynamic programming algorithms that alleviate the curse of dimensionality in problems that exhibit certain low-rank structure. The proposed algorithms are based on continuous tensor decompositions recently developed by the authors. Essentially, the algorithms represent high-dimensional functions (e.g., the value function) in a compressed format, and directly perform dynamic programming computations (e.g., value iteration, policy iteration) in this format. Under certain technical assumptions, the new algorithms guarantee convergence towards optimal solutions with arbitrary precision. Furthermore, the run times of the new algorithms scale polynomially with the state dimension and polynomially with the ranks of the value function. This approach realizes substantial computational savings in "compressible" problem instances, where value functions admit low-rank approximations. We demonstrate the new algorithms in a wide range of problems, including a simulated six-dimensional agile quadcopter maneuvering example and a seven-dimensional aircraft perching example. In some of these examples, we estimate computational savings of up to ten orders of magnitude over standard value iteration algorithms. We further demonstrate the algorithms running in real time on board a quadcopter during a flight experiment under motion capture.
RO · Oct 11, 2016
Attention and Anticipation in Fast Visual-Inertial Navigation
Luca Carlone, Sertac Karaman
We study a Visual-Inertial Navigation (VIN) problem in which a robot needs to estimate its state using an on-board camera and an inertial sensor, without any prior knowledge of the external environment. We consider the case in which the robot can allocate limited resources to VIN, due to tight computational constraints. Therefore, we answer the following question: under limited resources, what are the most relevant visual cues to maximize the performance of visual-inertial navigation? Our approach has four key ingredients. First, it is task-driven, in that the selection of the visual cues is guided by a metric quantifying the VIN performance. Second, it exploits the notion of anticipation, since it uses a simplified model for forward-simulation of robot dynamics, predicting the utility of a set of visual cues over a future time horizon. Third, it is efficient and easy to implement, since it leads to a greedy algorithm for the selection of the most relevant visual cues. Fourth, it provides formal performance guarantees: we leverage submodularity to prove that the greedy selection cannot be far from the optimal (combinatorial) selection. Simulations and real experiments on agile drones show that our approach ensures state-of-the-art VIN performance while maintaining a lean processing time. In the easy scenarios, our approach outperforms appearance-based feature selection in terms of localization errors. In the most challenging scenarios, it enables accurate visual-inertial navigation while appearance-based feature selection fails to track the robot's motion during aggressive maneuvers.
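The greedy selection described in the abstract above can be sketched generically as follows. This is a minimal illustration, not the authors' method: the paper scores feature sets with a metric of VIN performance, whereas this sketch swaps in a toy coverage count, which is a standard example of a monotone submodular function. The names `greedy_select`, `coverage`, and `covered`, and all of the data, are hypothetical.

```python
def greedy_select(candidates, budget, utility):
    """Pick up to `budget` items, each time adding the largest marginal gain."""
    selected = []
    remaining = list(candidates)
    for _ in range(min(budget, len(remaining))):
        base = utility(selected)
        # Marginal gain of adding candidate c to the current selection.
        best = max(remaining, key=lambda c: utility(selected + [c]) - base)
        selected.append(best)
        remaining.remove(best)
    return selected


# Toy stand-in objective (an assumption, not the paper's metric):
# each visual feature "covers" a set of hypothetical landmark ids.
coverage = {
    "f1": {1, 2, 3},
    "f2": {3, 4},
    "f3": {4, 5, 6, 7},
    "f4": {1, 2},
}


def covered(features):
    """Number of distinct landmarks covered by the chosen features."""
    seen = set()
    for name in features:
        seen |= coverage[name]
    return len(seen)


chosen = greedy_select(list(coverage), budget=2, utility=covered)
print(chosen, covered(chosen))
```

For monotone submodular objectives under a cardinality budget, this kind of greedy rule is known to achieve at least a (1 - 1/e) fraction of the optimal value, which is the flavor of guarantee the abstract alludes to.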
SY · Jul 26, 2016
Polling-systems-based Autonomous Vehicle Coordination in Traffic Intersections with No Traffic Signals
David Miculescu, Sertac Karaman
The rapid development of autonomous vehicles spurred a careful investigation of the potential benefits of all-autonomous transportation networks. Most studies conclude that autonomous systems can enable drastic improvements in performance. A widely studied concept is all-autonomous, collision-free intersections, where vehicles arriving in a traffic intersection with no traffic light adjust their speeds to cross safely through the intersection as quickly as possible. In this paper, we propose a coordination control algorithm for this problem, assuming stochastic models for the arrival times of the vehicles. The proposed algorithm provides provable guarantees on safety and performance. More precisely, it is shown that no collisions occur surely, and moreover a rigorous upper bound is provided for the expected wait time. The algorithm is also demonstrated in simulations. The proposed algorithms are inspired by polling systems. In fact, the problem studied in this paper leads to a new polling system where customers are subject to differential constraints, which may be interesting in its own right.
RO · Sep 30, 2014
Optimal Tourist Problem and Anytime Planning of Trip Itineraries
Jingjin Yu, Javed Aslam, Sertac Karaman et al.
We introduce and study the problem in which a mobile sensing robot (our tourist) is tasked to travel among and gather intelligence at a set of spatially distributed points of interest (POIs). The quality of the information collected at each POI is characterized by some non-decreasing reward function over the time spent at the POI. With a limited time budget, the robot must balance between spending time traveling to POIs and spending time at POIs for information collection (sensing) so as to maximize the total reward. Alternatively, the robot may be required to acquire a minimum amount of reward and hopes to do so in the least amount of time. We propose a mixed integer programming (MIP) based anytime algorithm for solving these two NP-hard optimization problems to arbitrary precision. The effectiveness of our algorithm is demonstrated using an extensive set of computational experiments, including the planning of a realistic itinerary for a first-time tourist in Istanbul.
ROSep 24, 2013
Persistent Monitoring of Events with Stochastic Arrivals at Multiple StationsJingjin Yu, Sertac Karaman, Daniela Rus
This paper introduces a new mobile sensor scheduling problem, involving a single robot tasked with monitoring several events of interest that occur at different locations. Of particular interest is the monitoring of transient events that cannot be easily forecast. Application areas range from natural phenomena (e.g., monitoring abnormal seismic activity around a volcano using a ground robot) to urban activities (e.g., monitoring early formations of traffic congestion using an aerial robot). Motivated by those and many other examples, this paper focuses on problems in which the precise occurrence times of the events are unknown a priori, but statistics for their inter-arrival times are available. The robot's task is to monitor the events to optimize the following two objectives: (i) maximize the number of events observed and (ii) minimize the delay between two consecutive observations of events occurring at the same location. The paper considers the case when a robot is tasked with optimizing the event observations in a balanced manner, following a cyclic patrolling route. First, assuming the cyclic ordering of stations is known, we prove the existence and uniqueness of the optimal solution, and show that the optimal solution has desirable convergence and robustness properties. Our constructive proof also produces an efficient algorithm for computing the unique optimal solution with $O(n)$ time complexity, in which $n$ is the number of stations, with $O(\log n)$ time complexity for incrementally adding or removing stations. Except for the algorithm, most of the analysis remains valid when the cyclic order is unknown. We then provide a polynomial-time approximation scheme that gives a $(1+\epsilon)$-optimal solution for this more general, NP-hard problem.
ROMay 6, 2013
Incremental Sampling-based Algorithm for Minimum-violation Motion PlanningLuis I. Reyes Castro, Pratik Chaudhari, Jana Tumova et al.
This paper studies the problem of control strategy synthesis for dynamical systems with differential constraints to fulfill a given reachability goal while satisfying a set of safety rules. Particular attention is devoted to goals that become feasible only if a subset of the safety rules are violated. The proposed algorithm computes a control law that minimizes the level of unsafety while the desired goal is guaranteed to be reached. This problem is motivated by an autonomous car navigating an urban environment while following rules of the road such as "always travel in right lane" and "do not change lanes frequently". Ideas behind sampling-based motion-planning algorithms, such as Probabilistic Road Maps (PRMs) and Rapidly-exploring Random Trees (RRTs), are employed to incrementally construct a finite concretization of the dynamics as a durational Kripke structure. In conjunction with this, a weighted finite automaton that captures the safety rules is used in order to find an optimal trajectory that minimizes the violation of safety rules. We prove that the proposed algorithm guarantees asymptotic optimality, i.e., almost-sure convergence to optimal solutions. We present results of simulation experiments and an implementation on an autonomous urban mobility-on-demand system.
ROMar 15, 2013
Minimum-violation LTL Planning with Conflicting SpecificationsJana Tumova, Luis I. Reyes Castro, Sertac Karaman et al.
We consider the problem of automatic generation of control strategies for robotic vehicles given a set of high-level mission specifications, such as "Vehicle x must eventually visit a target region and then return to a base," "Regions A and B must be periodically surveyed," or "None of the vehicles can enter an unsafe region." We focus on instances in which the given specifications cannot all be satisfied simultaneously due to their incompatibility and/or environmental constraints. We aim to find the least-violating control strategy while considering different priorities of satisfying different parts of the mission. Formally, we consider missions given in the form of linear temporal logic formulas, each of which is assigned a reward that is earned when the formula is satisfied. Leveraging ideas from automata-based model checking, we propose an algorithm for finding an optimal control strategy that maximizes the sum of rewards earned if this control strategy is applied. We demonstrate the proposed algorithm on an illustrative case study.