CVJul 21, 2023
RIC: Rotate-Inpaint-Complete for Generalizable Scene ReconstructionIsaac Kasahara, Shubham Agrawal, Selim Engin et al. · apple-ml, cmu
General scene reconstruction refers to the task of estimating the full 3D geometry and texture of a scene containing previously unseen objects. In many practical applications such as AR/VR, autonomous navigation, and robotics, only a single view of the scene may be available, making the scene reconstruction task challenging. In this paper, we present a method for scene reconstruction by structurally breaking the problem into two steps: rendering novel views via inpainting and 2D to 3D scene lifting. Specifically, we leverage the generalization capability of large visual language models (Dalle-2) to inpaint the missing areas of scene color images rendered from different views. Next, we lift these inpainted images to 3D by predicting normals of the inpainted image and solving for the missing depth values. By predicting for normals instead of depth directly, our method allows for robustness to changes in depth distributions and scale. With rigorous quantitative evaluation, we show that our method outperforms multiple baselines while providing generalization to novel objects and scenes.
ROApr 14, 2023
EV-Catcher: High-Speed Object Catching Using Low-latency Event-based Neural NetworksZiyun Wang, Fernando Cladera Ojeda, Anthony Bisulco et al.
Event-based sensors have recently drawn increasing interest in robotic perception due to their lower latency, higher dynamic range, and lower bandwidth requirements compared to standard CMOS-based imagers. These properties make them ideal tools for real-time perception tasks in highly dynamic environments. In this work, we demonstrate an application where event cameras excel: accurately estimating the impact location of fast-moving objects. We introduce a lightweight event representation called Binary Event History Image (BEHI) to encode event data at low latency, as well as a learning-based approach that allows real-time inference of a confidence-enabled control signal to the robot. To validate our approach, we present an experimental catching system in which we catch fast-flying ping-pong balls. We show that the system is capable of achieving a success rate of 81% in catching balls targeted at different locations, with a velocity of up to 13 m/s even on compute-constrained embedded platforms such as the Nvidia Jetson NX.
CVAug 24, 2022
Apple Counting using Convolutional Neural NetworksNicolai Häni, Pravakar Roy, Volkan Isler
Estimating accurate and reliable fruit and vegetable counts from images in real-world settings, such as orchards, is a challenging problem that has received significant recent attention. Estimating fruit counts before harvest provides useful information for logistics planning. While considerable progress has been made toward fruit detection, estimating the actual counts remains challenging. In practice, fruits are often clustered together. Therefore, methods that only detect fruits fail to offer general solutions to estimate accurate fruit counts. Furthermore, in horticultural studies, rather than a single yield estimate, finer information such as the distribution of the number of apples per cluster is desirable. In this work, we formulate fruit counting from images as a multi-class classification problem and solve it by training a Convolutional Neural Network. We first evaluate the per-image accuracy of our method and compare it with a state-of-the-art method based on Gaussian Mixture Models over four test datasets. Even though the parameters of the Gaussian Mixture Model-based method are specifically tuned for each dataset, our network outperforms it in three out of four datasets with a maximum of 94\% accuracy. Next, we use the method to estimate the yield for two datasets for which we have ground truth. Our method achieved 96-97\% accuracies. For additional details please see our video here: https://www.youtube.com/watch?v=Le0mb5P-SYc}{https://www.youtube.com/watch?v=Le0mb5P-SYc.
ROOct 14, 2023
HIO-SDF: Hierarchical Incremental Online Signed Distance FieldsVasileios Vasilopoulos, Suveer Garg, Jinwook Huh et al.
A good representation of a large, complex mobile robot workspace must be space-efficient yet capable of encoding relevant geometric details. When exploring unknown environments, it needs to be updatable incrementally in an online fashion. We introduce HIO-SDF, a new method that represents the environment as a Signed Distance Field (SDF). State of the art representations of SDFs are based on either neural networks or voxel grids. Neural networks are capable of representing the SDF continuously. However, they are hard to update incrementally as neural networks tend to forget previously observed parts of the environment unless an extensive sensor history is stored for training. Voxel-based representations do not have this problem but they are not space-efficient especially in large environments with fine details. HIO-SDF combines the advantages of these representations using a hierarchical approach which employs a coarse voxel grid that captures the observed parts of the environment together with high-resolution local information to train a neural network. HIO-SDF achieves a 46% lower mean global SDF error across all test scenes than a state of the art continuous representation, and a 30% lower error than a discrete representation at the same resolution as our coarse global SDF grid. Videos and code are available at: https://samsunglabs.github.io/HIO-SDF-project-page/
CVSep 14, 2023
HandNeRF: Learning to Reconstruct Hand-Object Interaction Scene from a Single RGB ImageHongsuk Choi, Nikhil Chavan-Dafle, Jiacheng Yuan et al.
This paper presents a method to learn hand-object interaction prior for reconstructing a 3D hand-object scene from a single RGB image. The inference as well as training-data generation for 3D hand-object scene reconstruction is challenging due to the depth ambiguity of a single image and occlusions by the hand and object. We turn this challenge into an opportunity by utilizing the hand shape to constrain the possible relative configuration of the hand and object geometry. We design a generalizable implicit function, HandNeRF, that explicitly encodes the correlation of the 3D hand shape features and 2D object features to predict the hand and object scene geometry. With experiments on real-world datasets, we show that HandNeRF is able to reconstruct hand-object scenes of novel grasp configurations more accurately than comparable methods. Moreover, we demonstrate that object reconstruction from HandNeRF ensures more accurate execution of downstream tasks, such as grasping and motion planning for robotic hand-over and manipulation. Homepage: https://samsunglabs.github.io/HandNeRF-project-page/
CVSep 28, 2022
Category-Level Global Camera Pose Estimation with Multi-Hypothesis Point Cloud CorrespondencesJun-Jee Chao, Selim Engin, Nicolai Häni et al.
Correspondence search is an essential step in rigid point cloud registration algorithms. Most methods maintain a single correspondence at each step and gradually remove wrong correspondances. However, building one-to-one correspondence with hard assignments is extremely difficult, especially when matching two point clouds with many locally similar features. This paper proposes an optimization method that retains all possible correspondences for each keypoint when matching a partial point cloud to a complete point cloud. These uncertain correspondences are then gradually updated with the estimated rigid transformation by considering the matching cost. Moreover, we propose a new point feature descriptor that measures the similarity between local point cloud regions. Extensive experiments show that our method outperforms the state-of-the-art (SoTA) methods even when matching different objects within the same category. Notably, our method outperforms the SoTA methods when registering real-world noisy depth images to a template shape by up to 20% performance.
ROApr 13
Uncertainty Guided Exploratory Trajectory Optimization for Sampling-Based Model Predictive ControlO. Goktug Poyrazoglu, Yukang Cao, Rahul Moorthy et al.
Trajectory optimization depends heavily on initialization. In particular, sampling-based approaches are highly sensitive to initial solutions, and limited exploration frequently leads them to converge to local minima in complex environments. We present Uncertainty Guided Exploratory Trajectory Optimization (UGE-TO), a trajectory optimization algorithm that generates well-separated samples to achieve a better coverage of the configuration space. UGE-TO represents trajectories as probability distributions induced by uncertainty ellipsoids. Unlike sampling-based approaches that explore only in the action space, this representation captures the effects of both system dynamics and action selection. By incorporating the impact of dynamics, in addition to the action space, into our distributions, our method enhances trajectory diversity by enforcing distributional separation via the Hellinger distance between them. It enables a systematic exploration of the configuration space and improves robustness against local minima. Further, we present UGE-MPC, which integrates UGE-TO into sampling-based model predictive controller methods. Experiments demonstrate that UGE-MPC achieves higher exploration and faster convergence in trajectory optimization compared to baselines under the same sampling budget, achieving 72.1% faster convergence in obstacle-free environments and 66% faster convergence with a 6.7% higher success rate in the cluttered environment compared to the best-performing baseline. Additionally, we validate the approach through a range of simulation scenarios and real-world experiments. Our results indicate that UGE-MPC has higher success rates and faster convergence, especially in environments that demand significant deviations from nominal trajectories to avoid failures. The project and code are available at https://ogpoyrazoglu.github.io/cuniform_sampling/.
CVFeb 24, 2023
3D Surface Reconstruction in the Wild by Deforming Shape Priors from Synthetic DataNicolai Häni, Jun-Jee Chao, Volkan Isler
Reconstructing the underlying 3D surface of an object from a single image is a challenging problem that has received extensive attention from the computer vision community. Many learning-based approaches tackle this problem by learning a 3D shape prior from either ground truth 3D data or multi-view observations. To achieve state-of-the-art results, these methods assume that the objects are specified with respect to a fixed canonical coordinate frame, where instances of the same category are perfectly aligned. In this work, we present a new method for joint category-specific 3D reconstruction and object pose estimation from a single image. We show that one can leverage shape priors learned on purely synthetic 3D data together with a point cloud pose canonicalization method to achieve high-quality 3D reconstruction in the wild. Given a single depth image at test time, we first transform this partial point cloud into a learned canonical frame. Then, we use a neural deformation field to reconstruct the 3D surface of the object. Finally, we jointly optimize object pose and 3D shape to fit the partial depth observation. Our approach achieves state-of-the-art reconstruction performance across several real-world datasets, even when trained only on synthetic data. We further show that our method generalizes to different input modalities, from dense depth images to sparse and noisy LIDAR scans.
ROSep 12, 2022
Self-supervised Wide Baseline Visual Servoing via 3D EquivarianceJinwook Huh, Jungseok Hong, Suveer Garg et al.
One of the challenging input settings for visual servoing is when the initial and goal camera views are far apart. Such settings are difficult because the wide baseline can cause drastic changes in object appearance and cause occlusions. This paper presents a novel self-supervised visual servoing method for wide baseline images which does not require 3D ground truth supervision. Existing approaches that regress absolute camera pose with respect to an object require 3D ground truth data of the object in the forms of 3D bounding boxes or meshes. We learn a coherent visual representation by leveraging a geometric property called 3D equivariance-the representation is transformed in a predictable way as a function of 3D transformation. To ensure that the feature-space is faithful to the underlying geodesic space, a geodesic preserving constraint is applied in conjunction with the equivariance. We design a Siamese network that can effectively enforce these two geometric properties without requiring 3D supervision. With the learned model, the relative transformation can be inferred simply by following the gradient in the learned space and used as feedback for closed-loop visual servoing. Our method is evaluated on objects from the YCB dataset, showing meaningful outperformance on a visual servoing task, or object alignment task with respect to state-of-the-art approaches that use 3D supervision. Ours yields more than 35% average distance error reduction and more than 90% success rate with 3cm error tolerance.
CVJan 1
SV-GS: Sparse View 4D Reconstruction with Skeleton-Driven Gaussian SplattingJun-Jee Chao, Volkan Isler
Reconstructing a dynamic target moving over a large area is challenging. Standard approaches for dynamic object reconstruction require dense coverage in both the viewing space and the temporal dimension, typically relying on multi-view videos captured at each time step. However, such setups are only possible in constrained environments. In real-world scenarios, observations are often sparse over time and captured sparsely from diverse viewpoints (e.g., from security cameras), making dynamic reconstruction highly ill-posed. We present SV-GS, a framework that simultaneously estimates a deformation model and the object's motion over time under sparse observations. To initialize SV-GS, we leverage a rough skeleton graph and an initial static reconstruction as inputs to guide motion estimation. (Later, we show that this input requirement can be relaxed.) Our method optimizes a skeleton-driven deformation field composed of a coarse skeleton joint pose estimator and a module for fine-grained deformations. By making only the joint pose estimator time-dependent, our model enables smooth motion interpolation while preserving learned geometric details. Experiments on synthetic datasets show that our method outperforms existing approaches under sparse observations by up to 34% in PSNR, and achieves comparable performance to dense monocular video methods on real-world datasets despite using significantly fewer frames. Moreover, we demonstrate that the input initial static reconstruction can be replaced by a diffusion-based generative prior, making our method more practical for real-world scenarios.
CVNov 8, 2023
VioLA: Aligning Videos to 2D LiDAR ScansJun-Jee Chao, Selim Engin, Nikhil Chavan-Dafle et al.
We study the problem of aligning a video that captures a local portion of an environment to the 2D LiDAR scan of the entire environment. We introduce a method (VioLA) that starts with building a semantic map of the local scene from the image sequence, then extracts points at a fixed height for registering to the LiDAR map. Due to reconstruction errors or partial coverage of the camera scan, the reconstructed semantic map may not contain sufficient information for registration. To address this problem, VioLA makes use of a pre-trained text-to-image inpainting model paired with a depth completion model for filling in the missing scene content in a geometrically consistent fashion to support pose registration. We evaluate VioLA on two real-world RGB-D benchmarks, as well as a self-captured dataset of a large office scene. Notably, our proposed scene completion module improves the pose registration performance by up to 20%.
CVDec 14, 2023
FineControlNet: Fine-level Text Control for Image Generation with Spatially Aligned Text Control InjectionHongsuk Choi, Isaac Kasahara, Selim Engin et al.
Recently introduced ControlNet has the ability to steer the text-driven image generation process with geometric input such as human 2D pose, or edge features. While ControlNet provides control over the geometric form of the instances in the generated image, it lacks the capability to dictate the visual appearance of each instance. We present FineControlNet to provide fine control over each instance's appearance while maintaining the precise pose control capability. Specifically, we develop and demonstrate FineControlNet with geometric control via human pose images and appearance control via instance-level text prompts. The spatial alignment of instance-specific text prompts and 2D poses in latent space enables the fine control capabilities of FineControlNet. We evaluate the performance of FineControlNet with rigorous comparison against state-of-the-art pose-conditioned text-to-image diffusion models. FineControlNet achieves superior performance in generating images that follow the user-provided instance-specific text prompts and poses compared with existing methods. Project webpage: https://samsunglabs.github.io/FineControlNet-project-page
CVJun 28, 2025
Part Segmentation and Motion Estimation for Articulated Objects with Dynamic 3D GaussiansJun-Jee Chao, Qingyuan Jiang, Volkan Isler
Part segmentation and motion estimation are two fundamental problems for articulated object motion analysis. In this paper, we present a method to solve these two problems jointly from a sequence of observed point clouds of a single articulated object. The main challenge in our problem setting is that the point clouds are not assumed to be generated by a fixed set of moving points. Instead, each point cloud in the sequence could be an arbitrary sampling of the object surface at that particular time step. Such scenarios occur when the object undergoes major occlusions, or if the dataset is collected using measurements from multiple sensors asynchronously. In these scenarios, methods that rely on tracking point correspondences are not appropriate. We present an alternative approach based on a compact but effective representation where we represent the object as a collection of simple building blocks modeled as 3D Gaussians. We parameterize the Gaussians with time-dependent rotations, translations, and scales that are shared across all time steps. With our representation, part segmentation can be achieved by building correspondences between the observed points and the Gaussians. Moreover, the transformation of each point across time can be obtained by following the poses of the assigned Gaussian (even when the point is not observed). Experiments show that our method outperforms existing methods that solely rely on finding point correspondences. Additionally, we extend existing datasets to emulate real-world scenarios by considering viewpoint occlusions. We further demonstrate that our method is more robust to missing points as compared to existing approaches on these challenging datasets, even when some parts are completely occluded in some time-steps. Notably, our part segmentation performance outperforms the state-of-the-art method by 13% on point clouds with occlusions.
ROMay 16, 2023
Real-time Simultaneous Multi-Object 3D Shape Reconstruction, 6DoF Pose Estimation and Dense Grasp PredictionShubham Agrawal, Nikhil Chavan-Dafle, Isaac Kasahara et al.
Robotic manipulation systems operating in complex environments rely on perception systems that provide information about the geometry (pose and 3D shape) of the objects in the scene along with other semantic information such as object labels. This information is then used for choosing the feasible grasps on relevant objects. In this paper, we present a novel method to provide this geometric and semantic information of all objects in the scene as well as feasible grasps on those objects simultaneously. The main advantage of our method is its speed as it avoids sequential perception and grasp planning steps. With detailed quantitative analysis, we show that our method delivers competitive performance compared to the state-of-the-art dedicated methods for object shape, pose, and grasp predictions while providing fast inference at 30 frames per second speed.
CVDec 1, 2021
PoseKernelLifter: Metric Lifting of 3D Human Pose using SoundZhijian Yang, Xiaoran Fan, Volkan Isler et al.
Reconstructing the 3D pose of a person in metric scale from a single view image is a geometrically ill-posed problem. For example, we can not measure the exact distance of a person to the camera from a single view image without additional scene assumptions (e.g., known height). Existing learning based approaches circumvent this issue by reconstructing the 3D pose up to scale. However, there are many applications such as virtual telepresence, robotics, and augmented reality that require metric scale reconstruction. In this paper, we show that audio signals recorded along with an image, provide complementary information to reconstruct the metric 3D pose of the person. The key insight is that as the audio signals traverse across the 3D space, their interactions with the body provide metric information about the body's pose. Based on this insight, we introduce a time-invariant transfer function called pose kernel -- the impulse response of audio signals induced by the body pose. The main properties of the pose kernel are that (1) its envelope highly correlates with 3D pose, (2) the time response corresponds to arrival time, indicating the metric distance to the microphone, and (3) it is invariant to changes in the scene geometry configurations. Therefore, it is readily generalizable to unseen scenes. We design a multi-stage 3D CNN that fuses audio and visual signals and learns to reconstruct 3D pose in a metric scale. We show that our multi-modal method produces accurate metric reconstruction in real world scenes, which is not possible with state-of-the-art lifting approaches including parametric mesh regression and depth regression.
RONov 19, 2021
Online Coverage Planning for an Autonomous Weed Mowing Robot with Curvature ConstraintsParikshit Maini, Burak M. Gonultas, Volkan Isler
The land used for grazing cattle takes up about one-third of the land in the United States. These areas can be highly rugged. Yet, they need to be maintained to prevent weeds from taking over the nutritious grassland. This can be a daunting task especially in the case of organic farming since herbicides cannot be used. In this paper, we present the design of Cowbot, an autonomous weed mowing robot for pastures. Cowbot is an electric mower designed to operate in the rugged environments on cow pastures and provide a cost-effective method for weed control in organic farms. Path planning for the Cowbot is challenging since weed distribution on pastures is unknown. Given a limited field of view, online path planning is necessary to detect weeds and plan paths to mow them. We study the general online path planning problem for an autonomous mower with curvature and field of view constraints. We develop two online path planning algorithms that are able to utilize new information about weeds to optimize path length and ensure coverage. We deploy our algorithms on the Cowbot and perform field experiments to validate the suitability of our methods for real-time path planning. We also perform extensive simulation experiments which show that our algorithms result in up to 60 % reduction in path length as compared to baseline boustrophedon and random-search based coverage paths.
ROSep 15, 2021
ROW-SLAM: Under-Canopy Cornfield Semantic SLAMJiacheng Yuan, Jungseok Hong, Junaed Sattar et al.
We study a semantic SLAM problem faced by a robot tasked with autonomous weeding under the corn canopy. The goal is to detect corn stalks and localize them in a global coordinate frame. This is a challenging setup for existing algorithms because there is very little space between the camera and the plants, and the camera motion is primarily restricted to be along the row. To overcome these challenges, we present a multi-camera system where a side camera (facing the plants) is used for detection whereas front and back cameras are used for motion estimation. Next, we show how semantic features in the environment (corn stalks, ground, and crop planes) can be used to develop a robust semantic SLAM solution and present results from field trials performed throughout the growing season across various cornfields.
ROSep 14, 2021
Simultaneous Object Reconstruction and Grasp Prediction using a Camera-centric Object Shell RepresentationNikhil Chavan-Dafle, Sergiy Popovych, Shubham Agrawal et al.
Being able to grasp objects is a fundamental component of most robotic manipulation systems. In this paper, we present a new approach to simultaneously reconstruct a mesh and a dense grasp quality map of an object from a depth image. At the core of our approach is a novel camera-centric object representation called the "object shell" which is composed of an observed "entry image" and a predicted "exit image". We present an image-to-image residual ConvNet architecture in which the object shell and a grasp-quality map are predicted as separate output channels. The main advantage of the shell representation and the corresponding neural network architecture, ShellGrasp-Net, is that the input-output pixel correspondences in the shell representation are explicitly represented in the architecture. We show that this coupling yields superior generalization capabilities for object reconstruction and accurate grasp quality estimation implicitly considering the object geometry. Our approach yields an efficient dense grasp quality map and an object geometry estimate in a single forward pass. Both of these outputs can be used in a wide range of robotic manipulation applications. With rigorous experimental validation, both in simulation and on a real setup, we show that our shell-based method can be used to generate precise grasps and the associated grasp quality with over 90% accuracy. Diverse grasps computed on shell reconstructions allow the robot to select and execute grasps in cluttered scenes with more than 93% success rate.
ROMar 20, 2021
Learning Continuous Cost-to-Go Functions for Non-holonomic SystemsJinwook Huh, Daniel D. Lee, Volkan Isler
This paper presents a supervised learning method to generate continuous cost-to-go functions of non-holonomic systems directly from the workspace description. Supervision from informative examples reduces training time and improves network performance. The manifold representing the optimal trajectories of a non-holonomic system has high-curvature regions which can not be efficiently captured with uniform sampling. To address this challenge, we present an adaptive sampling method which makes use of sampling-based planners along with local, closed-form solutions to generate training samples. The cost-to-go function over a specific workspace is represented as a neural network whose weights are generated by a second, higher order network. The networks are trained in an end-to-end fashion. In our previous work, this architecture was shown to successfully learn to generate the cost-to-go functions of holonomic systems using uniform sampling. In this work, we show that uniform sampling fails for non-holonomic systems. However, with the proposed adaptive sampling methodology, our network can generate near-optimal trajectories for non-holonomic systems while avoiding obstacles. Experiments show that our method is two orders of magnitude faster compared to traditional approaches in cluttered environments.
CVDec 27, 2020
Ellipse Regression with Predicted Uncertainties for Accurate Multi-View 3D Object EstimationWenbo Dong, Volkan Isler
Convolutional neural network (CNN) based architectures, such as Mask R-CNN, constitute the state of the art in object detection and segmentation. Recently, these methods have been extended for model-based segmentation where the network outputs the parameters of a geometric model (e.g. an ellipse) directly. This work considers objects whose three-dimensional models can be represented as ellipsoids. We present a variant of Mask R-CNN for estimating the parameters of ellipsoidal objects by segmenting each object and accurately regressing the parameters of projection ellipses. We show that model regression is sensitive to the underlying occlusion scenario and that prediction quality for each object needs to be characterized individually for accurate 3D object estimation. We present a novel ellipse regression loss which can learn the offset parameters with their uncertainties and quantify the overall geometric quality of detection for each ellipse. These values, in turn, allow us to fuse multi-view detections to obtain 3D ellipsoid parameters in a principled fashion. The experiments on both synthetic and real datasets quantitatively demonstrate the high accuracy of our proposed method in estimating 3D objects under heavy occlusions compared to previous state-of-the-art methods.
RODec 10, 2020
Cost-to-Go Function Generating Networks for High Dimensional Motion PlanningJinwook Huh, Volkan Isler, Daniel D. Lee
This paper presents c2g-HOF networks which learn to generate cost-to-go functions for manipulator motion planning. The c2g-HOF architecture consists of a cost-to-go function over the configuration space represented as a neural network (c2g-network) as well as a Higher Order Function (HOF) network which outputs the weights of the c2g-network for a given input workspace. Both networks are trained end-to-end in a supervised fashion using costs computed from traditional motion planners. Once trained, c2g-HOF can generate a smooth and continuous cost-to-go function directly from workspace sensor inputs (represented as a point cloud in 3D or an image in 2D). At inference time, the weights of the c2g-network are computed very efficiently and near-optimal trajectories are generated by simply following the gradient of the cost-to-go function. We compare c2g-HOF with traditional planning algorithms for various robots and planning scenarios. The experimental results indicate that planning with c2g-HOF is significantly faster than other motion planning algorithms, resulting in orders of magnitude improvement when including collision checking. Furthermore, despite being trained from sparsely sampled trajectories in configuration space, c2g-HOF generalizes to generate smoother, and often lower cost, trajectories. We demonstrate cost-to-go based planning on a 7 DoF manipulator arm where motion planning in a complex workspace requires only 0.13 seconds for the entire trajectory.
CVNov 18, 2020
Fast Motion Understanding with Spatiotemporal Neural Networks and Dynamic Vision SensorsAnthony Bisulco, Fernando Cladera Ojeda, Volkan Isler et al.
This paper presents a Dynamic Vision Sensor (DVS) based system for reasoning about high speed motion. As a representative scenario, we consider the case of a robot at rest reacting to a small, fast approaching object at speeds higher than 15m/s. Since conventional image sensors at typical frame rates observe such an object for only a few frames, estimating the underlying motion presents a considerable challenge for standard computer vision systems and algorithms. In this paper we present a method motivated by how animals such as insects solve this problem with their relatively simple vision systems. Our solution takes the event stream from a DVS and first encodes the temporal events with a set of causal exponential filters across multiple time scales. We couple these filters with a Convolutional Neural Network (CNN) to efficiently extract relevant spatiotemporal features. The combined network learns to output both the expected time to collision of the object, as well as the predicted collision point on a discretized polar grid. These critical estimates are computed with minimal delay by the network in order to react appropriately to the incoming object. We highlight the results of our system to a toy dart moving at 23.4m/s with a 24.73° error in $θ$, 18.4mm average discretized radius prediction error, and 25.03% median time to collision prediction error.
RONov 16, 2020
Multi-Step Recurrent Q-Learning for Robotic Velcro PeelingJiacheng Yuan, Nicolai Häni, Volkan Isler
Learning object manipulation is a critical skill for robots to interact with their environment. Even though there has been significant progress in robotic manipulation of rigid objects, interacting with non-rigid objects remains challenging for robots. In this work, we introduce velcro peeling as a representative application for robotic manipulation of non-rigid objects in complex environments. We present a method of learning force-based manipulation from noisy and incomplete sensor inputs in partially observable environments by modeling long term dependencies between measurements with a multi-step deep recurrent network. We present experiments on a real robot to show the necessity of modeling these long term dependencies and validate our approach in simulation and robot experiments. Our results show that using tactile input enables the robot to overcome geometric uncertainties present in the environment with high fidelity in ~90% of all cases, outperforming the baselines by a large margin.
ROOct 27, 2020
Learning to Generate Cost-to-Go Functions for Efficient Motion PlanningJinwook Huh, Galen Xing, Ziyun Wang et al.
Traditional motion planning is computationally burdensome for practical robots, involving extensive collision checking and considerable iterative propagation of cost values. We present a novel neural network architecture which can directly generate the cost-to-go (c2g) function for a given configuration space and a goal configuration. The output of the network is a continuous function whose gradient in configuration space can be directly used to generate trajectories in motion planning without the need for protracted iterations or extensive collision checking. This higher order function (i.e. a function generating another function) representation lies at the core of our motion planning architecture, c2g-HOF, which can take a workspace as input, and generate the cost-to-go function over the configuration space map (C-map). Simulation results for 2D and 3D environments show that c2g-HOF can be orders of magnitude faster at execution time than methods which explore the configuration space during execution. We also present an implementation of c2g-HOF which generates trajectories for robot manipulators directly from an overhead image of the workspace.
CVJul 30, 2020
Continuous Object Representation Networks: Novel View Synthesis without Target View SupervisionNicolai Häni, Selim Engin, Jun-Jee Chao et al.
Novel View Synthesis (NVS) is concerned with synthesizing views under camera viewpoint transformations from one or multiple input images. NVS requires explicit reasoning about 3D object structure and unseen parts of the scene to synthesize convincing results. As a result, current approaches typically rely on supervised training with either ground truth 3D models or multiple target images. We propose Continuous Object Representation Networks (CORN), a conditional architecture that encodes an input image's geometry and appearance that map to a 3D consistent scene representation. We can train CORN with only two source images per object by combining our model with a neural renderer. A key feature of CORN is that it requires no ground truth 3D models or target view supervision. Regardless, CORN performs well on challenging tasks such as novel view synthesis and single-view 3D reconstruction and achieves performance comparable to state-of-the-art approaches that use direct supervision. For up-to-date information, data, and code, please see our project page: https://nicolaihaeni.github.io/corn/.
CVJun 14, 2020
Geodesic-HOF: 3D Reconstruction Without Cutting CornersZiyun Wang, Eric A. Mitchell, Volkan Isler et al.
Single-view 3D object reconstruction is a challenging fundamental problem in computer vision, largely due to the morphological diversity of objects in the natural world. In particular, high curvature regions are not always captured effectively by methods trained using only set-based loss functions, resulting in reconstructions short-circuiting the surface or cutting corners. In particular, high curvature regions are not always captured effectively by methods trained using only set-based loss functions, resulting in reconstructions short-circuiting the surface or cutting corners. To address this issue, we propose learning an image-conditioned mapping function from a canonical sampling domain to a high dimensional space where the Euclidean distance is equal to the geodesic distance on the object. The first three dimensions of a mapped sample correspond to its 3D coordinates. The additional lifted components contain information about the underlying geodesic structure. Our results show that taking advantage of these learned lifted coordinates yields better performance for estimating surface normals and generating surfaces than using point cloud reconstructions alone. Further, we find that this learned geodesic embedding space provides useful information for applications such as unsupervised object decomposition.
CVApr 3, 2020
Near-chip Dynamic Vision Filtering for Low-Bandwidth Pedestrian DetectionAnthony Bisulco, Fernando Cladera Ojeda, Volkan Isler et al.
This paper presents a novel end-to-end system for pedestrian detection using Dynamic Vision Sensors (DVSs). We target applications where multiple sensors transmit data to a local processing unit, which executes a detection algorithm. Our system is composed of (i) a near-chip event filter that compresses and denoises the event stream from the DVS, and (ii) a Binary Neural Network (BNN) detection module that runs on a low-computation edge computing device (in our case a STM32F4 microcontroller). We present the system architecture and provide an end-to-end implementation for pedestrian detection in an office environment. Our implementation reduces transmission size by up to 99.6% compared to transmitting the raw event stream. The average packet size in our system is only 1397 bits, while 307.2 kb are required to send an uncompressed DVS time window. Our detector is able to perform a detection every 450 ms, with an overall testing F1 score of 83%. The low bandwidth and energy properties of our system make it ideal for IoT applications.
ROMar 3, 2020
Robotic Grasping through Combined Image-Based Grasp Proposal and 3D ReconstructionDaniel Yang, Tarik Tosun, Ben Eisner et al.
We present a novel approach to robotic grasp planning using both a learned grasp proposal network and a learned 3D shape reconstruction network. Our system generates 6-DOF grasps from a single RGB-D image of the target object, which is provided as input to both networks. By using the geometric reconstruction to refine the the candidate grasp produced by the grasp proposal network, our system is able to accurately grasp both known and unknown objects, even when the grasp location on the object is not visible in the input image. This paper presents the network architectures, training procedures, and grasp refinement method that comprise our system. Experiments demonstrate the efficacy of our system at grasping both known and unknown objects (91% success rate in a physical robot environment, 84% success rate in a simulated environment). We additionally perform ablation studies that show the benefits of combining a learned grasp proposal with geometric reconstruction for grasping, and also show that our system outperforms several baselines in a grasping task.
ROFeb 23, 2020
Active localization of multiple targets using noisy relative measurementsSelim Engin, Volkan Isler
Consider a mobile robot tasked with localizing targets at unknown locations by obtaining relative measurements. The observations can be bearing or range measurements. How should the robot move so as to localize the targets and minimize the uncertainty in their locations as quickly as possible? Most existing approaches are either greedy in nature or rely on accurate initial estimates. We formulate this path planning problem as an unsupervised learning problem where the measurements are aggregated using a Bayesian histogram filter. The robot learns to minimize the total uncertainty of each target in the shortest amount of time using the current measurement and an aggregate representation of the current belief state. We analyze our method in a series of experiments where we show that our method outperforms a standard greedy approach. In addition, its performance is also comparable to an offline algorithm which has access to the true location of the targets.
CVJan 30, 2020
Ellipse R-CNN: Learning to Infer Elliptical Object from Clustering and OcclusionWenbo Dong, Pravakar Roy, Cheng Peng et al.
Images of heavily occluded objects in cluttered scenes, such as fruit clusters in trees, are hard to segment. To further retrieve the 3D size and 6D pose of each individual object in such cases, bounding boxes are not reliable from multiple views since only a little portion of the object's geometry is captured. We introduce the first CNN-based ellipse detector, called Ellipse R-CNN, to represent and infer occluded objects as ellipses. We first propose a robust and compact ellipse regression based on the Mask R-CNN architecture for elliptical object detection. Our method can infer the parameters of multiple elliptical objects even they are occluded by other neighboring objects. For better occlusion handling, we exploit refined feature regions for the regression stage, and integrate the U-Net structure for learning different occlusion patterns to compute the final detection score. The correctness of ellipse regression is validated through experiments performed on synthetic data of clustered ellipses. We further quantitatively and qualitatively demonstrate that our approach outperforms the state-of-the-art model (i.e., Mask R-CNN followed by ellipse fitting) and its three variants on both synthetic and real datasets of occluded and clustered elliptical objects.
CVDec 18, 2019
Surface HOF: Surface Reconstruction from a Single Image Using Higher Order Function NetworksZiyun Wang, Volkan Isler, Daniel D. Lee
We address the problem of generating a high-resolution surface reconstruction from a single image. Our approach is to learn a Higher Order Function (HOF) which takes an image of an object as input and generates a mapping function. The mapping function takes samples from a canonical domain (e.g. the unit sphere) and maps each sample to a local tangent plane on the 3D reconstruction of the object. Each tangent plane is represented as an origin point and a normal vector at that point. By efficiently learning a continuous mapping function, the surface can be generated at arbitrary resolution in contrast to other methods which generate fixed resolution outputs. We present the Surface HOF in which both the higher order function and the mapping function are represented as neural networks, and train the networks to generate reconstructions of PointNet objects. Experiments show that Surface HOF is more accurate and uses more efficient representations than other state of the art methods for surface reconstruction. Surface HOF is also easier to train: it requires minimal input pre-processing and output post-processing and generates surface representations that are more parameter efficient. Its accuracy and convenience make Surface HOF an appealing method for single image reconstruction.
NIOct 13, 2019
QoS and Jamming-Aware Wireless Networking Using Deep Reinforcement LearningNof Abuzainab, Tugba Erpek, Kemal Davaslioglu et al.
The problem of quality of service (QoS) and jamming-aware communications is considered in an adversarial wireless network subject to external eavesdropping and jamming attacks. To ensure robust communication against jamming, an interference-aware routing protocol is developed that allows nodes to avoid communication holes created by jamming attacks. Then, a distributed cooperation framework, based on deep reinforcement learning, is proposed that allows nodes to assess network conditions and make deep learning-driven, distributed, and real-time decisions on whether to participate in data communications, defend the network against jamming and eavesdropping attacks, or jam other transmissions. The objective is to maximize the network performance that incorporates throughput, energy efficiency, delay, and security metrics. Simulation results show that the proposed jamming-aware routing approach is robust against jamming and when throughput is prioritized, the proposed deep reinforcement learning approach can achieve significant (measured as three-fold) increase in throughput, compared to a benchmark policy with fixed roles assigned to nodes.
ROOct 4, 2019
Higher Order Function Networks for View Planning and Multi-View ReconstructionSelim Engin, Eric Mitchell, Daewon Lee et al.
We consider the problem of planning views for a robot to acquire images of an object for visual inspection and reconstruction. In contrast to offline methods which require a 3D model of the object as input or online methods which rely on only local measurements, our method uses a neural network which encodes shape information for a large number of objects. We build on recent deep learning methods capable of generating a complete 3D reconstruction of an object from a single image. Specifically, in this work, we extend a recent method which uses Higher Order Functions (HOF) to represent the shape of the object. We present a new generalization of this method to incorporate multiple images as input and establish a connection between visibility and reconstruction quality. This relationship forms the foundation of our view planning method where we compute viewpoints to visually cover the output of the multi-view HOF network with as few images as possible. Experiments indicate that our method provides a good compromise between online and offline methods: Similar to online methods, our method does not require the true object model as input. In terms of number of views, it is much more efficient. In most cases, its performance is comparable to the optimal offline case even on object classes the network has not been trained on.
CVSep 13, 2019
MinneApple: A Benchmark Dataset for Apple Detection and SegmentationNicolai Häni, Pravakar Roy, Volkan Isler
In this work, we present a new dataset to advance the state-of-the-art in fruit detection, segmentation, and counting in orchard environments. While there has been significant recent interest in solving these problems, the lack of a unified dataset has made it difficult to compare results. We hope to enable direct comparisons by providing a large variety of high-resolution images acquired in orchards, together with human annotations of the fruit on trees. The fruits are labeled using polygonal masks for each object instance to aid in precise object detection, localization, and segmentation. Additionally, we provide data for patch-based counting of clustered fruits. Our dataset contains over 41, 000 annotated object instances in 1000 images. We present a detailed overview of the dataset together with baseline performance analysis for bounding box detection, segmentation, and fruit counting as well as representative results for yield estimation. We make this dataset publicly available and host a CodaLab challenge to encourage comparison of results on a common dataset. To download the data and learn more about MinneApple please see the project website: http://rsn.cs.umn.edu/index.php/MinneApple. Up to date information is available online.
ROAug 2, 2019
Asynchronous Network Formation in Unknown Unbounded EnvironmentsSelim Engin, Volkan Isler
In this paper, we study the Online Network Formation Problem (ONFP) for a mobile multi-robot system. Consider a group of robots with a bounded communication range operating in a large open area. One of the robots has a piece of information which has to be propagated to all other robots. What strategy should the robots pursue to disseminate the information to the rest of the robots as quickly as possible? The initial locations of the robots are unknown to each other, therefore the problem must be solved in an online fashion. For this problem, we present an algorithm whose competitive ratio is $O(H \cdot \max\{M,\sqrt{M H}\})$ for arbitrary robot deployments, where $M$ is the largest edge length in the Euclidean minimum spanning tree on the initial robot configuration and $H$ is the height of the tree. We also study the case when the robot initial positions are chosen uniformly at random and improve the ratio to $O(M)$. Finally, we present simulation results to validate the performance in larger scales and demonstrate our algorithm using three robots in a field experiment.
LGJul 24, 2019
Higher-Order Function Networks for Learning Composable 3D Object RepresentationsEric Mitchell, Selim Engin, Volkan Isler et al.
We present a new approach to 3D object representation where a neural network encodes the geometry of an object directly into the weights and biases of a second 'mapping' network. This mapping network can be used to reconstruct an object by applying its encoded transformation to points randomly sampled from a simple geometric space, such as the unit sphere. We study the effectiveness of our method through various experiments on subsets of the ShapeNet dataset. We find that the proposed approach can reconstruct encoded objects with accuracy equal to or exceeding state-of-the-art methods with orders of magnitude fewer parameters. Our smallest mapping network has only about 7000 parameters and shows reconstruction quality on par with state-of-the-art object decoder architectures with millions of parameters. Further experiments on feature mixing through the composition of learned functions show that the encoding captures a meaningful subspace of objects.
ROJul 15, 2019
Energy-efficient Path Planning for Ground Robots by Combining Air and Ground MeasurementsMinghan Wei, Volkan Isler
As mobile robots find increasing use in outdoor applications, designing energy-efficient robot navigation algorithms is gaining importance. There are two primary approaches to energy efficient navigation: Offline approaches rely on a previously built energy map as input to a path planner. Obtaining energy maps for large environments is challenging. Alternatively, the robot can navigate in an online fashion and build the map as it navigates. Online navigation in unknown environments with only local information is still a challenging research problem. In this paper, we present a novel approach which addresses both of these challenges. Our approach starts with a segmented aerial image of the environment. We show that a coarse energy map can be built from the segmentation. However, the absolute energy value for a specific terrain type (e.g. grass) can vary across environments. Therefore, rather than using this energy map directly, we use it to build the covariance function for a Gaussian Process (GP) based representation of the environment. In the online phase, energy measurements collected during navigation are used for estimating energy profiles across the environment using GP regression. Coupled with an A*-like navigation algorithm, we show in simulations that our approach outperforms representative baseline approaches. We also present results from field experiments which demonstrate the practical applicability of our method.
ROApr 5, 2019
Pixels to Plans: Learning Non-Prehensile Manipulation by Imitating a PlannerTarik Tosun, Eric Mitchell, Ben Eisner et al.
We present a novel method enabling robots to quickly learn to manipulate objects by leveraging a motion planner to generate "expert" training trajectories from a small amount of human-labeled data. In contrast to the traditional sense-plan-act cycle, we propose a deep learning architecture and training regimen called PtPNet that can estimate effective end-effector trajectories for manipulation directly from a single RGB-D image of an object. Additionally, we present a data collection and augmentation pipeline that enables the automatic generation of large numbers (millions) of training image and trajectory examples with almost no human labeling effort. We demonstrate our approach in a non-prehensile tool-based manipulation task, specifically picking up shoes with a hook. In hardware experiments, PtPNet generates motion plans (open-loop trajectories) that reliably (89% success over 189 trials) pick up four very different shoes from a range of positions and orientations, and reliably picks up a shoe it has never seen before. Compared with a traditional sense-plan-act paradigm, our system has the advantages of operating on sparse information (single RGB-D frame), producing high-quality trajectories much faster than the "expert" planner (300ms versus several seconds), and generalizing effectively to previously unseen shoes.
CVApr 3, 2019
Semantics-Aware Image to Image Translation and Domain TransferPravakar Roy, Nicolai Häni, Jun-Jee Chao et al.
Image to image translation is the problem of transferring an image from a source domain to a different (but related) target domain. We present a new unsupervised image to image translation technique that leverages the underlying semantic information for object transfiguration and domain transfer tasks. Specifically, we present a generative adversarial learning approach that jointly translates images and labels from a source domain to a target domain. Our main technical contribution is an encoder-decoder based network architecture that jointly encodes the image and its underlying semantics and translates both individually to the target domain. Additionally, we propose object transfiguration and cross-domain semantic consistency losses that preserve semantic labels. Through extensive experimental evaluation, we demonstrate the effectiveness of our approach as compared to the state-of-the-art methods on unsupervised image-to-image translation, domain adaptation, and object transfiguration.
CVOct 22, 2018
A Comparative Study of Fruit Detection and Counting Methods for Yield Mapping in Apple OrchardsNicolai Häni, Pravakar Roy, Volkan Isler
We present new methods for apple detection and counting based on recent deep learning approaches and compare them with state-of-the-art results based on classical methods. Our goal is to quantify performance improvements by neural network-based methods compared to methods based on classical approaches. Additionally, we introduce a complete system for counting apples in an entire row. This task is challenging as it requires tracking fruits in images from both sides of the row. We evaluate the performances of three fruit detection methods and two fruit counting methods on six datasets. Results indicate that the classical detection approach still outperforms the deep learning based methods in the majority of the datasets. For fruit counting though, the deep learning based approach performs better for all of the datasets. Combining the classical detection method together with the neural network based counting approach, we achieve remarkable yield accuracies ranging from 95.56% to 97.83%.
ROAug 31, 2018
Semantic Mapping for Orchard Environments by Merging Two-Sides Reconstructions of Tree RowsWenbo Dong, Pravakar Roy, Volkan Isler
Measuring semantic traits for phenotyping is an essential but labor-intensive activity in horticulture. Researchers often rely on manual measurements which may not be accurate for tasks such as measuring tree volume. To improve the accuracy of such measurements and to automate the process, we consider the problem of building coherent three dimensional (3D) reconstructions of orchard rows. Even though 3D reconstructions of side views can be obtained using standard mapping techniques, merging the two side-views is difficult due to the lack of overlap between the two partial reconstructions. Our first main contribution in this paper is a novel method that utilizes global features and semantic information to obtain an initial solution aligning the two sides. Our mapping approach then refines the 3D model of the entire tree row by integrating semantic information common to both sides, and extracted using our novel robust detection and fitting algorithms. Next, we present a vision system to measure semantic traits from the optimized 3D model that is built from the RGB or RGB-D data captured by only a camera. Specifically, we show how canopy volume, trunk diameter, tree height and fruit count can be automatically obtained in real orchard environments. The experiment results from multiple datasets quantitatively demonstrate the high accuracy and robustness of our method.
CVAug 13, 2018
Vision-Based Preharvest Yield Mapping for Apple OrchardsPravakar Roy, Abhijeet Kislay, Patrick A. Plonski et al.
We present an end-to-end computer vision system for mapping yield in an apple orchard using images captured from a single camera. Our proposed system is platform independent and does not require any specific lighting conditions. Our main technical contributions are 1)~a semi-supervised clustering algorithm that utilizes colors to identify apples and 2)~an unsupervised clustering method that utilizes spatial properties to estimate fruit counts from apple clusters having arbitrarily complex geometry. Additionally, we utilize camera motion to merge the counts across multiple views. We verified the performance of our algorithms by conducting multiple field trials on three tree rows consisting of $252$ trees at the University of Minnesota Horticultural Research Center. Results indicate that the detection method achieves $F_1$-measure $.95 -.97$ for multiple color varieties and lighting conditions. The counting method achieves an accuracy of $89\%-98\%$. Additionally, we report merged fruit counts from both sides of the tree rows. Our yield estimation method achieves an overall accuracy of $91.98\% - 94.81\%$ across different datasets.
CVMay 1, 2018
Adaptive View Planning for Aerial 3D ReconstructionCheng Peng, Volkan Isler
With the proliferation of small aerial vehicles, acquiring close up aerial imagery for high quality reconstruction of complex scenes is gaining importance. We present an adaptive view planning method to collect such images in an automated fashion. We start by sampling a small set of views to build a coarse proxy to the scene. We then present (i)~a method that builds a view manifold for view selection, and (ii) an algorithm to select a sparse set of views. The vehicle then visits these viewpoints to cover the scene, and the procedure is repeated until reconstruction quality converges or a desired level of quality is achieved. The view manifold provides an effective efficiency/quality compromise between using the entire 6 degree of freedom pose space and using a single view hemisphere to select the views. Our results show that, in contrast to existing "explore and exploit" methods which collect only two sets of views, reconstruction quality can be drastically improved by adding a third set. They also indicate that three rounds of data collection is sufficient even for very complex scenes. We compare our algorithm to existing methods in three challenging scenes. We require each algorithm to select the same number of views. Our algorithm generates views which produce the least reconstruction error.
ROApr 25, 2018
Design and Evaluation of a Novel Cable-Driven Gripper with Perception Capabilities for Strawberry Picking RobotsYa Xiong, Pal Johan From, Volkan Isler
This paper presents a novel cable-driven gripper with perception capabilities for autonomous harvesting of strawberries. Experiments show that the gripper allows for more accurate and faster picking of strawberries compared to existing systems. The gripper consists of four functional parts for sensing, picking, transmission, and storing. It has six fingers that open to form a closed space to swallow a target strawberry and push other surrounding berries away from the target. Equipped with three IR sensors, the gripper controls a manipulator arm to correct for positional error, and can thus pick strawberries that are not exactly localized by the vision algorithm, improving the robustness. Experiments show that the gripper is gentle on the berries as it merely cuts the stem and there is no physical interaction with the berries during the cutting process. We show that the gripper has close-to-perfect successful picking rate when addressing isolated strawberries. By including internal perception, we get high positional error tolerance, and avoid using slow, high-level closed-loop control. Moreover, the gripper can store several berries, which reduces the overall travel distance for the manipulator, and decreases the time needed to pick a single strawberry substantially. The experiments show that the gripper design decreased picking execution time noticeably compared to results found in literature.
ROApr 16, 2018
Tree Morphology for Phenotyping from Semantics-Based Mapping in Orchard EnvironmentsWenbo Dong, Volkan Isler
Measuring tree morphology for phenotyping is an essential but labor-intensive activity in horticulture. Researchers often rely on manual measurements which may not be accurate for example when measuring tree volume. Recent approaches on automating the measurement process rely on LIDAR measurements coupled with high-accuracy GPS. Usually each side of a row is reconstructed independently and then merged using GPS information. Such approaches have two disadvantages: (1) they rely on specialized and expensive equipment, and (2) since the reconstruction process does not simultaneously use information from both sides, side reconstructions may not be accurate. We also show that standard loop closure methods do not necessarily align tree trunks well. In this paper, we present a novel vision system that employs only an RGB-D camera to estimate morphological parameters. A semantics-based mapping algorithm merges the two-sides 3D models of tree rows, where integrated semantic information is obtained and refined by robust fitting algorithms. We focus on measuring tree height, canopy volume and trunk diameter from the optimized 3D model. Experiments conducted in real orchards quantitatively demonstrate the accuracy of our method.
CVMar 31, 2017
View Selection with Geometric Uncertainty ModelingCheng Peng, Volkan Isler
Estimating positions of world points from features observed in images is a key problem in 3D reconstruction, image mosaicking,simultaneous localization and mapping and structure from motion. We consider a special instance in which there is a dominant ground plane $\mathcal{G}$ viewed from a parallel viewing plane $\mathcal{S}$ above it. Such instances commonly arise, for example, in aerial photography. Consider a world point $g \in \mathcal{G}$ and its worst case reconstruction uncertainty $\varepsilon(g,\mathcal{S})$ obtained by merging \emph{all} possible views of $g$ chosen from $\mathcal{S}$. We first show that one can pick two views $s_p$ and $s_q$ such that the uncertainty $\varepsilon(g,\{s_p,s_q\})$ obtained using only these two views is almost as good as (i.e. within a small constant factor of) $\varepsilon(g,\mathcal{S})$. Next, we extend the result to the entire ground plane $\mathcal{G}$ and show that one can pick a small subset of $\mathcal{S'} \subseteq \mathcal{S}$ (which grows only linearly with the area of $\mathcal{G}$) and still obtain a constant factor approximation, for every point $g \in \mathcal{G}$, to the minimum worst case estimate obtained by merging all views in $\mathcal{S}$. Finally, we present a multi-resolution view selection method which extends our techniques to non-planar scenes. We show that the method can produce rich and accurate dense reconstructions with a small number of views. Our results provide a view selection mechanism with provable performance guarantees which can drastically increase the speed of scene reconstruction algorithms. In addition to theoretical results, we demonstrate their effectiveness in an application where aerial imagery is used for monitoring farms and orchards.
CGJul 20, 2016
Minimizing Uncertainty through Sensor Placement with Angle ConstraintsIoana O. Bercea, Volkan Isler, Samir Khuller
We study the problem of sensor placement in environments in which localization is a necessity, such as ad-hoc wireless sensor networks that allow the placement of a few anchors that know their location or sensor arrays that are tracking a target. In most of these situations, the quality of localization depends on the relative angle between the target and the pair of sensors observing it. In this paper, we consider placing a small number of sensors which ensure good angular $α$-coverage: given $α$ in $[0,π/2]$, for each target location $t$, there must be at least two sensors $s_1$ and $s_2$ such that the $\angle(s_1 t s_2)$ is in the interval $[α, π-α]$. One of the main difficulties encountered in such problems is that since the constraints depend on at least two sensors, building a solution must account for the inherent dependency between selected sensors, a feature that generic Set Cover techniques do not account for. We introduce a general framework that guarantees an angular coverage that is arbitrarily close to $α$ for any $α<= π/3$ and apply it to a variety of problems to get bi-criteria approximations. When the angular coverage is required to be at least a constant fraction of $α$, we obtain results that are strictly better than what standard geometric Set Cover methods give. When the angular coverage is required to be at least $(1-1/δ)\cdotα$, we obtain a $\mathcal{O}(\log δ)$- approximation for sensor placement with $α$-coverage on the plane. In the presence of additional distance or visibility constraints, the framework gives a $\mathcal{O}(\logδ\cdot\log k_{OPT})$-approximation, where $k_{OPT}$ is the size of the optimal solution. We also use our framework to give a $\mathcal{O}(\log δ)$-approximation that ensures $(1-1/δ)\cdot α$-coverage and covers every target within distance $3R$.
CVMar 14, 2016
A Novel Method for the Extrinsic Calibration of a 2D Laser Rangefinder and a CameraWenbo Dong, Volkan Isler
We present a novel method for extrinsically calibrating a camera and a 2D Laser Rangefinder (LRF) whose beams are invisible from the camera image. We show that point-to-plane constraints from a single observation of a V-shaped calibration pattern composed of two non-coplanar triangles suffice to uniquely constrain the relative pose between two sensors. Next, we present an approach to obtain analytical solutions using point-to-plane constraints from single or multiple observations. Along the way, we also show that previous solutions, in contrast to our method, have inherent ambiguities and therefore must rely on a good initial estimate. Real and synthetic experiments validate our method and show that it achieves better accuracy than previous methods.
ROAug 26, 2015
Capturing an Omnidirectional Evader in Convex Environments using a Differential Drive RobotUbaldo Ruiz, Volkan Isler
We study the problem of capturing an Omnidirectional Evader in convex environments using a Differential Drive Robot (DDR). The DDR wins the game if at any time instant it captures (collides with) the evader. The evader wins if it can avoid capture forever. Both players are unit disks with the same maximum (bounded) speed, but the DDR can only change its motion direction at a bounded rate. We show that despite this limitation, the DDR can capture the evader.