Jeremy Ma

RO
3 papers
95 citations
Novelty: 48%
AI Score: 24

3 Papers

RO | Sep 23, 2021
A Learned Stereo Depth System for Robotic Manipulation in Homes

Krishna Shankar, Mark Tjersland, Jeremy Ma et al.

We present a passive stereo depth system that produces dense and accurate point clouds optimized for human environments, including dark, textureless, thin, reflective and specular surfaces and objects, at 2560x2048 resolution, with 384 disparities, in 30 ms. The system consists of an algorithm combining learned stereo matching with engineered filtering, a training and data-mixing methodology, and a sensor hardware design. Our architecture is 15x faster than approaches that perform similarly on the Middlebury and FlyingThings stereo benchmarks. To effectively supervise the training of this model, we combine real data labeled using off-the-shelf depth sensors with a number of different rendered, simulated labeled datasets. We demonstrate the efficacy of our system by presenting a large number of qualitative results in the form of depth maps and point clouds, experiments validating the metric accuracy of our system, and comparisons to other sensors on challenging objects and scenes. We also show the competitiveness of our algorithm compared to state-of-the-art learned models using the Middlebury and FlyingThings datasets.

RO | Sep 30, 2019
A Mobile Manipulation System for One-Shot Teaching of Complex Tasks in Homes

Max Bajracharya, James Borders, Dan Helmick et al.

We describe a mobile manipulation hardware and software system capable of autonomously performing complex human-level tasks in real homes, after being taught the task with a single demonstration from a person in virtual reality. This is enabled by a highly capable mobile manipulation robot, whole-body task space hybrid position/force control, teaching of parameterized primitives linked to a robust learned dense visual embedding representation of the scene, and a task graph of the taught behaviors. We demonstrate the robustness of the approach by presenting results for performing a variety of tasks, under different environmental conditions, in multiple real homes. Our approach achieves an 85% overall success rate on three tasks that consist of an average of 45 behaviors each.

LG | Aug 14, 2018
Text-to-Image-to-Text Translation using Cycle Consistent Adversarial Networks

Satya Krishna Gorti, Jeremy Ma

Text-to-image translation has been an active area of research in the recent past. The ability of a network to learn the meaning of a sentence and generate an accurate image that depicts the sentence shows the ability of the model to think more like humans. Popular methods for text-to-image translation make use of Generative Adversarial Networks (GANs) to generate high-quality images based on text input, but the generated images don't always reflect the meaning of the sentence given to the model as input. We address this issue by using a captioning network to caption the generated images and exploit the distance between ground-truth captions and generated captions to improve the network further. We show extensive comparisons between our method and existing methods.