Lisa Peng

CV
3papers
137citations
Novelty73%
AI Score31

3 Papers

CVJun 22, 2022
Certifiable 3D Object Pose Estimation: Foundations, Learning Models, and Self-Training

Rajat Talak, Lisa Peng, Luca Carlone

We consider a certifiable object pose estimation problem, where -- given a partial point cloud of an object -- the goal is to not only estimate the object pose, but also to provide a certificate of correctness for the resulting estimate. Our first contribution is a general theory of certification for end-to-end perception models. In particular, we introduce the notion of $ζ$-correctness, which bounds the distance between an estimate and the ground truth. We show that $ζ$-correctness can be assessed by implementing two certificates: (i) a certificate of observable correctness, that asserts if the model output is consistent with the input data and prior information, (ii) a certificate of non-degeneracy, that asserts whether the input data is sufficient to compute a unique estimate. Our second contribution is to apply this theory and design a new learning-based certifiable pose estimator. We propose C-3PO, a semantic-keypoint-based pose estimation model, augmented with the two certificates, to solve the certifiable pose estimation problem. C-3PO also includes a keypoint corrector, implemented as a differentiable optimization layer, that can correct large detection errors (e.g. due to the sim-to-real gap). Our third contribution is a novel self-supervised training approach that uses our certificate of observable correctness to provide the supervisory signal to C-3PO during training. In it, the model trains only on the observably correct input-output pairs, in each training iteration. As training progresses, we see that the observably correct input-output pairs grow, eventually reaching near 100% in many cases. Our experiments show that (i) standard semantic-keypoint-based methods outperform more recent alternatives, (ii) C-3PO further improves performance and significantly outperforms all the baselines, and (iii) C-3PO's certificates are able to discern correct pose estimates.

ROAug 2, 2021
Hierarchical Representations and Explicit Memory: Learning Effective Navigation Policies on 3D Scene Graphs using Graph Neural Networks

Zachary Ravichandran, Lisa Peng, Nathan Hughes et al.

Representations are crucial for a robot to learn effective navigation policies. Recent work has shown that mid-level perceptual abstractions, such as depth estimates or 2D semantic segmentation, lead to more effective policies when provided as observations in place of raw sensor data (e.g., RGB images). However, such policies must still learn latent three-dimensional scene properties from mid-level abstractions. In contrast, high-level, hierarchical representations such as 3D scene graphs explicitly provide a scene's geometry, topology, and semantics, making them compelling representations for navigation. In this work, we present a reinforcement learning framework that leverages high-level hierarchical representations to learn navigation policies. Towards this goal, we propose a graph neural network architecture and show how to embed a 3D scene graph into an agent-centric feature space, which enables the robot to learn policies for low-level action in an end-to-end manner. For each node in the scene graph, our method uses features that capture occupancy and semantic content, while explicitly retaining memory of the robot trajectory. We demonstrate the effectiveness of our method against commonly used visuomotor policies in a challenging object search task. These experiments and supporting ablation studies show that our method leads to more effective object search behaviors, exhibits improved long-term memory, and successfully leverages hierarchical information to guide its navigation objectives.

LGMay 15, 2021
Neural Trees for Learning on Graphs

Rajat Talak, Siyi Hu, Lisa Peng et al.

Graph Neural Networks (GNNs) have emerged as a flexible and powerful approach for learning over graphs. Despite this success, existing GNNs are constrained by their local message-passing architecture and are provably limited in their expressive power. In this work, we propose a new GNN architecture -- the Neural Tree. The neural tree architecture does not perform message passing on the input graph, but on a tree-structured graph, called the H-tree, that is constructed from the input graph. Nodes in the H-tree correspond to subgraphs in the input graph, and they are reorganized in a hierarchical manner such that the parent of a node in the H-tree always corresponds to a larger subgraph in the input graph. We show that the neural tree architecture can approximate any smooth probability distribution function over an undirected graph. We also prove that the number of parameters needed to achieve an $ε$-approximation of the distribution function is exponential in the treewidth of the input graph, but linear in its size. We prove that any continuous $\mathcal{G}$-invariant/equivariant function can be approximated by a nonlinear combination of such probability distribution functions over $\mathcal{G}$. We apply the neural tree to semi-supervised node classification in 3D scene graphs, and show that these theoretical properties translate into significant gains in prediction accuracy, over the more traditional GNN architectures. We also show the applicability of the neural tree architecture to citation networks with large treewidth, by using a graph sub-sampling technique.