Hongyi Fan

CV
h-index2
6papers
55citations
Novelty46%
AI Score29

6 Papers

CVOct 4, 2023
Condition numbers in multiview geometry, instability in relative pose estimation, and RANSAC

Hongyi Fan, Joe Kileel, Benjamin Kimia

In this paper, we introduce a general framework for analyzing the numerical conditioning of minimal problems in multiple view geometry, using tools from computational algebra and Riemannian geometry. Special motivation comes from the fact that relative pose estimation, based on standard 5-point or 7-point Random Sample Consensus (RANSAC) algorithms, can fail even when no outliers are present and there is enough data to support a hypothesis. We argue that these cases arise due to the intrinsic instability of the 5- and 7-point minimal problems. We apply our framework to characterize the instabilities, both in terms of the world scenes that lead to infinite condition number, and directly in terms of ill-conditioned image data. The approach produces computational tests for assessing the condition number before solving the minimal problem. Lastly, synthetic and real data experiments suggest that RANSAC serves not only to remove outliers, but in practice it also selects for well-conditioned image data, which is consistent with our theory.

CVMar 30, 2025
Multiview Image-Based Localization

Cameron Fiore, Hongyi Fan, Benjamin Kimia

The image retrieval (IR) approach to image localization has distinct advantages to the 3D and the deep learning (DNN) approaches: it is seen-agnostic, simpler to implement and use, has no privacy issues, and is computationally efficient. The main drawback of this approach is relatively poor localization in both position and orientation of the query camera when compared to the competing approaches. This paper represents a hybrid approach that stores only image features in the database like some IR methods, but relies on a latent 3D reconstruction, like 3D methods but without retaining a 3D scene reconstruction. The approach is based on two ideas: {\em (i)} a novel proposal where query camera center estimation relies only on relative translation estimates but not relative rotation estimates through a decoupling of the two, and {\em (ii)} a shift from computing optimal pose from estimated relative pose to computing optimal pose from multiview correspondences, thus cutting out the ``middle-man''. Our approach shows improved performance on the 7-Scenes and Cambridge Landmarks datasets while also improving on timing and memory footprint as compared to state-of-the-art.

CVDec 29, 2021
On the Instability of Relative Pose Estimation and RANSAC's Role

Hongyi Fan, Joe Kileel, Benjamin Kimia

In this paper we study the numerical instabilities of the 5- and 7-point problems for essential and fundamental matrix estimation in multiview geometry. In both cases we characterize the ill-posed world scenes where the condition number for epipolar estimation is infinite. We also characterize the ill-posed instances in terms of the given image data. To arrive at these results, we present a general framework for analyzing the conditioning of minimal problems in multiview geometry, based on Riemannian manifolds. Experiments with synthetic and real-world data then reveal a striking conclusion: that Random Sample Consensus (RANSAC) in Structure-from-Motion (SfM) does not only serve to filter out outliers, but RANSAC also selects for well-conditioned image data, sufficiently separated from the ill-posed locus that our theory predicts. Our findings suggest that, in future work, one could try to accelerate and increase the success of RANSAC by testing only well-conditioned image data.

CVDec 24, 2021
Benchmarking Pedestrian Odometry: The Brown Pedestrian Odometry Dataset (BPOD)

David Charatan, Hongyi Fan, Benjamin Kimia

We present the Brown Pedestrian Odometry Dataset (BPOD) for benchmarking visual odometry algorithms in head-mounted pedestrian settings. This dataset was captured using synchronized global and rolling shutter stereo cameras in 12 diverse indoor and outdoor locations on Brown University's campus. Compared to existing datasets, BPOD contains more image blur and self-rotation, which are common in pedestrian odometry but rare elsewhere. Ground-truth trajectories are generated from stick-on markers placed along the pedestrian's path, and the pedestrian's position is documented using a third-person video. We evaluate the performance of representative direct, feature-based, and learning-based VO methods on BPOD. Our results show that significant development is needed to successfully capture pedestrian trajectories. The link to the dataset is here: \url{https://doi.org/10.26300/c1n7-7p93

CVDec 7, 2021
GPU-Based Homotopy Continuation for Minimal Problems in Computer Vision

Chiang-Heng Chien, Hongyi Fan, Ahmad Abdelfattah et al.

Systems of polynomial equations arise frequently in computer vision, especially in multiview geometry problems. Traditional methods for solving these systems typically aim to eliminate variables to reach a univariate polynomial, e.g., a tenth-order polynomial for 5-point pose estimation, using clever manipulations, or more generally using Grobner basis, resultants, and elimination templates, leading to successful algorithms for multiview geometry and other problems. However, these methods do not work when the problem is complex and when they do, they face efficiency and stability issues. Homotopy Continuation (HC) can solve more complex problems without the stability issues, and with guarantees of a global solution, but they are known to be slow. In this paper we show that HC can be parallelized on a GPU, showing significant speedups up to 26 times on polynomial benchmarks. We also show that GPU-HC can be generically applied to a range of computer vision problems, including 4-view triangulation and trifocal pose estimation with unknown focal length, which cannot be solved with elimination template but they can be efficiently solved with HC. GPU-HC opens the door to easy formulation and solution of a range of computer vision problems.

CVMar 23, 2019
Trifocal Relative Pose from Lines at Points and its Efficient Solution

Ricardo Fabbri, Timothy Duff, Hongyi Fan et al.

We present a method for solving two minimal problems for relative camera pose estimation from three views, which are based on three view correspondences of i) three points and one line and the novel case of ii) three points and two lines through two of the points. These problems are too difficult to be efficiently solved by the state of the art Groebner basis methods. Our method is based on a new efficient homotopy continuation (HC) solver framework MINUS, which dramatically speeds up previous HC solving by specializing HC methods to generic cases of our problems. We characterize their number of solutions and show with simulated experiments that our solvers are numerically robust and stable under image noise, a key contribution given the borderline intractable degree of nonlinearity of trinocular constraints. We show in real experiments that i) SIFT feature location and orientation provide good enough point-and-line correspondences for three-view reconstruction and ii) that we can solve difficult cases with too few or too noisy tentative matches, where the state of the art structure from motion initialization fails.