AG · Jun 27, 2022
An Atlas for the Pinhole Camera
Sameer Agarwal, Timothy Duff, Max Lieblich et al.
We introduce an atlas of algebro-geometric objects associated with image formation in pinhole cameras. The nodes of the atlas are algebraic varieties or their vanishing ideals related to each other by projection or elimination and restriction or specialization respectively. This atlas offers a unifying framework for the study of problems in 3D computer vision. We initiate the study of the atlas by completely characterizing a part of the atlas stemming from the triangulation problem. We conclude with several open problems and generalizations of the atlas.
CL · Mar 8, 2024
Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context
Gemini Team, Petko Georgiev, Ving Ian Lei et al. · DeepMind, Mila
In this report, we introduce the Gemini 1.5 family of models, representing the next generation of highly compute-efficient multimodal models capable of recalling and reasoning over fine-grained information from millions of tokens of context, including multiple long documents and hours of video and audio. The family includes two new models: (1) an updated Gemini 1.5 Pro, which exceeds the February version on the great majority of capabilities and benchmarks; (2) Gemini 1.5 Flash, a more lightweight variant designed for efficiency with minimal regression in quality. Gemini 1.5 models achieve near-perfect recall on long-context retrieval tasks across modalities, improve the state-of-the-art in long-document QA, long-video QA and long-context ASR, and match or surpass Gemini 1.0 Ultra's state-of-the-art performance across a broad set of benchmarks. Studying the limits of Gemini 1.5's long-context ability, we find continued improvement in next-token prediction and near-perfect retrieval (>99%) up to at least 10M tokens, a generational leap over existing models such as Claude 3.0 (200k) and GPT-4 Turbo (128k). Finally, we highlight real-world use cases, such as Gemini 1.5 collaborating with professionals on completing their tasks achieving 26 to 75% time savings across 10 different job categories, as well as surprising new capabilities of large language models at the frontier; when given a grammar manual for Kalamang, a language with fewer than 200 speakers worldwide, the model learns to translate English to Kalamang at a similar level to a person who learned from the same content.
AG · Mar 19, 2020
The Chiral Domain of a Camera Arrangement
Sameer Agarwal, Andrew Pryhuber, Rainer Sinn et al.
We introduce the chiral domain of an arrangement of cameras $\mathcal{A} = \{A_1,\dots, A_m\}$ which is the subset of $\mathbb{P}^3$ visible in $\mathcal{A}$. It generalizes the classical definition of chirality to include all of $\mathbb{P}^3$ and offers a unifying framework for studying multiview chirality. We give an algebraic description of the chiral domain which allows us to define and describe a chiral version of Triggs' joint image. We then use the chiral domain to re-derive and extend prior results on chirality due to Hartley.
CV · Oct 6, 2015
On the Existence of Epipolar Matrices
Sameer Agarwal, Hon-Leung Lee, Bernd Sturmfels et al.
This paper considers the foundational question of the existence of a fundamental (resp. essential) matrix given $m$ point correspondences in two views. We present a complete answer for the existence of fundamental matrices for any value of $m$. Using examples, we disprove the widely held belief that fundamental matrices always exist whenever $m \leq 7$. At the same time, we prove that they exist unconditionally when $m \leq 5$. Under a mild genericity condition, we show that an essential matrix always exists when $m \leq 4$. We also characterize the six and seven point configurations in two views for which all matrices satisfying the epipolar constraint have rank at most one.
CV · Jul 21, 2014
Certifying the Existence of Epipolar Matrices
Sameer Agarwal, Hon-leung Lee, Bernd Sturmfels et al.
Given a set of point correspondences in two images, the existence of a fundamental matrix is a necessary condition for the points to be the images of a 3-dimensional scene imaged with two pinhole cameras. If the camera calibration is known, then one requires the existence of an essential matrix. We present an efficient algorithm, using exact linear algebra, for testing the existence of a fundamental matrix. The input is any number of point correspondences. For essential matrices, we characterize the solvability of the Demazure polynomials. In both scenarios, we determine which linear subspaces intersect a fixed set defined by non-linear polynomials. The conditions we derive are polynomials stated purely in terms of image coordinates. They represent a new class of two-view invariants, free of fundamental (resp. essential) matrices.
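The abstract does not spell out the certification algorithm. As a hedged illustration of the linear part it starts from, the sketch below writes each correspondence's epipolar constraint $y_i^\top F x_i = 0$ as one linear equation in the nine entries of $F$ and uses exact rational arithmetic (Python's `fractions` module) to compute the dimension of the linear subspace of $3 \times 3$ matrices satisfying all the constraints. The function names are hypothetical and this is not the paper's method: deciding whether that subspace actually contains a fundamental matrix (a matrix of rank two) is the non-linear question the paper answers, and the sketch does not attempt it.

```python
from fractions import Fraction


def epipolar_rows(pairs):
    """One linear equation per correspondence (x, y), from y^T F x = 0.

    Points are homogeneous 3-vectors. F is flattened row-major, so the
    coefficient of F[j][k] in the equation is y[j] * x[k].
    """
    return [
        [Fraction(y[j]) * Fraction(x[k]) for j in range(3) for k in range(3)]
        for x, y in pairs
    ]


def exact_rank(rows):
    """Rank of a rational matrix via Gaussian elimination over Fraction (no rounding)."""
    m = [row[:] for row in rows]
    ncols = len(m[0]) if m else 0
    rank = 0
    for col in range(ncols):
        # Find a row at or below the current rank with a nonzero entry in this column.
        pivot = next((r for r in range(rank, len(m)) if m[r][col] != 0), None)
        if pivot is None:
            continue
        m[rank], m[pivot] = m[pivot], m[rank]
        # Eliminate this column from every other row.
        for r in range(len(m)):
            if r != rank and m[r][col] != 0:
                factor = m[r][col] / m[rank][col]
                m[r] = [a - factor * b for a, b in zip(m[r], m[rank])]
        rank += 1
    return rank


def solution_space_dim(pairs):
    """Dimension of the space of 3x3 matrices F with y^T F x = 0 for every pair."""
    return 9 - exact_rank(epipolar_rows(pairs))


if __name__ == "__main__":
    pairs = [((1, 0, 1), (0, 1, 1)), ((0, 1, 1), (1, 0, 1))]
    print(solution_space_dim(pairs))
```

For the two correspondences in the usage example, the two equations are linearly independent, so the sketch reports a 7-dimensional space of candidate matrices. Exact `Fraction` arithmetic is used instead of floating point so that rank decisions are not affected by rounding.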
CV · Mar 21, 2014
Continuous Optimization for Fields of Experts Denoising Works
Petter Strandmark, Sameer Agarwal
Several recent papers use image denoising with a Fields of Experts prior to benchmark discrete optimization methods. We show that a non-linear least squares solver significantly outperforms all known discrete methods on this problem.