James Noraky

CLJul 7, 2025

Gemini 2.5: Pushing the Frontier with Advanced Reasoning, Multimodality, Long Context, and Next Generation Agentic Capabilities

Gheorghe Comanici, Eric Bieber, Mike Schaekermann et al. · amazon-science, baidu

In this report, we introduce the Gemini 2.X model family: Gemini 2.5 Pro and Gemini 2.5 Flash, as well as our earlier Gemini 2.0 Flash and Flash-Lite models. Gemini 2.5 Pro is our most capable model yet, achieving SoTA performance on frontier coding and reasoning benchmarks. In addition to its incredible coding and reasoning skills, Gemini 2.5 Pro is a thinking model that excels at multimodal understanding and it is now able to process up to 3 hours of video content. Its unique combination of long context, multimodal and reasoning capabilities can be combined to unlock new agentic workflows. Gemini 2.5 Flash provides excellent reasoning abilities at a fraction of the compute and latency requirements and Gemini 2.0 Flash and Flash-Lite provide high performance at low latency and cost. Taken together, the Gemini 2.X model generation spans the full Pareto frontier of model capability vs cost, allowing users to explore the boundaries of what is possible with complex agentic problem solving.

IVFeb 2, 2020

Depth Map Estimation of Dynamic Scenes Using Prior Depth Information

James Noraky, Vivienne Sze

Depth information is useful for many applications. Active depth sensors are appealing because they obtain dense and accurate depth maps. However, due to issues that range from power constraints to multi-sensor interference, these sensors cannot always be continuously used. To overcome this limitation, we propose an algorithm that estimates depth maps using concurrently collected images and a previously measured depth map for dynamic scenes, where both the camera and objects in the scene may be independently moving. To estimate depth in these scenarios, our algorithm models the dynamic scene motion using independent and rigid motions. It then uses the previous depth map to efficiently estimate these rigid motions and obtain a new depth map. Our goal is to balance the acquisition of depth between the active depth sensor and computation, without incurring a large computational cost. Thus, we leverage the prior depth information to avoid computationally expensive operations like dense optical flow estimation or segmentation used in similar approaches. Our approach can obtain dense depth maps at up to real-time (30 FPS) on a standard laptop computer, which is orders of magnitude faster than similar approaches. When evaluated using RGB-D datasets of various dynamic scenes, our approach estimates depth maps with a mean relative error of 2.5% while reducing the active depth sensor usage by over 90%.

James Noraky

2 Papers