MapEval: A Map-Based Evaluation of Geo-Spatial Reasoning in Foundation Models
MapEval addresses the need for better geospatial AI in real-world navigation and autonomous tool usage. Its contribution is incremental, since it focuses on evaluation rather than novel model development.
The authors evaluate geospatial reasoning in foundation models by introducing MapEval, a benchmark of 700 multiple-choice questions spanning 180 cities and 54 countries. No model surpassed 67% accuracy, and all models lagged over 20% behind human performance.
Recent advancements in foundation models have improved autonomous tool usage and reasoning, but their capabilities in map-based reasoning remain underexplored. To address this, we introduce MapEval, a benchmark designed to assess foundation models across three distinct tasks (textual, API-based, and visual reasoning) through 700 multiple-choice questions spanning 180 cities and 54 countries, covering spatial relationships, navigation, travel planning, and real-world map interactions. Unlike prior benchmarks that focus on simple location queries, MapEval requires models to handle long-context reasoning, API interactions, and visual map analysis, making it the most comprehensive evaluation framework for geospatial AI. In an evaluation of 30 foundation models, including Claude-3.5-Sonnet, GPT-4o, and Gemini-1.5-Pro, none surpasses 67% accuracy, with open-source models performing significantly worse and all models lagging over 20% behind human performance. These results expose critical gaps in spatial inference: models struggle with distances, directions, route planning, and place-specific reasoning, which highlights the need for better geospatial AI to bridge the gap between foundation models and real-world navigation. All resources are available at: https://mapeval.github.io/.
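The evaluation described above reports accuracy on multiple-choice questions across three tasks and compares it with human performance. A minimal sketch of that kind of scoring follows. It is not the authors' evaluation code: the `Item` record, the task labels, and both function names are hypothetical, and the sample data in the usage note is a toy illustration rather than MapEval results.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Item:
    """One multiple-choice question with a model's answer (hypothetical schema)."""
    task: str        # assumed labels: "textual", "api", or "visual"
    answer: int      # index of the correct option
    prediction: int  # index of the option the model chose


def accuracy_by_task(items):
    """Return the fraction of correctly answered questions for each task."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for item in items:
        total[item.task] += 1
        if item.prediction == item.answer:
            correct[item.task] += 1
    return {task: correct[task] / total[task] for task in total}


def gap_to_human(model_accuracy, human_accuracy):
    """Return how far the model trails the human baseline (same units as inputs)."""
    return human_accuracy - model_accuracy
```

For example, with three toy items, `accuracy_by_task([Item("textual", 0, 0), Item("textual", 1, 2), Item("visual", 3, 3)])` gives 0.5 for the textual task and 1.0 for the visual task. Scoring per task rather than over the pooled questions keeps a weak result on one task from being hidden by a strong result on another.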