Sofia Kirsanova

CV
h-index67
4papers
10citations
Novelty48%
AI Score37

4 Papers

CVDec 8, 2025
FRIEDA: Benchmarking Multi-Step Cartographic Reasoning in Vision-Language Models

Jiyoon Pyo, Yuankun Jiao, Dongwon Jung et al.

Cartographic reasoning is the skill of interpreting geographic relationships by aligning legends, map scales, compass directions, map texts, and geometries across one or more map images. Although essential as a concrete cognitive capability and for critical tasks such as disaster response and urban planning, it remains largely unevaluated. Building on progress in chart and infographic understanding, recent large vision language model studies on map visual question-answering often treat maps as a special case of charts. In contrast, map VQA demands comprehension of layered symbology (e.g., symbols, geometries, and text labels) as well as spatial relations tied to orientation and distance that often span multiple maps and are not captured by chart-style evaluations. To address this gap, we introduce FRIEDA, a benchmark for testing complex open-ended cartographic reasoning in LVLMs. FRIEDA sources real map images from documents and reports in various domains and geographical areas. Following classifications in Geographic Information System (GIS) literature, FRIEDA targets all three categories of spatial relations: topological (border, equal, intersect, within), metric (distance), and directional (orientation). All questions require multi-step inference, and many require cross-map grounding and reasoning. We evaluate eleven state-of-the-art LVLMs under two settings: (1) the direct setting, where we provide the maps relevant to the question, and (2) the contextual setting, where the model may have to identify the maps relevant to the question before reasoning. Even the strongest models, Gemini-2.5-Pro and GPT-5-Think, achieve only 38.20% and 37.20% accuracy, respectively, far below human performance of 84.87%. These results reveal a persistent gap in multi-step cartographic reasoning, positioning FRIEDA as a rigorous benchmark to drive progress on spatial intelligence in LVLMs.

CVJun 19, 2025
DIGMAPPER: A Modular System for Automated Geologic Map Digitization

Weiwei Duan, Michael P. Gerlek, Steven N. Minton et al.

Historical geologic maps contain rich geospatial information, such as rock units, faults, folds, and bedding planes, that is critical for assessing mineral resources essential to renewable energy, electric vehicles, and national security. However, digitizing maps remains a labor-intensive and time-consuming task. We present DIGMAPPER, a modular, scalable system developed in collaboration with the United States Geological Survey (USGS) to automate the digitization of geologic maps. DIGMAPPER features a fully dockerized, workflow-orchestrated architecture that integrates state-of-the-art deep learning models for map layout analysis, feature extraction, and georeferencing. To overcome challenges such as limited training data and complex visual content, our system employs innovative techniques, including in-context learning with large language models, synthetic data generation, and transformer-based models. Evaluations on over 100 annotated maps from the DARPA-USGS dataset demonstrate high accuracy across polygon, line, and point feature extraction, and reliable georeferencing performance. Deployed at USGS, DIGMAPPER significantly accelerates the creation of analysis-ready geospatial datasets, supporting national-scale critical mineral assessments and broader geoscientific applications.

LGOct 24, 2024
Context-Aware Trajectory Anomaly Detection

Haoji Hu, Jina Kim, Jinwei Zhou et al.

Trajectory anomaly detection is crucial for effective decision-making in urban and human mobility management. Existing methods of trajectory anomaly detection generally focus on training a trajectory generative model and evaluating the likelihood of reconstructing a given trajectory. However, previous work often lacks important contextual information on the trajectory, such as the agent's information (e.g., agent ID) or geographic information (e.g., Points of Interest (POI)), which could provide additional information on accurately capturing anomalous behaviors. To fill this gap, we propose a context-aware anomaly detection approach that models contextual information related to trajectories. The proposed method is based on a trajectory reconstruction framework guided by contextual factors such as agent ID and contextual POI embedding. The injection of contextual information aims to improve the performance of anomaly detection. We conducted experiments in two cities and demonstrated that the proposed approach significantly outperformed existing methods by effectively modeling contextual information. Overall, this paper paves a new direction for advancing trajectory anomaly detection.

CVOct 9, 2025
Detecting Legend Items on Historical Maps Using GPT-4o with In-Context Learning

Sofia Kirsanova, Yao-Yi Chiang, Weiwei Duan

Historical map legends are critical for interpreting cartographic symbols. However, their inconsistent layouts and unstructured formats make automatic extraction challenging. Prior work focuses primarily on segmentation or general optical character recognition (OCR), with few methods effectively matching legend symbols to their corresponding descriptions in a structured manner. We present a method that combines LayoutLMv3 for layout detection with GPT-4o using in-context learning to detect and link legend items and their descriptions via bounding box predictions. Our experiments show that GPT-4 with structured JSON prompts outperforms the baseline, achieving 88% F-1 and 85% IoU, and reveal how prompt design, example counts, and layout alignment affect performance. This approach supports scalable, layout-aware legend parsing and improves the indexing and searchability of historical maps across various visual styles.