Song Gao

LG
h-index39
42papers
1,184citations
Novelty37%
AI Score54

42 Papers

LGSep 24, 2024Code
Fine-Tuning is Fine, if Calibrated

Zheda Mai, Arpita Chowdhury, Ping Zhang et al. · microsoft-research

Fine-tuning is arguably the most straightforward way to tailor a pre-trained model (e.g., a foundation model) to downstream applications, but it also comes with the risk of losing valuable knowledge the model had learned in pre-training. For example, fine-tuning a pre-trained classifier capable of recognizing a large number of classes to master a subset of classes at hand is shown to drastically degrade the model's accuracy in the other classes it had previously learned. As such, it is hard to further use the fine-tuned model when it encounters classes beyond the fine-tuning data. In this paper, we systematically dissect the issue, aiming to answer the fundamental question, "What has been damaged in the fine-tuned model?" To our surprise, we find that the fine-tuned model neither forgets the relationship among the other classes nor degrades the features to recognize these classes. Instead, the fine-tuned model often produces more discriminative features for these other classes, even if they were missing during fine-tuning! {What really hurts the accuracy is the discrepant logit scales between the fine-tuning classes and the other classes}, implying that a simple post-processing calibration would bring back the pre-trained model's capability and at the same time unveil the feature improvement over all classes. We conduct an extensive empirical study to demonstrate the robustness of our findings and provide preliminary explanations underlying them, suggesting new directions for future theoretical analysis. Our code is available at https://github.com/OSU-MLB/Fine-Tuning-Is-Fine-If-Calibrated.

AIApr 13, 2023
On the Opportunities and Challenges of Foundation Models for Geospatial Artificial Intelligence

Gengchen Mai, Weiming Huang, Jin Sun et al. · stanford

Large pre-trained models, also known as foundation models (FMs), are trained in a task-agnostic manner on large-scale data and can be adapted to a wide range of downstream tasks by fine-tuning, few-shot, or even zero-shot learning. Despite their successes in language and vision tasks, we have yet seen an attempt to develop foundation models for geospatial artificial intelligence (GeoAI). In this work, we explore the promises and challenges of developing multimodal foundation models for GeoAI. We first investigate the potential of many existing FMs by testing their performances on seven tasks across multiple geospatial subdomains including Geospatial Semantics, Health Geography, Urban Geography, and Remote Sensing. Our results indicate that on several geospatial tasks that only involve text modality such as toponym recognition, location description recognition, and US state-level/county-level dementia time series forecasting, these task-agnostic LLMs can outperform task-specific fully-supervised models in a zero-shot or few-shot learning setting. However, on other geospatial tasks, especially tasks that involve multiple data modalities (e.g., POI-based urban function classification, street view image-based urban noise intensity classification, and remote sensing image scene classification), existing foundation models still underperform task-specific models. Based on these observations, we propose that one of the major challenges of developing a FM for GeoAI is to address the multimodality nature of geospatial tasks. After discussing the distinct challenges of each geospatial data modality, we suggest the possibility of a multimodal foundation model which can reason over various types of geospatial data through geospatial alignments. We conclude this paper by discussing the unique risks and challenges to develop such a model for GeoAI.

CVApr 10Code
NTIRE 2026 The 3rd Restore Any Image Model (RAIM) Challenge: Multi-Exposure Image Fusion in Dynamic Scenes (Track 2)

Lishen Qu, Yao Liu, Jie Liang et al.

This paper presents NTIRE 2026, the 3rd Restore Any Image Model (RAIM) challenge on multi-exposure image fusion in dynamic scenes. We introduce a benchmark that targets a practical yet difficult HDR imaging setting, where exposure bracketing must be fused under scene motion, illumination variation, and handheld camera jitter. The challenge data contains 100 training sequences with 7 exposure levels and 100 test sequences with 5 exposure levels, reflecting real-world scenarios that frequently cause misalignment and ghosting artefacts. We evaluate submissions with a leaderboard score derived from PSNR, SSIM, and LPIPS, while also considering perceptual quality, efficiency, and reproducibility during the final review. This track attracted 114 participating teams and received 987 submissions. The winning methods significantly improved the ability to remove artifacts from multi-exposure fusion and recover fine details. The dataset and the code of each team can be found at the repository: https://github.com/qulishen/RAIM-HDR.

LGNov 2, 2023
Holistic Transfer: Towards Non-Disruptive Fine-Tuning with Partial Target Data

Cheng-Hao Tu, Hong-You Chen, Zheda Mai et al. · microsoft-research

We propose a learning problem involving adapting a pre-trained source model to the target domain for classifying all classes that appeared in the source data, using target data that covers only a partial label space. This problem is practical, as it is unrealistic for the target end-users to collect data for all classes prior to adaptation. However, it has received limited attention in the literature. To shed light on this issue, we construct benchmark datasets and conduct extensive experiments to uncover the inherent challenges. We found a dilemma -- on the one hand, adapting to the new target domain is important to claim better performance; on the other hand, we observe that preserving the classification accuracy of classes missing in the target adaptation data is highly challenging, let alone improving them. To tackle this, we identify two key directions: 1) disentangling domain gradients from classification gradients, and 2) preserving class relationships. We present several effective solutions that maintain the accuracy of the missing classes and enhance the overall performance, establishing solid baselines for holistic transfer of pre-trained models with partial target data.

AIJun 1
Spatial Representation Learning Beyond Pixels: Unifying Raster Data and Vector Semantics for Human-Centric Geospatial Foundation Models

Steffen Knoblauch, Hao Li, Gengchen Mai et al.

Earth Observation (EO) has fundamentally transformed the monitoring of environmental processes and human activities up to planetary scale. Recent advances in self-supervised learning have given rise to Earth Observation Foundation Models (EOFMs), which leverage petabyte-scale unlabeled EO data to learn transferable representations across a wide range of downstream geospatial tasks. Despite these advances, current EOFMs remain largely confined to raster modalities, overlooking the rich, structured information encoded in openly-accessible vector data sources such as OpenStreetMap and Overture. Vector data provides explicit and compact representations of geographic entities, including geometry, topology, and semantic relationships, offering critical contextual signals that are often ambiguous or inaccessible in imagery alone. Raster and vector data thus represent complementary views of geographic space: raster data captures continuous physical and spectral patterns, while vector data encodes discrete objects and their relational structure and often represents more of the human rather than the physical systems (e.g. social or demographic data). However, existing geospatial representation learning paradigms treat these modalities in isolation, relying on imperfect and often lossy transformations to bridge them. This perspective paper calls for a paradigm shift toward joint Spatial Representation Learning (SRL) in an unified embedding space that integrate raster perception with vector-based reasoning. Building on emerging efforts in multimodal geospatial learning, we highlight conceptual foundations, technical challenges, and promising directions for aligning heterogeneous spatial data sources. We contend that such integration is essential for developing next-generation geospatial AI systems capable of more accurate, interpretable, and semantically grounded understanding of the Earth.

AISep 29, 2023
Building Privacy-Preserving and Secure Geospatial Artificial Intelligence Foundation Models

Jinmeng Rao, Song Gao, Gengchen Mai et al.

In recent years we have seen substantial advances in foundation models for artificial intelligence, including language, vision, and multimodal models. Recent studies have highlighted the potential of using foundation models in geospatial artificial intelligence, known as GeoAI Foundation Models, for geographic question answering, remote sensing image understanding, map generation, and location-based services, among others. However, the development and application of GeoAI foundation models can pose serious privacy and security risks, which have not been fully discussed or addressed to date. This paper introduces the potential privacy and security risks throughout the lifecycle of GeoAI foundation models and proposes a comprehensive blueprint for research directions and preventative and control strategies. Through this vision paper, we hope to draw the attention of researchers and policymakers in geospatial domains to these privacy and security risks inherent in GeoAI foundation models and advocate for the development of privacy-preserving and secure GeoAI foundation models.

CRMay 6Code
Secure Intellicise Wireless Network: Agentic AI for Coverless Semantic Steganography Communication

Rui Meng, Song Gao, Bingxuan Xu et al.

Semantic Communication (SemCom), leveraging its significant advantages in transmission efficiency and reliability, has emerged as a core technology for constructing future intellicise (intelligent and concise) wireless networks. However, intelligent attacks represented by semantic eavesdropping pose severe challenges to the security of SemCom. To address this challenge, Semantic Steganographic Communication (SemSteCom) achieves ``invisible'' encryption by implicitly embedding private semantic information into cover modality carriers. The state-of-the-art study has further introduced generative diffusion models to directly generate stega images without relying on original cover images, effectively enhancing steganographic capacity. Nevertheless, the recovery process of private images is highly dependent on the guidance of private semantic keys, which may be inferred by intelligent eavesdroppers, thereby introducing new security threats. To address this issue, we propose an Agentic AI-driven SemSteCom (AgentSemSteCom) scheme, which includes semantic extraction, digital token controlled reference image generation, coverless steganography, semantic codec, and optional task-oriented enhancement modules. The proposed AgentSemSteCom scheme obviates the need for both cover images and private semantic keys, thereby boosting steganographic capacity while reinforcing transmission security. The simulation results on open-source datasets verify that, AgentSemSteCom achieves better transmission quality and higher security levels than the baseline scheme.

LGMar 17, 2022
STICC: A multivariate spatial clustering method for repeated geographic pattern discovery with consideration of spatial contiguity

Yuhao Kang, Kunlin Wu, Song Gao et al.

Spatial clustering has been widely used for spatial data mining and knowledge discovery. An ideal multivariate spatial clustering should consider both spatial contiguity and aspatial attributes. Existing spatial clustering approaches may face challenges for discovering repeated geographic patterns with spatial contiguity maintained. In this paper, we propose a Spatial Toeplitz Inverse Covariance-Based Clustering (STICC) method that considers both attributes and spatial relationships of geographic objects for multivariate spatial clustering. A subregion is created for each geographic object serving as the basic unit when performing clustering. A Markov random field is then constructed to characterize the attribute dependencies of subregions. Using a spatial consistency strategy, nearby objects are encouraged to belong to the same cluster. To test the performance of the proposed STICC algorithm, we apply it in two use cases. The comparison results with several baseline methods show that the STICC outperforms others significantly in terms of adjusted rand index and macro-F1 score. Join count statistics is also calculated and shows that the spatial contiguity is well preserved by STICC. Such a spatial clustering method may benefit various applications in the fields of geography, remote sensing, transportation, and urban planning, etc.

LGSep 20, 2023
CATS: Conditional Adversarial Trajectory Synthesis for Privacy-Preserving Trajectory Data Publication Using Deep Learning Approaches

Jinmeng Rao, Song Gao, Sijia Zhu

The prevalence of ubiquitous location-aware devices and mobile Internet enables us to collect massive individual-level trajectory dataset from users. Such trajectory big data bring new opportunities to human mobility research but also raise public concerns with regard to location privacy. In this work, we present the Conditional Adversarial Trajectory Synthesis (CATS), a deep-learning-based GeoAI methodological framework for privacy-preserving trajectory data generation and publication. CATS applies K-anonymity to the underlying spatiotemporal distributions of human movements, which provides a distributional-level strong privacy guarantee. By leveraging conditional adversarial training on K-anonymized human mobility matrices, trajectory global context learning using the attention-based mechanism, and recurrent bipartite graph matching of adjacent trajectory points, CATS is able to reconstruct trajectory topology from conditionally sampled locations and generate high-quality individual-level synthetic trajectory data, which can serve as supplements or alternatives to raw data for privacy-preserving trajectory data publication. The experiment results on over 90k GPS trajectories show that our method has a better performance in privacy preservation, spatiotemporal characteristic preservation, and downstream utility compared with baseline methods, which brings new insights into privacy-preserving human mobility research using generative AI techniques and explores data ethics issues in GIScience.

AIOct 20, 2022
GeoAI at ACM SIGSPATIAL: The New Frontier of Geospatial Artificial Intelligence Research

Dalton Lunga, Yingjie Hu, Shawn Newsam et al.

Geospatial Artificial Intelligence (GeoAI) is an interdisciplinary field enjoying tremendous adoption. However, the efficient design and implementation of GeoAI systems face many open challenges. This is mainly due to the lack of non-standardized approaches to artificial intelligence tool development, inadequate platforms, and a lack of multidisciplinary engagements, which all motivate domain experts to seek a shared stage with scientists and engineers to solve problems of significant impact on society. Since its inception in 2017, the GeoAI series of workshops has been co-located with the Association for Computing Machinery International Conference on Advances in Geographic Information Systems. The workshop series has fostered a nexus for geoscientists, computer scientists, engineers, entrepreneurs, and decision-makers, from academia, industry, and government to engage in artificial intelligence, spatiotemporal data computing, and geospatial data science research, motivated by various challenges. In this article, we revisit and discuss the state of GeoAI open research directions, the recent developments, and an emerging agenda calling for a continued cross-disciplinary community engagement.

GTApr 23
A Markovian Traffic Equilibrium Model for Ride-Hailing

Song Gao, Hanyu Cheng, Chiwei Yan et al.

We develop a Markovian traffic equilibrium model for ride-hailing in which vehicles, whether empty or hired, make sequential order-acceptance and link-choice decisions over a traffic network to maximize total discounted return in an infinite-horizon semi-Markov decision process. The model endogenizes both competition among empty vehicles for passenger demand and traffic congestion arising from road usage at the link level. We characterize equilibrium as the solution to a fixed-point system, establish its existence, and develop relaxed fixed-point iteration algorithms for equilibrium computation, with convergence results for specialized network structures. Computational experiments on realistic networks demonstrate the model's practical value for transportation planning. Ablation analyses reveal that ignoring either traffic congestion or drivers' forward-looking behavior can lead to potentially substantial biases in policy evaluation.

SIOct 10, 2022
Region2Vec: Community Detection on Spatial Networks Using Graph Embedding with Node Attributes and Spatial Interactions

Yunlei Liang, Jiawei Zhu, Wen Ye et al.

Community Detection algorithms are used to detect densely connected components in complex networks and reveal underlying relationships among components. As a special type of networks, spatial networks are usually generated by the connections among geographic regions. Identifying the spatial network communities can help reveal the spatial interaction patterns, understand the hidden regional structures and support regional development decision-making. Given the recent development of Graph Convolutional Networks (GCN) and its powerful performance in identifying multi-scale spatial interactions, we proposed an unsupervised GCN-based community detection method "region2vec" on spatial networks. Our method first generates node embeddings for regions that share common attributes and have intense spatial interactions, and then applies clustering algorithms to detect communities based on their embedding similarity and spatial adjacency. Experimental results show that while existing methods trade off either attribute similarities or spatial interactions for one another, "region2vec" maintains a great balance between both and performs the best when one wants to maximize both attribute similarities and spatial interactions within communities.

SIOct 9, 2022
Measuring Network Resilience via Geospatial Knowledge Graph: a Case Study of the US Multi-Commodity Flow Network

Jinmeng Rao, Song Gao, Michelle Miller et al.

Quantifying the resilience in the food system is important for food security issues. In this work, we present a geospatial knowledge graph (GeoKG)-based method for measuring the resilience of a multi-commodity flow network. Specifically, we develop a CFS-GeoKG ontology to describe geospatial semantics of a multi-commodity flow network comprehensively, and design resilience metrics that measure the node-level and network-level dependence of single-sourcing, distant, or non-adjacent suppliers/customers in food supply chains. We conduct a case study of the US state-level agricultural multi-commodity flow network with hierarchical commodity types. The results indicate that, by leveraging GeoKG, our method supports measuring both node-level and network-level resilience across space and over time and also helps discover concentration patterns of agricultural resources in the spatial network at different geographic scales.

CLJul 5, 2023
Evaluating the Effectiveness of Large Language Models in Representing Textual Descriptions of Geometry and Spatial Relations

Yuhan Ji, Song Gao

This research focuses on assessing the ability of large language models (LLMs) in representing geometries and their spatial relations. We utilize LLMs including GPT-2 and BERT to encode the well-known text (WKT) format of geometries and then feed their embeddings into classifiers and regressors to evaluate the effectiveness of the LLMs-generated embeddings for geometric attributes. The experiments demonstrate that while the LLMs-generated embeddings can preserve geometry types and capture some spatial relations (up to 73% accuracy), challenges remain in estimating numeric values and retrieving spatially related objects. This research highlights the need for improvement in terms of capturing the nuances and complexities of the underlying geospatial data and integrating domain knowledge to support various GeoAI applications using foundation models.

AIMar 19
Geography According to ChatGPT -- How Generative AI Represents and Reasons about Geography

Krzysztof Janowicz, Gengchen Mai, Rui Zhu et al.

Understanding how AI will represent and reason about geography should be a key concern for all of us, as the broader public increasingly interacts with spaces and places through these systems. Similarly, in line with the nature of foundation models, our own research often relies on pre-trained models. Hence, understanding what world AI systems construct is as important as evaluating their accuracy, including factual recall. To motivate the need for such studies, we provide three illustrative vignettes, i.e., exploratory probes, in the hope that they will spark lively discussions and follow-up work: (1) Do models form strong defaults, and how brittle are model outputs to minute syntactic variations? (2) Can distributional shifts resurface from the composition of individually benign tasks, e.g., when using AI systems to create personas? (3) Do we overlook deeper questions of understanding when solely focusing on the ability of systems to recall facts such as geographic principles?

LGOct 20, 2023
FLEE-GNN: A Federated Learning System for Edge-Enhanced Graph Neural Network in Analyzing Geospatial Resilience of Multicommodity Food Flows

Yuxiao Qu, Jinmeng Rao, Song Gao et al.

Understanding and measuring the resilience of food supply networks is a global imperative to tackle increasing food insecurity. However, the complexity of these networks, with their multidimensional interactions and decisions, presents significant challenges. This paper proposes FLEE-GNN, a novel Federated Learning System for Edge-Enhanced Graph Neural Network, designed to overcome these challenges and enhance the analysis of geospatial resilience of multicommodity food flow network, which is one type of spatial networks. FLEE-GNN addresses the limitations of current methodologies, such as entropy-based methods, in terms of generalizability, scalability, and data privacy. It combines the robustness and adaptability of graph neural networks with the privacy-conscious and decentralized aspects of federated learning on food supply network resilience analysis across geographical regions. This paper also discusses FLEE-GNN's innovative data generation techniques, experimental designs, and future directions for improvement. The results show the advancements of this approach to quantifying the resilience of multicommodity food flow networks, contributing to efforts towards ensuring global food security using AI methods. The developed FLEE-GNN has the potential to be applied in other spatial networks with spatially heterogeneous sub-network distributions.

LGJul 25, 2024
Context-aware knowledge graph framework for traffic speed forecasting using graph neural network

Yatao Zhang, Yi Wang, Song Gao et al.

Human mobility is intricately influenced by urban contexts spatially and temporally, constituting essential domain knowledge in understanding traffic systems. While existing traffic forecasting models primarily rely on raw traffic data and advanced deep learning techniques, incorporating contextual information remains underexplored due to insufficient integration frameworks and the complexity of urban contexts. This study proposes a novel context-aware knowledge graph (CKG) framework to enhance traffic speed forecasting by effectively modeling spatial and temporal contexts. Employing a relation-dependent integration strategy, the framework generates context-aware representations from the spatial and temporal units of CKG to capture spatio-temporal dependencies of urban contexts. A CKG-GNN model, combining the CKG, dual-view multi-head self-attention (MHSA), and graph neural network (GNN), is then designed to predict traffic speed utilizing these context-aware representations. Our experiments demonstrate that CKG's configuration significantly influences embedding performance, with ComplEx and KG2E emerging as optimal for embedding spatial and temporal units, respectively. The CKG-GNN model establishes a benchmark for 10-120 min predictions, achieving average MAE, MAPE, and RMSE of $3.46\pm0.01$, $14.76\pm0.09\%$, and $5.08\pm0.01$, respectively. Compared to the baseline DCRNN model, integrating the spatial unit improves the MAE by 0.04 and the temporal unit by 0.13, while integrating both units further reduces it by 0.18. The dual-view MHSA analysis reveals the crucial role of relation-dependent features from the context-based view and the model's ability to prioritize recent time slots in prediction from the sequence-based view. Overall, this study underscores the importance of merging context-aware knowledge graphs with graph neural networks to improve traffic forecasting.

ROApr 21
Preparation and Motion Study of Magnetically Driven Micro Soft Robot Mimicking the Cownose Ray

Jiaqing Chang, Song Gao, Chaowei Dong et al.

In narrow, unstructured underwater environments such as environmental monitoring and minimally invasive medical procedures, micro soft robots exhibit unique advantages due to their flexible movement capabilities and small size. At the same time, applying bionic technology to the structural design of micro soft robots can significantly improve their swimming performance. However, limited by their miniaturization, these robots are difficult to power internally and usually adopt a wireless power supply method. This study designs and fabricates a magnetically responsive, cownose ray-inspired micro soft robot based on the swimming principle of the cownose ray. The robot is made of a certain proportion of NdFeB and PDMS. Then, a three-dimensional Helmholtz coil is used to generate an oscillating harmonic magnetic field to conduct swimming experiments on the robot, exploring the influence of magnetic field parameters on the robot's swimming performance. The experimental results show that the swimming speed is the fastest at B = 5 mT and f = 11 Hz, reaching 5.25 mm/s, which is about 0.5 body lengths per second. In addition, by adjusting the current direction and frequency of the coil, the robot can perform different swimming modes such as straight swimming, turning swimming, and directional swimming. By employing a stepwise adjustment method, the impact of response errors on the robot's trajectory can be effectively reduced. This study demonstrates a method for magnetically driven micro soft robots, laying a foundation for the application of wireless-driven robots in underwater narrow spaces.

SESep 7, 2025Code
GeoAnalystBench: A GeoAI benchmark for assessing large language models for spatial analysis workflow and code generation

Qianheng Zhang, Song Gao, Chen Wei et al.

Recent advances in large language models (LLMs) have fueled growing interest in automating geospatial analysis and GIS workflows, yet their actual capabilities remain uncertain. In this work, we call for rigorous evaluation of LLMs on well-defined geoprocessing tasks before making claims about full GIS automation. To this end, we present GeoAnalystBench, a benchmark of 50 Python-based tasks derived from real-world geospatial problems and carefully validated by GIS experts. Each task is paired with a minimum deliverable product, and evaluation covers workflow validity, structural alignment, semantic similarity, and code quality (CodeBLEU). Using this benchmark, we assess both proprietary and open source models. Results reveal a clear gap: proprietary models such as ChatGPT-4o-mini achieve high validity 95% and stronger code alignment (CodeBLEU 0.39), while smaller open source models like DeepSeek-R1-7B often generate incomplete or inconsistent workflows (48.5% validity, 0.272 CodeBLEU). Tasks requiring deeper spatial reasoning, such as spatial relationship detection or optimal site selection, remain the most challenging across all models. These findings demonstrate both the promise and limitations of current LLMs in GIS automation and provide a reproducible framework to advance GeoAI research with human-in-the-loop support.

SEMay 7, 2021Code
What do all these Buttons do? Statically Mining Android User Interfaces at Scale

Konstantin Kuznetsov, Chen Fu, Song Gao et al.

We introduce FRONTMATTER: a tool to automatically mine both user interface models and behavior of Android apps at a large scale with high precision. Given an app, FRONTMATTER statically extracts all declared screens, the user interface elements, their textual and graphical features, as well as Android APIs invoked by interacting with them. Executed on tens of thousands of real-world apps, FRONTMATTER opens the door for comprehensive mining of mobile user interfaces, jumpstarting empirical research at a large scale, addressing questions such as "How many travel apps require registration?", "Which apps do not follow accessibility guidelines?", "Does the user interface correspond to the description?", and many more. FRONTMATTER and the mined dataset are available under an open-source license.

HCDec 13, 2023
Artificial Intelligence Studies in Cartography: A Review and Synthesis of Methods, Applications, and Ethics

Yuhao Kang, Song Gao, Robert E. Roth

The past decade has witnessed the rapid development of geospatial artificial intelligence (GeoAI) primarily due to the ground-breaking achievements in deep learning and machine learning. A growing number of scholars from cartography have demonstrated successfully that GeoAI can accelerate previously complex cartographic design tasks and even enable cartographic creativity in new ways. Despite the promise of GeoAI, researchers and practitioners have growing concerns about the ethical issues of GeoAI for cartography. In this paper, we conducted a systematic content analysis and narrative synthesis of research studies integrating GeoAI and cartography to summarize current research and development trends regarding the usage of GeoAI for cartographic design. Based on this review and synthesis, we first identify dimensions of GeoAI methods for cartography such as data sources, data formats, map evaluations, and six contemporary GeoAI models, each of which serves a variety of cartographic tasks. These models include decision trees, knowledge graph and semantic web technologies, deep convolutional neural networks, generative adversarial networks, graph neural networks, and reinforcement learning. Further, we summarize seven cartographic design applications where GeoAI have been effectively employed: generalization, symbolization, typography, map reading, map interpretation, map analysis, and map production. We also raise five potential ethical challenges that need to be addressed in the integration of GeoAI for cartography: commodification, responsibility, privacy, bias, and (together) transparency, explainability, and provenance. We conclude by identifying four potential research directions for future cartographic research with GeoAI: GeoAI-enabled active cartographic symbolism, human-in-the-loop GeoAI for cartography, GeoAI-based mapping-as-a-service, and generative GeoAI for cartography.

LGNov 3, 2025
Towards Efficient Federated Learning of Networked Mixture-of-Experts for Mobile Edge Computing

Song Gao, Shusen Jing, Shuai Zhang et al.

Recent advancements in large artificial intelligence models (LAMs) are driving significant innovations in mobile edge computing within next-generation wireless networks. However, the substantial demands for computational resources and large-scale training data required to train LAMs conflict with the limited storage and computational capacity of edge devices, posing significant challenges to training and deploying LAMs at the edge. In this work, we introduce the Networked Mixture-of-Experts (NMoE) system, in which clients infer collaboratively by distributing tasks to suitable neighbors based on their expertise and aggregate the returned results. For training the NMoE, we propose a federated learning framework that integrates both supervised and self-supervised learning to balance personalization and generalization, while preserving communication efficiency and data privacy. We conduct extensive experiments to demonstrate the efficacy of the proposed NMoE system, providing insights and benchmarks for the NMoE training algorithms.

CLAug 31, 2024
Evaluating the Effectiveness of Large Language Models in Representing and Understanding Movement Trajectories

Yuhan Ji, Song Gao

This research focuses on assessing the ability of AI foundation models in representing the trajectories of movements. We utilize one of the large language models (LLMs) (i.e., GPT-J) to encode the string format of trajectories and then evaluate the effectiveness of the LLM-based representation for trajectory data analysis. The experiments demonstrate that while the LLM-based embeddings can preserve certain trajectory distance metrics (i.e., the correlation coefficients exceed 0.74 between the Cosine distance derived from GPT-J embeddings and the Hausdorff and Dynamic Time Warping distances on raw trajectories), challenges remain in restoring numeric values and retrieving spatial neighbors in movement trajectory analytics. In addition, the LLMs can understand the spatiotemporal dependency contained in trajectories and have good accuracy in location prediction tasks. This research highlights the need for improvement in terms of capturing the nuances and complexities of the underlying geospatial data and integrating domain knowledge to support various GeoAI applications using LLMs.

SINov 23, 2024
GeoAI-Enhanced Community Detection on Spatial Networks with Graph Deep Learning

Yunlei Liang, Jiawei Zhu, Wen Ye et al.

Spatial networks are useful for modeling geographic phenomena where spatial interaction plays an important role. To analyze the spatial networks and their internal structures, graph-based methods such as community detection have been widely used. Community detection aims to extract strongly connected components from the network and reveal the hidden relationships between nodes, but they usually do not involve the attribute information. To consider edge-based interactions and node attributes together, this study proposed a family of GeoAI-enhanced unsupervised community detection methods called region2vec based on Graph Attention Networks (GAT) and Graph Convolutional Networks (GCN). The region2vec methods generate node neural embeddings based on attribute similarity, geographic adjacency and spatial interactions, and then extract network communities based on node embeddings using agglomerative clustering. The proposed GeoAI-based methods are compared with multiple baselines and perform the best when one wants to maximize node attribute similarity and spatial interaction intensity simultaneously within the spatial network communities. It is further applied in the shortage area delineation problem in public health and demonstrates its promise in regionalization problems.

CYApr 28
Geographic Bias and Diversity in AI Evaluation

Zilong Liu, Krzysztof Janowicz, Gengchen Mai et al.

Among the many challenges hindering the responsible development and deployment of AI, arguably none has faced more intense scrutiny than bias in its various forms. This underscores the widespread concerns across AI researchers that model outputs, e.g., from generative AI, may encode structural distributional imbalances (stemming from training data or model design) that may amplify social inequality or introduce systemic distortions across application domains ranging from biodiversity to disaster mitigation. Yet, relatively little work has investigated the geographical nature of bias or developed measurable benchmarks for what it means for (generative) AI to be unbiased. In this chapter, we investigate this issue through a literature review. As foundation models are reshaping the landscape of bias research, we examine work spanning both the pre-generative AI and generative AI periods. First, we identify a range of geographic biases. These biases span from representation bias in the training data and regional disparities in the factual recall of language models to the tendency of generative AI to over-proportionally favor prototypical places (called defaults). Then, we showcase how recent studies address the latter bias by evaluating geographic diversity in the outputs of generative AI across various cognitive levels, parameter settings, and output modalities.

AIMar 31, 2025
GIScience in the Era of Artificial Intelligence: A Research Agenda Towards Autonomous GIS

Zhenlong Li, Huan Ning, Song Gao et al.

The advent of generative AI exemplified by large language models (LLMs) opens new ways to represent and compute geographic information and transcends the process of geographic knowledge production, driving geographic information systems (GIS) towards autonomous GIS. Leveraging LLMs as the decision core, autonomous GIS can independently generate and execute geoprocessing workflows to perform spatial analysis. In this vision paper, we further elaborate on the concept of autonomous GIS and present a conceptual framework that defines its five autonomous goals, five autonomous levels, five core functions, and three operational scales. We demonstrate how autonomous GIS could perform geospatial data retrieval, spatial analysis, and map making with four proof-of-concept GIS agents. We conclude by identifying critical challenges and future research directions, including fine-tuning and self-growing decision-cores, autonomous modeling, and examining the societal and practical implications of autonomous GIS. By establishing the groundwork for a paradigm shift in GIScience, this paper envisions a future where GIS moves beyond traditional workflows to autonomously reason, derive, innovate, and advance geospatial solutions to pressing global challenges. Meanwhile, as we design and deploy increasingly intelligent geospatial systems, we carry a responsibility to ensure they are developed in socially responsible ways, serve the public good, and support the continued value of human geographic insight in an AI-augmented future.

CLMay 22, 2025
Foundation Models for Geospatial Reasoning: Assessing Capabilities of Large Language Models in Understanding Geometries and Topological Spatial Relations

Yuhan Ji, Song Gao, Ying Nie et al.

Applying AI foundation models directly to geospatial datasets remains challenging due to their limited ability to represent and reason with geographical entities, specifically vector-based geometries and natural language descriptions of complex spatial relations. To address these issues, we investigate the extent to which a well-known-text (WKT) representation of geometries and their spatial relations (e.g., topological predicates) are preserved during spatial reasoning when the geospatial vector data are passed to large language models (LLMs) including GPT-3.5-turbo, GPT-4, and DeepSeek-R1-14B. Our workflow employs three distinct approaches to complete the spatial reasoning tasks for comparison, i.e., geometry embedding-based, prompt engineering-based, and everyday language-based evaluation. Our experiment results demonstrate that both the embedding-based and prompt engineering-based approaches to geospatial question-answering tasks with GPT models can achieve an accuracy of over 0.6 on average for the identification of topological spatial relations between two geometries. Among the evaluated models, GPT-4 with few-shot prompting achieved the highest performance with over 0.66 accuracy on topological spatial relation inference. Additionally, GPT-based reasoner is capable of properly comprehending inverse topological spatial relations and including an LLM-generated geometry can enhance the effectiveness for geographic entity retrieval. GPT-4 also exhibits the ability to translate certain vernacular descriptions about places into formal topological relations, and adding the geometry-type or place-type context in prompts may improve inference accuracy, but it varies by instance. The performance of these spatial reasoning tasks offers valuable insights for the refinement of LLMs with geographical knowledge towards the development of geo-foundation models capable of geospatial reasoning.

CVDec 29, 2025
Physics-Inspired Modeling and Content Adaptive Routing in an Infrared Gas Leak Detection Network

Dongsheng Li, Tianli Ma, Siling Wang et al.

Detecting infrared gas leaks is critical for environmental monitoring and industrial safety, yet remains difficult because plumes are faint, small, semitransparent, and have weak, diffuse boundaries. We present physics-edge hybrid gas dynamic routing network (PEG-DRNet). First, we introduce the Gas Block, a diffusion-convection unit modeling gas transport: a local branch captures short-range variations, while a large-kernel branch captures long-range propagation. An edge-gated learnable fusion module balances local detail and global context, strengthening weak-contrast plume and contour cues. Second, we propose the adaptive gradient and phase edge operator (AGPEO), computing reliable edge priors from multi-directional gradients and phase-consistent responses. These are transformed by a multi-scale edge perception module (MSEPM) into hierarchical edge features that reinforce boundaries. Finally, the content-adaptive sparse routing path aggregation network (CASR-PAN), with adaptive information modulation modules for fusion and self, selectively propagates informative features across scales based on edge and content cues, improving cross-scale discriminability while reducing redundancy. Experiments on the IIG dataset show that PEG-DRNet achieves an overall AP of 29.8\%, an AP$_{50}$ of 84.3\%, and a small-object AP of 25.3\%, surpassing the RT-DETR-R18 baseline by 3.0\%, 6.5\%, and 5.3\%, respectively, while requiring only 43.7 Gflops and 14.9 M parameters. The proposed PEG-DRNet achieves superior overall performance with the best balance of accuracy and computational efficiency, outperforming existing CNN and Transformer detectors in AP and AP$_{50}$ on the IIG and LangGas dataset.

CRApr 27, 2025
Doxing via the Lens: Revealing Location-related Privacy Leakage on Multi-modal Large Reasoning Models

Weidi Luo, Tianyu Lu, Qiming Zhang et al.

Recent advances in multi-modal large reasoning models (MLRMs) have shown significant ability to interpret complex visual content. While these models enable impressive reasoning capabilities, they also introduce novel and underexplored privacy risks. In this paper, we identify a novel category of privacy leakage in MLRMs: Adversaries can infer sensitive geolocation information, such as a user's home address or neighborhood, from user-generated images, including selfies captured in private settings. To formalize and evaluate these risks, we propose a three-level visual privacy risk framework that categorizes image content based on contextual sensitivity and potential for location inference. We further introduce DoxBench, a curated dataset of 500 real-world images reflecting diverse privacy scenarios. Our evaluation across 11 advanced MLRMs and MLLMs demonstrates that these models consistently outperform non-expert humans in geolocation inference and can effectively leak location-related private information. This significantly lowers the barrier for adversaries to obtain users' sensitive geolocation information. We further analyze and identify two primary factors contributing to this vulnerability: (1) MLRMs exhibit strong reasoning capabilities by leveraging visual clues in combination with their internal world knowledge; and (2) MLRMs frequently rely on privacy-related visual clues for inference without any built-in mechanisms to suppress or avoid such usage. To better understand and demonstrate real-world attack feasibility, we propose GeoMiner, a collaborative attack framework that decomposes the prediction process into two stages: clue extraction and reasoning to improve geolocation performance while introducing a novel attack perspective. Our findings highlight the urgent need to reassess inference-time privacy risks in MLRMs to better protect users' sensitive information.

CVDec 31, 2023
Reviving the Context: Camera Trap Species Classification as Link Prediction on Multimodal Knowledge Graphs

Vardaan Pahuja, Weidi Luo, Yu Gu et al.

Camera traps are important tools in animal ecology for biodiversity monitoring and conservation. However, their practical application is limited by issues such as poor generalization to new and unseen locations. Images are typically associated with diverse forms of context, which may exist in different modalities. In this work, we exploit the structured context linked to camera trap images to boost out-of-distribution generalization for species classification tasks in camera traps. For instance, a picture of a wild animal could be linked to details about the time and place it was captured, as well as structured biological knowledge about the animal species. While often overlooked by existing studies, incorporating such context offers several potential benefits for better image understanding, such as addressing data scarcity and enhancing generalization. However, effectively incorporating such heterogeneous context into the visual domain is a challenging problem. To address this, we propose a novel framework that transforms species classification as link prediction in a multimodal knowledge graph (KG). This framework enables the seamless integration of diverse multimodal contexts for visual recognition. We apply this framework for out-of-distribution species classification on the iWildCam2020-WILDS and Snapshot Mountain Zebra datasets and achieve competitive performance with state-of-the-art approaches. Furthermore, our framework enhances sample efficiency for recognizing under-represented species.

LGJun 9, 2025
AutoSDT: Scaling Data-Driven Discovery Tasks Toward Open Co-Scientists

Yifei Li, Hanane Nour Moussa, Ziru Chen et al.

Despite long-standing efforts in accelerating scientific discovery with AI, building AI co-scientists remains challenging due to limited high-quality data for training and evaluation. To tackle this data scarcity issue, we present AutoSDT, an automatic pipeline that collects high-quality coding tasks in real-world data-driven discovery workflows. AutoSDT leverages the coding capabilities and parametric knowledge of LLMs to search for diverse sources, select ecologically valid tasks, and synthesize accurate task instructions and code solutions. Using our pipeline, we construct AutoSDT-5K, a dataset of 5,404 coding tasks for data-driven discovery that covers four scientific disciplines and 756 unique Python packages. To the best of our knowledge, AutoSDT-5K is the only automatically collected and the largest open dataset for data-driven scientific discovery. Expert feedback on a subset of 256 tasks shows the effectiveness of AutoSDT: 93% of the collected tasks are ecologically valid, and 92.2% of the synthesized programs are functionally correct. Trained on AutoSDT-5K, the Qwen2.5-Coder-Instruct LLM series, dubbed AutoSDT-Coder, show substantial improvement on two challenging data-driven discovery benchmarks, ScienceAgentBench and DiscoveryBench. Most notably, AutoSDT-Coder-32B reaches the same level of performance as GPT-4o on ScienceAgentBench with a success rate of 7.8%, doubling the performance of its base model. On DiscoveryBench, it lifts the hypothesis matching score to 8.1, bringing a 17.4% relative improvement and closing the gap between open-weight models and GPT-4o.

AIDec 14, 2023
Artificial Intelligence and Human Geography

Song Gao

This paper examines the recent advances and applications of AI in human geography especially the use of machine (deep) learning, including place representation and modeling, spatial analysis and predictive mapping, and urban planning and design. AI technologies have enabled deeper insights into complex human-environment interactions, contributing to more effective scientific exploration, understanding of social dynamics, and spatial decision-making. Furthermore, human geography offers crucial contributions to AI, particularly in context-aware model development, human-centered design, biases and ethical considerations, and data privacy. The synergy beween AI and human geography is essential for addressing global challenges like disaster resilience, poverty, and equitable resource access. This interdisciplinary collaboration between AI and geography will help advance the development of GeoAI and promise a better and sustainable world for all.

CVMay 19, 2025
GeoRanker: Distance-Aware Ranking for Worldwide Image Geolocalization

Pengyue Jia, Seongheon Park, Song Gao et al.

Worldwide image geolocalization-the task of predicting GPS coordinates from images taken anywhere on Earth-poses a fundamental challenge due to the vast diversity in visual content across regions. While recent approaches adopt a two-stage pipeline of retrieving candidates and selecting the best match, they typically rely on simplistic similarity heuristics and point-wise supervision, failing to model spatial relationships among candidates. In this paper, we propose GeoRanker, a distance-aware ranking framework that leverages large vision-language models to jointly encode query-candidate interactions and predict geographic proximity. In addition, we introduce a multi-order distance loss that ranks both absolute and relative distances, enabling the model to reason over structured spatial relationships. To support this, we curate GeoRanking, the first dataset explicitly designed for geographic ranking tasks with multimodal candidate information. GeoRanker achieves state-of-the-art results on two well-established benchmarks (IM2GPS3K and YFCC4K), significantly outperforming current best methods.

AIAug 7, 2025
Whose Truth? Pluralistic Geo-Alignment for (Agentic) AI

Krzysztof Janowicz, Zilong Liu, Gengchen Mai et al.

AI (super) alignment describes the challenge of ensuring (future) AI systems behave in accordance with societal norms and goals. While a quickly evolving literature is addressing biases and inequalities, the geographic variability of alignment remains underexplored. Simply put, what is considered appropriate, truthful, or legal can differ widely across regions due to cultural norms, political realities, and legislation. Alignment measures applied to AI/ML workflows can sometimes produce outcomes that diverge from statistical realities, such as text-to-image models depicting balanced gender ratios in company leadership despite existing imbalances. Crucially, some model outputs are globally acceptable, while others, e.g., questions about Kashmir, depend on knowing the user's location and their context. This geographic sensitivity is not new. For instance, Google Maps renders Kashmir's borders differently based on user location. What is new is the unprecedented scale and automation with which AI now mediates knowledge, expresses opinions, and represents geographic reality to millions of users worldwide, often with little transparency about how context is managed. As we approach Agentic AI, the need for spatio-temporally aware alignment, rather than one-size-fits-all approaches, is increasingly urgent. This paper reviews key geographic research problems, suggests topics for future work, and outlines methods for assessing alignment sensitivity.

CVDec 12, 2023
DTA: Distribution Transform-based Attack for Query-Limited Scenario

Renyang Liu, Wei Zhou, Xin Jin et al.

In generating adversarial examples, the conventional black-box attack methods rely on sufficient feedback from the to-be-attacked models by repeatedly querying until the attack is successful, which usually results in thousands of trials during an attack. This may be unacceptable in real applications since Machine Learning as a Service Platform (MLaaS) usually only returns the final result (i.e., hard-label) to the client and a system equipped with certain defense mechanisms could easily detect malicious queries. By contrast, a feasible way is a hard-label attack that simulates an attacked action being permitted to conduct a limited number of queries. To implement this idea, in this paper, we bypass the dependency on the to-be-attacked model and benefit from the characteristics of the distributions of adversarial examples to reformulate the attack problem in a distribution transform manner and propose a distribution transform-based attack (DTA). DTA builds a statistical mapping from the benign example to its adversarial counterparts by tackling the conditional likelihood under the hard-label black-box settings. In this way, it is no longer necessary to query the target model frequently. A well-trained DTA model can directly and efficiently generate a batch of adversarial examples for a certain input, which can be used to attack un-seen models based on the assumed transferability. Furthermore, we surprisingly find that the well-trained DTA model is not sensitive to the semantic spaces of the training dataset, meaning that the model yields acceptable attack performance on other datasets. Extensive experiments validate the effectiveness of the proposed idea and the superiority of DTA over the state-of-the-art.

CVFeb 7, 2022
DeepSSN: a deep convolutional neural network to assess spatial scene similarity

Danhuai Guo, Shiyin Ge, Shu Zhang et al.

Spatial-query-by-sketch is an intuitive tool to explore human spatial knowledge about geographic environments and to support communication with scene database queries. However, traditional sketch-based spatial search methods perform insufficiently due to their inability to find hidden multi-scale map features from mental sketches. In this research, we propose a deep convolutional neural network, namely Deep Spatial Scene Network (DeepSSN), to better assess the spatial scene similarity. In DeepSSN, a triplet loss function is designed as a comprehensive distance metric to support the similarity assessment. A positive and negative example mining strategy using qualitative constraint networks in spatial reasoning is designed to ensure a consistently increasing distinction of triplets during the training process. Moreover, we develop a prototype spatial scene search system using the proposed DeepSSN, in which the users input spatial query via sketch maps and the system can automatically augment the sketch training data. The proposed model is validated using multi-source conflated map data including 131,300 labeled scene samples after data augmentation. The empirical results demonstrate that the DeepSSN outperforms baseline methods including k-nearest-neighbors, multilayer perceptron, AlexNet, DenseNet, and ResNet using mean reciprocal rank and precision metrics. This research advances geographic information retrieval studies by introducing a novel deep learning method tailored to spatial scene queries.

LGNov 7, 2021
A Review of Location Encoding for GeoAI: Methods and Applications

Gengchen Mai, Krzysztof Janowicz, Yingjie Hu et al.

A common need for artificial intelligence models in the broader geoscience is to represent and encode various types of spatial data, such as points (e.g., points of interest), polylines (e.g., trajectories), polygons (e.g., administrative regions), graphs (e.g., transportation networks), or rasters (e.g., remote sensing images), in a hidden embedding space so that they can be readily incorporated into deep learning models. One fundamental step is to encode a single point location into an embedding space, such that this embedding is learning-friendly for downstream machine learning models such as support vector machines and neural networks. We call this process location encoding. However, there lacks a systematic review on the concept of location encoding, its potential applications, and key challenges that need to be addressed. This paper aims to fill this gap. We first provide a formal definition of location encoding, and discuss the necessity of location encoding for GeoAI research from a machine learning perspective. Next, we provide a comprehensive survey and discussion about the current landscape of location encoding research. We classify location encoding models into different categories based on their inputs and encoding methods, and compare them based on whether they are parametric, multi-scale, distance preserving, and direction aware. We demonstrate that existing location encoding models can be unified under a shared formulation framework. We also discuss the application of location encoding for different types of spatial data. Finally, we point out several challenges in location encoding research that need to be solved in the future.

LGJun 14, 2020
LSTM-TrajGAN: A Deep Learning Approach to Trajectory Privacy Protection

Jinmeng Rao, Song Gao, Yuhao Kang et al.

The prevalence of location-based services contributes to the explosive growth of individual-level trajectory data and raises public concerns about privacy issues. In this research, we propose a novel LSTM-TrajGAN approach, which is an end-to-end deep learning model to generate privacy-preserving synthetic trajectory data for data sharing and publication. We design a loss metric function TrajLoss to measure the trajectory similarity losses for model training and optimization. The model is evaluated on the trajectory-user-linking task on a real-world semantic trajectory dataset. Compared with other common geomasking methods, our model can better prevent users from being re-identified, and it also preserves essential spatial, temporal, and thematic characteristics of the real trajectory data. The model better balances the effectiveness of trajectory privacy protection and the utility for spatial and temporal analyses, which offers new insights into the GeoAI-powered privacy protection.

CVMay 6, 2019
Transferring Multiscale Map Styles Using Generative Adversarial Networks

Yuhao Kang, Song Gao, Robert E. Roth

The advancement of the Artificial Intelligence (AI) technologies makes it possible to learn stylistic design criteria from existing maps or other visual art and transfer these styles to make new digital maps. In this paper, we propose a novel framework using AI for map style transfer applicable across multiple map scales. Specifically, we identify and transfer the stylistic elements from a target group of visual examples, including Google Maps, OpenStreetMap, and artistic paintings, to unstylized GIS vector data through two generative adversarial network (GAN) models. We then train a binary classifier based on a deep convolutional neural network to evaluate whether the transfer styled map images preserve the original map design characteristics. Our experiment results show that GANs have a great potential for multiscale map style transferring, but many challenges remain requiring future research.

CVMay 6, 2019
Extracting human emotions at different places based on facial expressions and spatial clustering analysis

Yuhao Kang, Qingyuan Jia, Song Gao et al.

The emergence of big data enables us to evaluate the various human emotions at places from a statistic perspective by applying affective computing. In this study, a novel framework for extracting human emotions from large-scale georeferenced photos at different places is proposed. After the construction of places based on spatial clustering of user generated footprints collected in social media websites, online cognitive services are utilized to extract human emotions from facial expressions using the state-of-the-art computer vision techniques. And two happiness metrics are defined for measuring the human emotions at different places. To validate the feasibility of the framework, we take 80 tourist attractions around the world as an example and a happiness ranking list of places is generated based on human emotions calculated over 2 million faces detected out from over 6 million photos. Different kinds of geographical contexts are taken into consideration to find out the relationship between human emotions and environmental factors. Results show that much of the emotional variation at different places can be explained by a few factors such as openness. The research may offer insights on integrating human emotions to enrich the understanding of sense of place in geography and in place-based GIS.

AIDec 10, 2018
A context-based geoprocessing framework for optimizing meetup location of multiple moving objects along road networks

Shaohua Wang, Song Gao, Xin Feng et al.

Given different types of constraints on human life, people must make decisions that satisfy social activity needs. Minimizing costs (i.e., distance, time, or money) associated with travel plays an important role in perceived and realized social quality of life. Identifying optimal interaction locations on road networks when there are multiple moving objects (MMO) with space-time constraints remains a challenge. In this research, we formalize the problem of finding dynamic ideal interaction locations for MMO as a spatial optimization model and introduce a context-based geoprocessing heuristic framework to address this problem. As a proof of concept, a case study involving identification of a meetup location for multiple people under traffic conditions is used to validate the proposed geoprocessing framework. Five heuristic methods with regard to efficient shortest-path search space have been tested. We find that the R* tree-based algorithm performs the best with high quality solutions and low computation time. This framework is implemented in a GIS environment to facilitate integration with external geographic contextual information, e.g., temporary road barriers, points of interest (POI), and real-time traffic information, when dynamically searching for ideal meetup sites. The proposed method can be applied in trip planning, carpooling services, collaborative interaction, and logistics management.

CROct 6, 2013
Three-Way Dissection of a Game-CAPTCHA: Automated Attacks, Relay Attacks, and Usability

Manar Mohamed, Niharika Sachdeva, Michael Georgescu et al.

Existing captcha solutions on the Internet are a major source of user frustration. Game captchas are an interesting and, to date, little-studied approach claiming to make captcha solving a fun activity for the users. One broad form of such captchas -- called Dynamic Cognitive Game (DCG) captchas -- challenge the user to perform a game-like cognitive task interacting with a series of dynamic images. We pursue a comprehensive analysis of a representative category of DCG captchas. We formalize, design and implement such captchas, and dissect them across: (1) fully automated attacks, (2) human-solver relay attacks, and (3) usability. Our results suggest that the studied DCG captchas exhibit high usability and, unlike other known captchas, offer some resistance to relay attacks, but they are also vulnerable to our novel dictionary-based automated attack.