SI · Jan 29, 2024 · Code
A Comprehensive Survey on Graph Reduction: Sparsification, Coarsening, and Condensation
Mohammad Hashemi, Shengbo Gong, Juntong Ni et al.
Many real-world datasets can be naturally represented as graphs, spanning a wide range of domains. However, the increasing complexity and size of graph datasets present significant challenges for analysis and computation. In response, graph reduction, or graph summarization, has gained prominence as a way to simplify large graphs while preserving their essential properties. In this survey, we aim to provide a comprehensive understanding of graph reduction methods, including graph sparsification, graph coarsening, and graph condensation. Specifically, we establish a unified definition for these methods and introduce a hierarchical taxonomy to categorize the challenges they address. Our survey then systematically reviews the technical details of these methods and emphasizes their practical applications across diverse scenarios. Furthermore, we outline critical research directions to ensure the continued effectiveness of graph reduction techniques and provide a comprehensive paper list at https://github.com/Emory-Melody/awesome-graph-reduction. We hope this survey will bridge literature gaps and propel the advancement of this promising field.
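The sketch below illustrates one of the three families, graph coarsening. It merges nodes into supernodes according to a given assignment and sums the weights of the original edges that connect them. This is a minimal sketch and not a method from the survey. Here the assignment is supplied by hand, whereas real coarsening methods choose it to preserve spectral or structural properties. The function name `coarsen` is hypothetical.

```python
from collections import defaultdict


def coarsen(edges, assignment):
    """Merge nodes into supernodes (illustrative only).

    edges: dict mapping (u, v) with u < v to an edge weight (undirected graph).
    assignment: dict mapping each original node to its supernode id.
    Returns a dict mapping (a, b) with a <= b to the summed weight of original
    edges between supernodes a and b; (a, a) collects edges merged inside a.
    """
    coarse = defaultdict(float)
    for (u, v), w in edges.items():
        a, b = assignment[u], assignment[v]
        key = (a, b) if a <= b else (b, a)  # canonical order for undirected edges
        coarse[key] += w
    return dict(coarse)


# A 4-node path 0-1-2-3 coarsened into two supernodes {0, 1} -> "A", {2, 3} -> "B".
edges = {(0, 1): 1.0, (1, 2): 1.0, (2, 3): 1.0}
assignment = {0: "A", 1: "A", 2: "B", 3: "B"}
print(coarsen(edges, assignment))  # {('A', 'A'): 1.0, ('A', 'B'): 1.0, ('B', 'B'): 1.0}
```

The four-node path becomes a two-node graph. The self-loop weights record how much edge mass each supernode absorbed.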
LG · Jun 25, 2025 · Code
PlaceFM: A Training-free Geospatial Foundation Model of Places using Large-Scale Point of Interest Data
Mohammad Hashemi, Hossein Amiri, Andreas Züfle
With the rapid growth and continual updates of geospatial data from diverse sources, geospatial foundation model pre-training for urban representation learning has emerged as a key research direction for advancing data-driven urban planning. Spatial structure is fundamental to effective geospatial intelligence systems. However, existing foundation models often lack the flexibility to reason about places: context-rich regions spanning multiple spatial granularities that may consist of many spatially and semantically related points of interest (POIs). To address this gap, we propose PlaceFM, a geospatial foundation model that captures place representations through a training-free, clustering-based approach. PlaceFM summarizes the entire POI graph constructed from U.S. Foursquare data, producing general-purpose region embeddings while automatically identifying places of interest. These embeddings can be directly integrated into geolocation data pipelines to support a variety of downstream urban tasks. Without the need for costly pre-training, PlaceFM provides a scalable and efficient solution for multi-granular geospatial analysis. Extensive experiments on two real-world prediction tasks, ZIP code-level population density and housing prices, demonstrate that PlaceFM outperforms most state-of-the-art graph-based geospatial foundation models. It also achieves up to a 100× speedup in generating region-level representations on large-scale POI graphs. The implementation is available at https://github.com/mohammadhashemii/PlaceFM.
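A training-free region embedding can be as simple as pooling POI embeddings within each region. The sketch below assumes precomputed POI vectors and a given POI-to-region assignment, and uses mean pooling. It does not reproduce PlaceFM's clustering step. The name `region_embeddings` and the toy POIs are hypothetical.

```python
from collections import defaultdict


def region_embeddings(poi_vectors, poi_region):
    """Average POI embeddings within each region; no parameters are learned.

    poi_vectors: dict POI id -> list of floats (all of the same length).
    poi_region: dict POI id -> region id (e.g. a cluster or ZIP code label).
    Returns dict region id -> mean vector of the POIs assigned to it.
    """
    sums = {}
    counts = defaultdict(int)
    for poi, vec in poi_vectors.items():
        region = poi_region[poi]
        if region not in sums:
            sums[region] = [0.0] * len(vec)
        # Element-wise running sum of the vectors in this region.
        sums[region] = [s + x for s, x in zip(sums[region], vec)]
        counts[region] += 1
    return {r: [s / counts[r] for s in total] for r, total in sums.items()}


pois = {"cafe": [1.0, 0.0], "gym": [0.0, 1.0], "bank": [1.0, 1.0]}
regions = {"cafe": "z1", "gym": "z1", "bank": "z2"}
print(region_embeddings(pois, regions))  # {'z1': [0.5, 0.5], 'z2': [1.0, 1.0]}
```

Because nothing is trained, the embeddings can be recomputed directly whenever POIs or region boundaries change.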
LG · Feb 24, 2025
Scalable Graph Condensation with Evolving Capabilities
Shengbo Gong, Mohammad Hashemi, Juntong Ni et al.
The rapid growth of graph data creates significant scalability challenges, as most graph algorithms scale quadratically with size. To mitigate these issues, Graph Condensation (GC) methods have been proposed to learn a small graph from a larger one, accelerating downstream tasks. However, existing approaches critically assume a static training set, which conflicts with the inherently dynamic and evolving nature of real-world graph data. This limitation leads to inefficiencies when condensing growing training sets. In this paper, we introduce GECC (Graph Evolving Clustering Condensation), a scalable framework for continual graph condensation designed to handle large-scale and evolving graph data. GECC updates the distilled graph as data streams in, without requiring costly retraining. It employs a traceable and efficient approach by performing class-wise clustering on aggregated features. Furthermore, it can inherit previous condensation results as clustering centroids when the condensed graph expands, which gives it an evolving capability. This methodology is supported by robust theoretical foundations and demonstrates superior empirical performance. Comprehensive experiments, including a real-world scenario, show that GECC achieves better performance than most state-of-the-art graph condensation methods while delivering a speedup of around 1000× on large datasets.
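The core idea, class-wise clustering on aggregated features warm-started from earlier centroids, can be sketched as follows. This is a minimal sketch under several assumptions. It uses one round of mean neighbor aggregation as a stand-in for whatever propagation GECC actually uses, plain Lloyd's k-means, and a fixed number of centroids per class. How GECC adds new centroids when the condensed graph expands is not shown, and all function names are hypothetical.

```python
def aggregate(features, neighbors):
    """One round of mean aggregation over each node and its neighbors."""
    out = {}
    for node, vec in features.items():
        group = [vec] + [features[n] for n in neighbors.get(node, [])]
        out[node] = [sum(col) / len(group) for col in zip(*group)]
    return out


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, centroids, iters=10):
    """Lloyd's k-means started from the given centroids (warm start)."""
    centroids = [list(c) for c in centroids]  # copy so the caller's list is untouched
    for _ in range(iters):
        clusters = [[] for _ in centroids]
        for p in points:
            best = min(range(len(centroids)), key=lambda i: sq_dist(p, centroids[i]))
            clusters[best].append(p)
        for i, members in enumerate(clusters):
            if members:  # an empty cluster keeps its previous centroid
                centroids[i] = [sum(col) / len(members) for col in zip(*members)]
    return centroids


def condense(features, neighbors, labels, init_centroids):
    """Class-wise clustering on aggregated features.

    init_centroids: dict class -> list of starting centroids. Passing the
    previous condensation result here reuses it when the training set grows.
    Returns dict class -> list of synthetic node features.
    """
    agg = aggregate(features, neighbors)
    result = {}
    for cls, init in init_centroids.items():
        points = [agg[n] for n in agg if labels[n] == cls]
        result[cls] = kmeans(points, init)
    return result


features = {0: [0.0], 1: [2.0], 2: [10.0], 3: [12.0]}
neighbors = {0: [1], 1: [0], 2: [3], 3: [2]}
labels = {0: "a", 1: "a", 2: "b", 3: "b"}
first = condense(features, neighbors, labels, {"a": [[0.0]], "b": [[0.0]]})
print(first)  # {'a': [[1.0]], 'b': [[11.0]]}

# The training set grows: node 4 (class "a") attaches to node 1.
features[4] = [4.0]
neighbors[4] = [1]
neighbors[1] = [0, 4]
labels[4] = "a"
second = condense(features, neighbors, labels, first)  # warm start from `first`
print(second)  # {'a': [[2.0]], 'b': [[11.0]]}
```

Starting the second run from `first` means only the class touched by the new node moves. The other synthetic nodes are already at a fixed point of k-means.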
AI · Jun 17, 2025
From Points to Places: Towards Human Mobility-Driven Spatiotemporal Foundation Models via Understanding Places
Mohammad Hashemi, Andreas Züfle
Capturing human mobility is essential for modeling how people interact with and move through physical spaces, reflecting social behavior, access to resources, and dynamic spatial patterns. Scalable and transferable analysis across diverse geographies and contexts requires a generalizable foundation model for spatiotemporal data. Foundation models have transformed language and vision, but they remain limited in handling the spatial, temporal, and semantic complexity of mobility data. This vision paper advocates for a new class of spatial foundation models that integrate geolocation semantics with human mobility across multiple scales. Central to our vision is a shift from modeling discrete points of interest to understanding places: dynamic, context-rich regions shaped by human behavior and mobility that may comprise many points of interest. We identify key gaps in adaptability, scalability, and multi-granular reasoning, and propose research directions focused on modeling places and enabling efficient learning. Our goal is to guide the development of scalable, context-aware models for next-generation geospatial intelligence. These models unlock applications ranging from personalized place discovery and logistics optimization to urban planning, ultimately enabling smarter and more responsive spatial decision-making.
DB · Oct 24, 2025
World-POI: Global Point-of-Interest Data Enriched from Foursquare and OpenStreetMap as Tabular and Graph Data
Hossein Amiri, Mohammad Hashemi, Andreas Züfle
Recently, Foursquare released a global dataset with more than 100 million points of interest (POIs), each representing a real-world business on its platform. However, many entries lack complete metadata such as addresses or categories, and some correspond to non-existent or fictional locations. In contrast, OpenStreetMap (OSM) offers a rich, user-contributed POI dataset with detailed and frequently updated metadata, though it does not formally verify whether a POI represents an actual business. In this data paper, we present a methodology that integrates the strengths of both datasets: Foursquare as a comprehensive baseline of commercial POIs and OSM as a source of enriched metadata. The combined dataset totals approximately 1 TB. While this full version is not publicly released, we provide filtered releases with adjustable thresholds that reduce storage needs and make the data practical to download and use across domains. We also provide step-by-step instructions to reproduce the full 631 GB build. Record linkage is achieved by computing name similarity scores and spatial distances between Foursquare and OSM POIs. These measures identify and retain high-confidence matches that correspond to real businesses in Foursquare, have representations in OSM, and show strong name similarity. Finally, we use this filtered dataset to construct a graph-based representation of POIs enriched with attributes from both sources, enabling advanced spatial analyses and a range of downstream applications.
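The record-linkage step, which combines spatial distance with name similarity, can be sketched with the Python standard library. This is a minimal sketch under stated assumptions. It uses haversine distance, `difflib.SequenceMatcher` as a stand-in for whichever name-similarity score the authors use, and a greedy best match per Foursquare POI. The thresholds, function names, and sample coordinates are hypothetical.

```python
import math
from difflib import SequenceMatcher

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in meters


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def name_similarity(a, b):
    """Case-insensitive similarity ratio in [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def link(fsq_pois, osm_pois, min_sim=0.8, max_dist_m=100.0):
    """Return (fsq_id, osm_id) pairs that are both nearby and similarly named.

    Each POI is a tuple (id, name, lat, lon). Thresholds are illustrative only.
    """
    matches = []
    for fid, fname, flat, flon in fsq_pois:
        best = None  # (osm_id, similarity) of the best candidate so far
        for oid, oname, olat, olon in osm_pois:
            if haversine_m(flat, flon, olat, olon) > max_dist_m:
                continue  # too far apart to be the same business
            sim = name_similarity(fname, oname)
            if sim >= min_sim and (best is None or sim > best[1]):
                best = (oid, sim)
        if best is not None:
            matches.append((fid, best[0]))
    return matches


fsq = [("f1", "Blue Bottle Coffee", 37.7765, -122.4233),
       ("f2", "Joe's Pizza", 40.7306, -73.9866)]
osm = [("o1", "Blue Bottle Coffee", 37.7766, -122.4234),
       ("o2", "Pizza Hut", 40.7307, -73.9867)]
print(link(fsq, osm))  # [('f1', 'o1')]
```

The second pair is only about 14 m apart, but its name similarity is 0.5, so it is rejected. This shows why both criteria are needed. A production pipeline over 100 million POIs would also need a spatial index instead of this nested loop.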
LG · Sep 2, 2021
Parkinson's Disease Diagnosis based on Gait Cycle Analysis Through an Interpretable Interval Type-2 Neuro-Fuzzy System
Armin Salimi-Badr, Mohammad Hashemi, Hamidreza Saffari
In this paper, we present an interpretable classifier, based on an interval type-2 fuzzy neural network, that detects patients with Parkinson's Disease (PD) by analyzing the gait cycle. The proposed method uses clinical features extracted from the vertical Ground Reaction Force (vGRF), measured by 16 wearable sensors placed in the soles of subjects' shoes, and learns interpretable fuzzy rules. Experts can therefore verify a decision made by the method by inspecting the firing strengths of these rules. Moreover, experts can use the extracted fuzzy rules to diagnose patients or adjust them based on their own knowledge. Interval Type-2 Fuzzy Logic is applied to improve robustness against uncertainty and noisy sensor measurements. Two paradigms are proposed for learning the fuzzy rules:
(1) a batch learning approach that clusters the available samples to extract initial fuzzy rules;
(2) a complementary online learning approach that refines the rule base as new labeled samples arrive.
The method is evaluated on classifying patients and healthy subjects under different conditions, including the presence of noise and the arrival of new instances. It is also compared with several previous supervised and unsupervised machine learning approaches. The final Accuracy, Precision, Recall, and F1 Score of the proposed method are 88.74%, 89.41%, 95.10%, and 92.16%, respectively. Finally, the extracted fuzzy sets for each feature are reported.
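Interval type-2 fuzzy sets assign each input a membership interval rather than a single value. This is the source of the tolerance to noisy sensor readings mentioned above. The sketch below shows how a rule's firing strength becomes an interval. It assumes Gaussian membership functions with an uncertain width and a product t-norm, which is a common interval type-2 construction. The paper's exact membership functions, type-reduction step, and vGRF features are not given here. The rule parameters are therefore invented, and `classify` ranks rules by interval midpoint as a simplification of type reduction.

```python
import math


def it2_gaussian(x, mean, sigma_lo, sigma_hi):
    """Lower/upper membership of x in an interval type-2 Gaussian set with a
    fixed mean and an uncertain width sigma in [sigma_lo, sigma_hi]."""
    lower = math.exp(-0.5 * ((x - mean) / sigma_lo) ** 2)  # narrower curve
    upper = math.exp(-0.5 * ((x - mean) / sigma_hi) ** 2)  # wider curve
    return lower, upper


def firing_interval(features, rule):
    """Product t-norm over a rule's antecedents -> (lower, upper) firing strength.

    rule: list of (mean, sigma_lo, sigma_hi), one tuple per feature.
    """
    lo, hi = 1.0, 1.0
    for x, (mean, s_lo, s_hi) in zip(features, rule):
        l, u = it2_gaussian(x, mean, s_lo, s_hi)
        lo *= l
        hi *= u
    return lo, hi


def classify(features, rules):
    """Return the label whose rule has the largest firing-interval midpoint."""
    def score(item):
        lo, hi = firing_interval(features, item[1])
        return (lo + hi) / 2
    return max(rules.items(), key=score)[0]


rules = {
    "PD": [(0.9, 0.1, 0.2)],       # hypothetical one-feature rule
    "healthy": [(1.2, 0.1, 0.2)],  # hypothetical one-feature rule
}
print(tuple(round(v, 3) for v in firing_interval([0.95], rules["PD"])))  # (0.882, 0.969)
print(classify([0.95], rules))  # PD
```

The firing interval is what an expert would inspect. A wide gap between the lower and upper bounds signals that the decision is sensitive to the assumed measurement uncertainty.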
LG · Oct 23, 2018
Stochastic Substitute Training: A Gray-box Approach to Craft Adversarial Examples Against Gradient Obfuscation Defenses
Mohammad Hashemi, Greg Cusack, Eric Keller
It has been shown that adversaries can craft inputs to neural networks that are similar to legitimate inputs but are deliberately constructed to make the network misclassify them. These adversarial examples are crafted, for example, by calculating gradients of a carefully defined loss function with respect to the input. As a countermeasure, some researchers have tried to design robust models by blocking or obfuscating gradients, even in white-box settings. Another line of research proposes a separate detector to identify adversarial examples. This approach also relies on gradient obfuscation, for example to prevent the adversary from fooling the detector. In this paper, we introduce stochastic substitute training, a gray-box approach that can craft adversarial examples for defenses that obfuscate gradients. Against defenses that aim to make models more robust, our technique lets an adversary craft adversarial examples with no knowledge of the defense. Against defenses that attempt to detect adversarial examples, the adversary needs only very limited information about the defense. We demonstrate our technique by applying it against two defenses that make models more robust and two defenses that detect adversarial examples.
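The gradient-based crafting described above can be illustrated with the fast gradient sign method (FGSM) applied to a substitute model. This is not stochastic substitute training itself, whose training procedure is not described here. It is a minimal sketch, assuming a logistic-regression substitute with made-up weights. It shows why an attacker who controls a differentiable substitute can compute input gradients even when the defended model obfuscates its own.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict(w, b, x):
    """Probability of class 1 under a logistic-regression substitute."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


def input_gradient(w, b, x, y):
    """Gradient of binary cross-entropy with respect to the input x.

    For p = sigmoid(w.x + b) and label y in {0, 1}, dL/dx = (p - y) * w.
    """
    p = predict(w, b, x)
    return [(p - y) * wi for wi in w]


def sign(g):
    return (g > 0) - (g < 0)


def fgsm(w, b, x, y, eps):
    """Fast gradient sign step: move every coordinate by eps in the direction
    that increases the substitute's loss (an L-infinity bounded perturbation)."""
    grad = input_gradient(w, b, x, y)
    return [xi + eps * sign(gi) for xi, gi in zip(x, grad)]


w, b = [2.0, -1.0], 0.0  # hypothetical substitute weights
x, y = [0.5, 0.2], 1     # the substitute assigns x to class 1
x_adv = fgsm(w, b, x, y, eps=0.5)
print(predict(w, b, x) > 0.5, predict(w, b, x_adv) > 0.5)  # True False
```

In a real attack, `x_adv` would then be sent to the defended model. The attack succeeds only to the extent that adversarial examples transfer from the substitute to that model.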