GIScience 2025 Accepted Papers

P1. Leveraging Open-Source Satellite-Derived Building Footprints for Height Inference

Authors

Clinton Stipek, Taylor Hauser, Justin Epting, Jessica Moehl, Daniel Adams

Abstract

At a global scale, cities are growing, and characterizing the built environment is essential for a deeper understanding of human population patterns, urban development, energy usage, and climate change impacts, among other topics. Buildings are a key component of the built environment, and significant progress has been made in recent years in scaling building footprint extraction from satellite data and other remotely sensed products. Billions of building footprints have recently been released at a global scale by companies such as Microsoft and Google. However, research has shown that, depending on the methods used to produce a footprint dataset, discrepancies can arise in both the number and shape of the footprints produced. Therefore, each footprint dataset should be examined and used on a case-by-case basis. In this work, we find through two experiments on Oak Ridge National Laboratory and Microsoft footprints within the same geographic extent that our approach of inferring height from footprint morphology features is source agnostic. Regardless of the differences associated with the methods used to produce a building footprint dataset, our approach was able to overcome the discrepancies between the products and generalize, as evidenced by 98% of our results being within 3 m of the ground-truthed height. This signifies that our approach can be applied to the billions of freely available open-source footprints to infer height, a key building metric. This work impacts the broader domain of urban science, in which building height is a key and limiting factor.
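
As an illustration of what "footprint morphology features" can mean in practice, the sketch below computes a few simple shape descriptors from a footprint polygon. The function name, the choice of features (area, perimeter, Polsby-Popper compactness, vertex count), and the input format are assumptions made for this example; the abstract does not state which features or model the authors use.

```python
import math

def footprint_features(coords):
    """Compute simple shape descriptors for one building footprint.

    coords: list of (x, y) vertices in a projected CRS (metres),
    given as an open ring (first vertex not repeated at the end).
    Hypothetical helper for illustration only.
    """
    n = len(coords)
    twice_area = 0.0
    perimeter = 0.0
    for i in range(n):
        x1, y1 = coords[i]
        x2, y2 = coords[(i + 1) % n]  # wrap around to close the ring
        twice_area += x1 * y2 - x2 * y1  # shoelace formula term
        perimeter += math.hypot(x2 - x1, y2 - y1)
    area = abs(twice_area) / 2.0
    # Polsby-Popper compactness: 1.0 for a circle, smaller for elongated shapes
    compactness = 4.0 * math.pi * area / perimeter ** 2
    return {
        "area": area,
        "perimeter": perimeter,
        "compactness": compactness,
        "n_vertices": n,
    }

# Example: a 10 m x 20 m rectangular footprint
features = footprint_features([(0, 0), (10, 0), (10, 20), (0, 20)])
```

Features of this kind could be fed to any regression model to predict height; which model performs well is an empirical question this sketch does not address.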

P2. Analysis of Points of Interests Recommended for Leisure Walk Descriptions

Authors

Ehsan Hamzei, Thi Minh Hoai Bui, Martin Tomko, Stephan Winter

Abstract

Leisure walking is a physical activity where locomotion through a natural or even urban environment is the goal in itself, e.g., in pursuit of health and wellbeing. In contrast to destination-oriented walks that are focused on navigation efficiency (i.e., shortest or simplest walk from source to destination), leisure walks emphasize experiencing the environment, engaging in activities, and discovering places that may be off route, or intermediate destinations en-route, summarily called points of interest (POIs). POIs are key for recommending leisure walks, yet a detailed analysis of POIs in the context of leisure walking is missing in the literature. This study extracts and annotates POIs of leisure walking recommendations available in WalkingMaps.com.au, creating an annotated dataset to address this research gap and provide a first analysis of leisure walking descriptions. We classify POIs using the verbal description provided in the dataset, match them with data available in OpenStreetMap (OSM), and compare the POIs with nearby alternatives in OSM. Our analysis reveals thematic and spatial patterns in POI selection, offering a machine learning approach to model POI choices for leisure walks. We further evaluate the availability of rich data in OSM for future automated leisure walking recommendation. This study contributes to automated systems for recommending leisure walks, tailoring suggestions based on available information in the spatial open data and presents an annotated dataset to facilitate future research in this field.

P3. Guiding Geospatial Analysis Processes in Dealing with Modifiable Areal Unit Problem

Authors

Guoray Cai, Yue Hao

Abstract

Geospatial analysis has been widely applied in different domains for critical decision making. However, the results of spatial analysis are often plagued with uncertainties due to measurement and representation in the data, the choice of area units, and unintended transformation effects. A well-known example of such problems is the Modifiable Areal Unit Problem (MAUP), which has well-documented effects on the outcome of spatial analysis on area-aggregated data. Addressing the effects of MAUP in practical spatial analysis is difficult, and existing methods and tools are limited and complex to use. Most analysts choose to ignore MAUP effects in practice due to lack of expertise, high cognitive loads, and resource constraints. To address these challenges, this paper proposes a machine-guidance approach to augment the analyst's capacity to mitigate the effect of MAUP. Based on an analysis of practical challenges faced by human analysts, we identified multiple opportunities for the machine to guide analysts by alerting them to the rise of MAUP, assessing the impact of MAUP, choosing mitigation methods, and generating visual guidance messages using GIS functions and tools. For each of the opportunities, we characterize the behavior patterns and the underlying guidance strategies that generate the behavior. We illustrate the behavior of machine guidance using a hotspot analysis sample scenario in the context of crime policing, where MAUP has strong effects on the patterns of hotspots. Finally, we describe the computational framework used to build a prototype guidance system and identify a number of research questions to be addressed. We conclude by discussing how the machine guidance approach could be an answer to some of the toughest problems in geospatial analysis.

P4. MODAP: A Multi-City Open Data & Analytics Platform for Micromobility Research

Authors

Grant McKenzie

Abstract

Over the past decade, micromobility services, particularly electric vehicles for personal short-distance trips, have experienced significant growth. Major cities around the world now host extensive fleets of vehicles available for short-term public rental. While previous research has examined usage patterns within and between a few select cities, large, open, and publicly accessible data sets for analyzing mobility across multiple cities are extremely limited. We have collected, curated, and aggregated over twenty million e-scooter and e-bicycle trips across five major cities and are openly releasing these data for use by mobility and sustainable transport researchers, urban planners, and policymakers. To accompany these data, we developed MODAP (Micromobility Open Data & Analytics Platform), a geovisual analytics tool that empowers researchers to explore the temporal and regional patterns of e-mobility trips within our open data set and download the data for offline analysis. Our objective is to foster further research into city-scale mobility patterns and to equip researchers, community members, and policymakers with the necessary tools to conduct this work.

P5. Geovicla: Automated Classification of Interactive Web-based Geovisualizations

Authors

Phil Hüffer, Auriol Degbelo, Benjamin Risse

Abstract

The exponential growth of interactive geovisualizations on the Web has underscored the need for automated techniques to enhance their findability. In this paper, we present the Geovicla dataset (2.5K instances), constructed through the harvesting and manual labelling of webpages from a broad range of domains. The webpages are categorized into three groups: 'interactive visualisation', 'interactive geovisualisation' and 'no interactive visualisation'. Using this dataset, we compared three approaches for interactive (geo)visualization classification: (i) a heuristic-based approach (i.e. using manually derived rules), (ii) a feature-engineering approach (i.e. hand crafted feature vectors combined with machine learning classifiers) and (iii) an embedding-based approach (i.e. automatically generated large language model (LLM) embeddings with machine learning classifiers). The results indicate that LLM embeddings, when used in conjunction with a multilayer perceptron, form a promising combination, achieving up to 74% accuracy for multiclass classification and 75% for binary classification. The dataset and the insights gained from our empirical comparison offer valuable resources for GIScience researchers aiming to enhance the discoverability of interactive geovisualizations.

P6. CityJSON Management using Multi-Model Graph Database to Support 3D Urban Data Management

Authors

Muhammad Syafiq, Suhaibah Azri, Uznir Ujang

Abstract

The prevalence of 3D city models in urban applications is increasing because they are lightweight and flexible, and therefore adaptable to various applications. However, effective data interoperability remains an issue. Managing 3D city models within a database can improve urban data management applications, such as data enrichment and efficient querying. Motivated by the need for better interoperability of 3D city models, this paper proposes a novel method for storing CityJSON using the concept of a multi-model graph database, as a foundation for enriching their semantics. The proposed approach involves decomposing CityJSON objects into smaller JSON components, which are then abstracted into graph elements. Parent-child and other relationship attributes are modelled to capture the hierarchical and associative structures of the CityJSON data. A specific programme is employed to preprocess CityJSON data based on several conditions before it is loaded into the graph database. Our multi-model approach allows three types of queries: document, graph, and hybrid. The latter combines document and graph queries. Comparative evaluation against relational databases demonstrates that our proposed method outperforms them in terms of query performance. The improved query performance is attributed to the advantages of the graph database, which reduces the need for joins and can efficiently index and navigate JSON data. The findings of this study establish a foundation for semantic enrichment of 3D city models to improve interoperability and support advanced urban data management.

P7. Enriching Location Representation with Detailed Semantic Information

Authors

Junyuan Liu, Xinglei Wang, Tao Cheng

Abstract

Understanding urban environments requires spatial representations that capture both geometric structures and comprehensive geographic information. Traditional spatial embeddings often prioritize spatial proximity while underutilizing contextual information from places. To address this limitation, we introduce CaLLiPer+, an extension of the CaLLiPer model, which systematically integrates Point-of-Interest (POI) names alongside categorical labels within a multimodal contrastive learning framework. We evaluate its effectiveness on two downstream tasks—land use classification and socioeconomic status distribution mapping—demonstrating consistent performance gains of 4% to 11% over baseline methods. Additionally, we show that incorporating POI names enhances location retrieval, enabling models to capture complex urban concepts with greater precision. Ablation studies further reveal the complementary role of POI names and the advantages of leveraging pretrained text encoders for spatial representations. Our findings suggest a promising direction for integrating fine-grained semantic attributes and multimodal learning techniques into urban foundation models.

P8. Accommodating space-time scaling issues in GAM-based varying coefficient models

Authors

Alexis Comber, Paul Harris, Chris Brunsdon

Abstract

The paper describes modifications to spatial and temporal varying coefficient (STVC) modelling using Generalized Additive Models (GAMs). Previous work developed tools using Gaussian Process (GP) splines parameterised with location and time variables, and presented a space-time toolkit in the stgam R package, which provides wrapper functions to the mgcv R package. However, whilst GP smooths are acceptable for working with space or time, they are not for working with space and time together. A more robust approach is to use a tensor product smooth with a GP basis. However, these in turn require correlation function length scale or range parameters (ρ) to be defined. These are distances (in space or time) at which the correlation function falls below some value, and they can be used to indicate the scale of spatial and temporal dependencies between response and predictor variables (similar to geographically weighted bandwidths). The paper describes the problem in detail and illustrates an approach for optimising ρ and methods for determining model specification.

P9. A modularity-driven framework for unraveling congestion centers with enhanced spatial-semantic features

Authors

Weihua Huan, Xintao Liu, Wei Huang

Abstract

The propagation of traffic congestion is a complicated spatiotemporal phenomenon in urban networks. Existing studies have mainly relied on dynamic Bayesian networks or deep learning approaches. However, these approaches often struggle to adapt seamlessly to diverse data granularities, limiting their applicability. In this study, we propose a modularity-driven method to unravel the spatiotemporal congestion propagation centers, effectively addressing temporal granularity challenges through the use of the fast Fourier transform (FFT). Our framework, which consists of two data-driven modules, stands out for its scalability in integrating enhanced spatial-semantic features while eliminating dependence on temporal granularity. The first is an adaptive adjacency matrix learning module, which captures the spatiotemporal relationship from evolving congestion graphs by fusing node degree, spatial proximity, and the FFT of traffic state indices. The second is a local search module, which employs local dominance principles to unravel the congestion propagation centers. We validate our proposed methodology on large-scale traffic networks in New York City, the United States. An ablation study on the dataset reveals that the combination of the three features achieves the highest modularity score of 0.65. The contribution of our work is to provide a novel way to infer the propagation centers of traffic congestion and to reveal the flexibility of extending our framework at multiple scales. The network resilience and dynamic evolution of the identified congestion centers can provide implications for actionable decisions.
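
For readers unfamiliar with the modularity score reported above, the following minimal sketch computes standard Newman modularity for an unweighted, undirected graph and a given partition. It is a generic illustration, not the authors' learned adjacency matrix or local search module; the function name and the toy graph are assumptions.

```python
def modularity(edges, community):
    """Newman modularity Q for an unweighted, undirected graph.

    edges: list of (u, v) pairs, each undirected edge listed once.
    community: dict mapping every node to a community label.
    Q = sum over communities c of [ L_c / m - (d_c / (2m))^2 ],
    where m is the number of edges, L_c the number of edges inside c,
    and d_c the total degree of the nodes in c.
    """
    m = len(edges)
    intra = {}    # edges with both endpoints in the same community
    degree = {}   # summed node degree per community
    for u, v in edges:
        cu, cv = community[u], community[v]
        degree[cu] = degree.get(cu, 0) + 1
        degree[cv] = degree.get(cv, 0) + 1
        if cu == cv:
            intra[cu] = intra.get(cu, 0) + 1
    return sum(intra.get(c, 0) / m - (d / (2 * m)) ** 2
               for c, d in degree.items())

# Toy example: two triangles joined by a single bridging edge
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
partition = {0: "A", 1: "A", 2: "A", 3: "B", 4: "B", 5: "B"}
q = modularity(edges, partition)
```

Higher values indicate that edges fall within communities more often than expected by chance; placing every node in a single community yields a score of zero.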

P10. Georeferencing Historical Maps at Scale

Authors

Rere-No-A-Rangi Pope, Marcus Frean

Abstract

This paper presents a novel approach to automatically georeferencing historical maps using a line intersections-based matching algorithm, which we term Koki Tauriterite. Our algorithm addresses the challenges inherent in linking historical map images to contemporary cadastral data, particularly the issues of temporal discrepancies, cartographic distortions, and map image noise. By extracting and comparing angular relationships between cadastral features, termed as monads and dyads, we establish a robust method for performing record linkage by identifying corresponding spatial patterns across disparate datasets. We employ a Bayesian framework to quantify the likelihood of dyad matches and a likelihood function that accounts for measurement noise. Our approach includes a multi-step filtering process to reduce the computational burden of large scale data matching, followed by a maximum likelihood estimation to identify the most probable matches. The algorithm's performance was initially evaluated on a small dataset before being scaled up to 100,000 randomly selected regions across Aotearoa New Zealand. While achieving scores in the upper 10% of all tested regions, our method demonstrated both strengths and areas for potential improvement when applied to the cadaster dataset. We discuss the implications of these findings and propose strategies for further enhancing the algorithm's robustness and efficiency. Our work is motivated by previous work in the areas of critical GIS, critical cartography and spatial justice and seeks to contribute to the areas of Spatial Data Science, Historical GIS and GIScience.

P11. Large Multi-modal Model Cartographic Map Comprehension for Textual Locality Georeferencing

Authors

Kalana Wijegunarathna, Kristin Stock, Christopher B. Jones

Abstract

Millions of biological sample records collected over the last few centuries and archived in natural history collections are un-georeferenced. Georeferencing the complex locality descriptions associated with these collection samples is a highly labour-intensive task that collection agencies struggle with. None of the existing automated methods exploit maps, which are an essential tool for georeferencing complex relations. We present preliminary experiments and results of a novel method that exploits the multi-modal capabilities of recent Large Multi-Modal Models (LMMs). This method enables the model to visually contextualize the spatial relations it reads in the locality description. We use a grid-based approach to adapt these auto-regressive models for this task in a zero-shot setting. Our experiments, conducted on a small manually annotated dataset, show impressive results for our approach (~1 km average distance error) compared to uni-modal georeferencing with Large Language Models and existing georeferencing tools. The paper also discusses the findings of the experiments in light of an LMM's ability to comprehend fine-grained maps. Motivated by these results, a practical framework is proposed to integrate this method into a georeferencing workflow.
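
One way to picture a grid-based setup and the distance-error metric is sketched below: a grid is laid over the map extent, a model is assumed to answer with a grid cell, the cell centre becomes the predicted coordinate, and the error is the great-circle distance to the true location. The grid indexing scheme, function names, and bounding-box format are all assumptions for this example; the abstract does not describe the authors' actual grid design or prompting.

```python
import math

def cell_center(bbox, rows, cols, row, col):
    """Return (lat, lon) of the centre of a grid cell.

    bbox: (min_lon, min_lat, max_lon, max_lat) of the map extent.
    row 0 is the top (northernmost) row, col 0 the leftmost column.
    Hypothetical indexing chosen for this illustration.
    """
    min_lon, min_lat, max_lon, max_lat = bbox
    cell_w = (max_lon - min_lon) / cols
    cell_h = (max_lat - min_lat) / rows
    lon = min_lon + (col + 0.5) * cell_w
    lat = max_lat - (row + 0.5) * cell_h
    return lat, lon

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km on a sphere of radius 6371 km."""
    r = 6371.0
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

# Example: the model picks the top-left cell of a 10 x 10 grid
pred_lat, pred_lon = cell_center((0.0, 0.0, 10.0, 10.0), 10, 10, 0, 0)
error_km = haversine_km(pred_lat, pred_lon, 9.4, 0.6)
```

Averaging `error_km` over all annotated records would give an average distance error of the kind quoted in the abstract. Under this scheme a finer grid lowers the best achievable error but asks the model to discriminate between more cells.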

P12. BERT4Traj: Transformer-Based Trajectory Reconstruction for Sparse Mobility Data

Authors

Hao Yang, X. Angela Yao, Christopher C. Whalen, Gengchen Mai

Abstract

Understanding human mobility is essential for applications in public health, transportation, and urban planning. However, mobility data often suffers from sparsity due to limitations in data collection methods, such as infrequent GPS sampling or call detail record (CDR) data that only capture locations during communication events. To address this challenge, we propose BERT4Traj, a transformer-based model that reconstructs complete mobility trajectories by predicting hidden visits in sparse movement sequences. Inspired by BERT’s masking and self-attention mechanisms, BERT4Traj leverages spatial embeddings, temporal embeddings and contextual background features such as demographics and anchor points. We evaluate BERT4Traj on real-world CDR and GPS datasets collected in Kampala, Uganda, demonstrating that our approach significantly outperforms traditional models such as Markov Chains, KNN, RNNs, and LSTMs. Our results show that BERT4Traj effectively reconstructs detailed and continuous mobility trajectories, enhancing insights into human movement patterns.

P13. Precomputed Topological Relations for Integrated Geospatial Analysis across Knowledge Graphs

Authors

Katrina Schweikert, David K. Kedrowski, Shirly Stephen, Torsten Hahmann

Abstract

Geospatial Knowledge Graphs (GeoKGs) represent a significant advancement in the integration of AI-driven geographic information, facilitating interoperable and semantically rich geospatial analytics across various domains. This paper explores the use of topologically enriched GeoKGs, built on an explicit representation of S2 Geometry alongside precomputed topological relations, for constructing efficient geospatial analysis workflows within and across graphs. Using the SAWGraph GeoKG as a case study, we demonstrate how this framework supports fundamental GIS operations (such as spatial filtering, proximity analysis, overlay operations, and network analysis) in a GeoKG setting while allowing for the easy linking of these operations in combination with semantic filters. This enables the efficient execution of complex geospatial analyses as semantically explicit queries and enhances the usability of geospatial data across graphs. Additionally, the framework eliminates the need for explicit support for GeoSPARQL's topological operations in the utilized graph databases and better integrates spatial knowledge into the overall semantic inference process supported by RDFS and OWL ontologies.

P14. U-Prithvi: Integrating a Foundation Model and U-Net for Enhanced Flood Inundation Mapping

Authors

Vit Kostejn, Yamil Essus, Jenna Abrahamson, Ranga Vatsavai

Abstract

In recent years, large pre-trained models, commonly referred to as foundation models, have become increasingly popular for various tasks leveraging transfer learning. This trend has expanded to remote sensing, where transformer-based foundation models such as Prithvi, msGFM, and SatSwinMAE have been utilized for a range of applications. While these transformer-based models, particularly the Prithvi model, exhibit strong generalization capabilities, they are limited in capturing fine-grained details compared to convolutional neural network architectures like U-Net in segmentation tasks. In this paper, we propose a novel architecture, U-Prithvi, which combines the strengths of the Prithvi transformer with those of U-Net. We introduce a RandomHalfMaskLayer to ensure balanced learning from both models during training. Our approach is evaluated on the Sen1Floods11 dataset for flood inundation mapping, and experimental results demonstrate that U-Prithvi outperforms both individual models, including on out-of-sample data. While this principle is illustrated using the Prithvi model, it is easily adaptable to other foundation models.

P15. Assessing Map Reproducibility with Visual Question-Answering: An Empirical Evaluation

Authors

Eftychia Koukouraki, Auriol Degbelo, Christian Kray

Abstract

Reproducibility is a key principle of the modern scientific method. Maps, as an important means of communicating scientific results in GIScience and across disciplines, should be reproducible. Currently, map reproducibility assessment is done manually, which makes the assessment process tedious and time-consuming, ultimately limiting its scalability. Hence, this work explores the extent to which Visual Question-Answering (VQA) can be used to automate some tasks relevant to map reproducibility assessment. We selected five state-of-the-art vision language models (VLMs) and followed a three-step approach to evaluate their ability to discriminate between maps and other images, interpret map content, and compare two map images using VQA. Our results show that current VLMs already possess map-reading capabilities and demonstrate understanding of spatial concepts, such as cardinal directions, geographic scope, and legend interpretation. Our paper demonstrates the potential of using VQA to support reproducibility assessment and highlights the outstanding issues that need to be addressed to achieve accurate, trustworthy map descriptions, thereby reducing the time and effort required by human evaluators.

P16. What, When, and Where Do You Mean? Detecting Spatio-Temporal Concept Drift in Scientific Texts

Authors

Meilin Shi, Krzysztof Janowicz, Zilong Liu, Mina Karimi, Ivan Majic, Alexandra Fortacz

Abstract

Inundated by the rapidly expanding AI research nowadays, the research community requires more effective research data management than ever. A key challenge lies in the evolving nature of concepts embedded in the growing body of research publications. As concepts evolve over time (e.g., keywords like global warming become more commonly referred to as climate change), past research may become harder to find and interpret in a modern context. This phenomenon, known as concept drift, affects how research topics and keywords are understood, categorized, and retrieved. Beyond temporal drift, such variations also occur across geographic space, reflecting differences in local policies, research priorities, and more. In this work, we introduce the notion of spatio-temporal concept drift to capture how concepts in scientific texts evolve across both space and time. Using a scientometric dataset in geographic information science, we detect how research keywords drifted across countries and years using word embeddings. By detecting spatio-temporal concept drift, we can better align archival research and bridge regional differences, ensuring scientific knowledge remains findable and interoperable within evolving research landscapes.

P17. Search space reduction using species distribution modeling with simulated pollen signatures

Authors

Haoyu Wang, Jennifer Miller, Shalene Jha

Abstract

Microscopic trace materials, such as pollen, are an important category of forensic evidence recovered during investigations. As an environmentally ubiquitous substance that can attach to various surfaces, pollen enables the linking of objects and people in space and time. In this study, we assessed the extent to which the search space could be reduced using simulated pollen signatures. These signatures were compiled by randomly selecting pairs of geographic coordinates on the Earth’s terrestrial land and querying the Global Biodiversity Information Facility (GBIF) database to identify plant taxa within 50 meters of the coordinates. These taxa were then treated as the parent taxa of the pollen, simulating the hypothetical attachment of pollen signatures to objects or individuals. For each identified pollen taxon, we modeled habitat suitability for the parent plant taxa and combined the spatial distributions to refine the geolocation search area. Since the actual coordinates for these locations of interest were known, we were able to evaluate the global performance of the search space reduction under the assumption of an extreme constraint that no other contextual information was available.

P18. The inherent structure of experiments as a constraint to spatial analysis and modeling

Authors

Simon Scheider, Judith Verstegen

Abstract

We argue that in order to justify a modeling approach for a particular purpose, we need to better understand the experimental structure that is supposed to be represented by a given model application. For this purpose, we introduce a logic for specifying causal as well as spatio-temporal experiments, by reinterpreting Sinton's idea of the structure of spatial information from an experimental viewpoint. We illustrate the use of this logic by showing to what extent remote sensing and simulation approaches are justifiable for representing the experiments involved in a landuse modeling example.

P19. Identifying Resilient Communities in Road Networks: A Path-Based Embedding Approach

Authors

Christopher Wagner, Somayeh Dodge, Danial Alizadeh

Abstract

Effective resilience analysis of road networks is fundamental to building sustainable and disaster prepared cities. Identifying which road segments share similar vulnerabilities is important for designing robust infrastructure that can withstand disruptions as effectively as possible before they occur. Graph-based community detection can be applied to group areas of the network sharing similar structural vulnerabilities. However, current graph-based community detection methods either struggle with integrating node features during partitioning or fail to consider the path-based dependencies in road networks. This paper introduces the Path-based Community Embedding (PCE) model, an approach that leverages path-based embeddings to overcome these limitations. PCE combines the strengths of graph attention networks and LSTMs to learn representations that incorporate both local neighborhood information and long-range path dependencies. Our results on the Santa Barbara road network show that PCE improves community detection performance for resilience analysis, thus offering a powerful tool for transportation engineers to preemptively identify vulnerabilities in road networks.