Data Architectures for AI-Ready Interoperable Public Transportation Ecosystems
For public transportation agencies and researchers, this work provides a conceptual framework for designing interoperable data pipelines, though it remains a high-level proposal without empirical validation.
The paper identifies fragmentation and lack of interoperability in public transportation data as key barriers to AI-driven analytics, and proposes data architecture patterns adapted from enterprise computing to address these challenges in that domain.
Public transportation (PT) agencies generate vast amounts of heterogeneous data from automatic fare collection (AFC), automatic passenger counting (APC), vehicle location (AVL/CAD), schedule and real-time feeds (GTFS/GTFS-RT), and proprietary platforms. These datasets offer unprecedented opportunities for data-driven planning, operations, and passenger services, but their potential is constrained by fragmentation, inconsistent update frequencies, and the lack of reproducible, interoperable pipelines. While contemporary data platform patterns and architectural styles from enterprise computing address analogous challenges in other sectors, their adaptation to the PT domain remains largely underexplored. Transit systems present unique conditions, including the convergence of Information Technology (IT) and Operational Technology (OT), long asset lifecycles, stringent security requirements, the need for multi-agency coordination, and the need to operate on live systems that preclude controlled experimentation.