h-index: 6
Papers: 50
Citations: 779
Novelty: 35%
AI Score: 50

50 Papers

LG Apr 20, 2022
Ordinal-ResLogit: Interpretable Deep Residual Neural Networks for Ordered Choices

Kimia Kamal, Bilal Farooq

This study presents an Ordinal version of Residual Logit (Ordinal-ResLogit) model to investigate the ordinal responses. We integrate the standard ResLogit model into COnsistent RAnk Logits (CORAL) framework, classified as a binary classification algorithm, to develop a fully interpretable deep learning-based ordinal regression model. As the formulation of the Ordinal-ResLogit model enjoys the Residual Neural Networks concept, our proposed model addresses the main constraint of machine learning algorithms, known as black-box. Moreover, the Ordinal-ResLogit model, as a binary classification framework for ordinal data, guarantees consistency among binary classifiers. We showed that the resulting formulation is able to capture underlying unobserved heterogeneity from the data as well as being an interpretable deep learning-based model. Formulations for market share, substitution patterns, and elasticities are derived. We compare the performance of the Ordinal-ResLogit model with an Ordered Logit Model using a stated preference (SP) dataset on pedestrian wait time and a revealed preference (RP) dataset on travel distance. Our results show that Ordinal-ResLogit outperforms the traditional ordinal regression model for both datasets. Furthermore, the results obtained from the Ordinal-ResLogit RP model show that travel attributes such as driving and transit cost have significant effects on choosing the location of non-mandatory trips. In terms of the Ordinal-ResLogit SP model, our results highlight that the road-related variables and traffic condition are contributing factors in the prediction of pedestrian waiting time such that the mixed traffic condition significantly increases the probability of choosing longer waiting times.
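
The rank-consistency idea behind a CORAL-style ordinal model can be sketched in a few lines. This is a minimal illustration, not the authors' Ordinal-ResLogit implementation: the utility value, the bias values, and all function names are hypothetical, and the utility is taken as given rather than produced by residual layers. The sketch assumes one shared utility across the K-1 binary tasks and non-increasing biases, which is what keeps P(y > k) non-increasing in k.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def coral_probabilities(utility, biases):
    """Return P(y > k) for each threshold k, using one shared utility.

    Assumption: `biases` is non-increasing, so the cumulative
    probabilities are non-increasing in k (rank consistency).
    """
    return [sigmoid(utility + b) for b in biases]

def category_probabilities(cum):
    """Convert P(y > k), k = 1..K-1, into P(y = k), k = 1..K."""
    upper = [1.0] + cum   # P(y > 0) = 1
    lower = cum + [0.0]   # P(y > K) = 0
    return [u - l for u, l in zip(upper, lower)]

def predict_rank(cum):
    """Predicted rank = 1 + number of binary tasks answered 'yes'."""
    return 1 + sum(p > 0.5 for p in cum)

# Hypothetical values for one observation with K = 4 ordered categories.
utility = 0.4
biases = [1.5, 0.2, -1.0]

cum = coral_probabilities(utility, biases)
probs = category_probabilities(cum)
rank = predict_rank(cum)
```

Because every binary task shares the same utility, the category probabilities obtained by differencing are never negative, which is the practical meaning of consistency among the binary classifiers.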

LG Dec 21, 2022
Debiased machine learning for estimating the causal effect of urban traffic on pedestrian crossing behaviour

Kimia Kamal, Bilal Farooq

Before the transition of AVs to urban roads and subsequently unprecedented changes in traffic conditions, evaluation of transportation policies and futuristic road design related to pedestrian crossing behavior is of vital importance. Recent studies analyzed the non-causal impact of various variables on pedestrian waiting time in the presence of AVs. However, we mainly investigate the causal effect of traffic density on pedestrian waiting time. We develop a Double/Debiased Machine Learning (DML) model in which the impact of confounding variables that influence both a policy and an outcome of interest is addressed, resulting in unbiased policy evaluation. Furthermore, we analyze the effect of traffic density by developing a copula-based joint model of two main components of pedestrian crossing behavior, pedestrian stress level and waiting time. The copula approach has been widely used in the literature for addressing self-selection problems, which can be classified as a causality analysis in travel behavior modeling. The results obtained from the copula approach and DML are compared based on the effect of traffic density. In the DML model structure, the standard error of the density parameter is lower than in the copula approach, and the confidence interval is considerably more reliable. In addition, despite the similar sign of effect, the copula approach estimates a lower effect of traffic density than DML, due to the spurious effect of confounders. In short, the DML model structure can flexibly adjust for the impact of confounders by using machine learning algorithms and is more reliable for planning future policies.
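
A minimal sketch of the partialling-out form of Double/Debiased Machine Learning with cross-fitting may help make the idea concrete. It is not the paper's model. The data are simulated, the single confounder `x`, the treatment `d` (standing in for traffic density), and the outcome `y` (standing in for waiting time) are hypothetical, and a one-variable least-squares fit stands in for the machine learning learners that would normally estimate the nuisance functions.

```python
import random

def fit_line(x, y):
    """Least-squares fit of y on a single regressor; returns (intercept, slope)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope

def dml_effect(x, d, y, n_folds=2):
    """Partialling-out DML: residualise d and y on x out of fold,
    then regress the outcome residuals on the treatment residuals."""
    n = len(x)
    res_d, res_y = [], []
    for fold in range(n_folds):
        train = [i for i in range(n) if i % n_folds != fold]
        test = [i for i in range(n) if i % n_folds == fold]
        # Nuisance models are fitted on the other fold(s) only.
        ay, by = fit_line([x[i] for i in train], [y[i] for i in train])
        ad, bd = fit_line([x[i] for i in train], [d[i] for i in train])
        for i in test:
            res_y.append(y[i] - (ay + by * x[i]))
            res_d.append(d[i] - (ad + bd * x[i]))
    return sum(a * b for a, b in zip(res_d, res_y)) / sum(a * a for a in res_d)

# Simulated data (assumption): x confounds both treatment and outcome.
rng = random.Random(0)
true_effect = 0.5
x = [rng.gauss(0, 1) for _ in range(4000)]
d = [0.8 * xi + rng.gauss(0, 1) for xi in x]
y = [true_effect * di + 1.2 * xi + rng.gauss(0, 1) for di, xi in zip(d, x)]

estimate = dml_effect(x, d, y)   # should land near true_effect
naive = fit_line(d, y)[1]        # ignores x, so it absorbs the confounding
```

In this simulation the naive slope is biased away from the true effect because it ignores the confounder, while the residual-on-residual estimate recovers it. This illustrates why adjusting for confounders changes the estimated effect.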

LG May 11, 2022
eFedDNN: Ensemble based Federated Deep Neural Networks for Trajectory Mode Inference

Daniel Opoku Mensah, Godwin Badu-Marfo, Ranwa Al Mallah et al.

As the most significant data source in smart mobility systems, GPS trajectories can help identify user travel mode. However, these GPS datasets may contain users' private information (e.g., home location), preventing many users from sharing their private information with a third party. Hence, identifying travel modes while protecting users' privacy is a significant issue. To address this challenge, we use federated learning (FL), a privacy-preserving machine learning technique that aims at collaboratively training a robust global model by accessing users' locally trained models but not their raw data. Specifically, we designed a novel ensemble-based Federated Deep Neural Network (eFedDNN). The ensemble method combines the outputs of the different models learned via FL by the users and shows an accuracy that surpasses comparable models reported in the literature. Extensive experimental studies on a real-world open-access dataset from Montreal demonstrate that the proposed inference model can achieve accurate identification of users' mode of travel without compromising privacy.
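
The two ingredients named in the abstract, aggregating locally trained models without sharing raw data and combining the outputs of several models, can be sketched as follows. This is an illustration under stated assumptions: the abstract does not say which aggregation rule eFedDNN uses, so size-weighted federated averaging and simple probability averaging are swapped in as common choices, and all names and numbers are hypothetical.

```python
def federated_average(client_weights, client_sizes):
    """Size-weighted average of client parameter vectors (FedAvg-style).

    Only model parameters are exchanged; the raw trajectories stay local.
    """
    total = sum(client_sizes)
    n_params = len(client_weights[0])
    return [
        sum(w[j] * s for w, s in zip(client_weights, client_sizes)) / total
        for j in range(n_params)
    ]

def ensemble_predict(prob_vectors):
    """Average the class probabilities of several models, return the argmax."""
    n_models = len(prob_vectors)
    n_classes = len(prob_vectors[0])
    mean = [sum(p[c] for p in prob_vectors) / n_models for c in range(n_classes)]
    return mean.index(max(mean))

# Hypothetical parameters from three users and their local sample counts.
clients = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
sizes = [10, 30, 60]
global_weights = federated_average(clients, sizes)

# Hypothetical mode probabilities (e.g. walk, bike, car) from three models.
mode = ensemble_predict([[0.6, 0.3, 0.1], [0.2, 0.5, 0.3], [0.1, 0.7, 0.2]])
```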

LG Jan 6, 2023
Attention-LSTM for Multivariate Traffic State Prediction on Rural Roads

Elahe Sherafat, Bilal Farooq, Amir Hossein Karbasi et al.

Accurate traffic volume and speed prediction has a wide range of applications in transportation. It can result in useful and timely information for both travellers and transportation decision-makers. In this study, an Attention-based Long Short-Term Memory model (A-LSTM) is proposed to simultaneously predict traffic volume and speed on a critical rural road segment which connects Tehran to Chalus, the most touristic destination city in Iran. Moreover, this study compares the results of the A-LSTM model with the Long Short-Term Memory (LSTM) model. Both models show acceptable performance in predicting speed and flow. However, the A-LSTM model outperforms the LSTM in 5- and 15-minute intervals. In contrast, there is no meaningful difference between the two models for the 30-minute time interval. By comparing the performance of the models based on different time horizons, the 15-minute horizon model outperforms the others by reaching the lowest Mean Square Error (MSE) loss of 0.0032, followed by the 30- and 5-minute horizons with 0.004 and 0.0051, respectively. In addition, this study compares the results of the models based on two transformations of temporal categorical input variables, one-hot or cyclic, for the 15-minute time interval. The results demonstrate that both LSTM and A-LSTM with cyclic feature encoding outperform those with one-hot feature encoding.
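
The difference between the two encodings of a temporal variable can be shown with hour of day. This is a small sketch, and the abstract does not specify which temporal variables or periods were encoded, so the 24-hour period is an assumption. The cyclic form maps the hour to a point on the unit circle, so 23:00 and 00:00 end up close together, whereas one-hot vectors treat every pair of hours as equally distant.

```python
import math

def one_hot_hour(hour):
    """24-dimensional indicator vector for the hour of day."""
    return [1.0 if h == hour else 0.0 for h in range(24)]

def cyclic_hour(hour):
    """Two-dimensional sine/cosine encoding with a 24-hour period."""
    angle = 2 * math.pi * hour / 24
    return [math.sin(angle), math.cos(angle)]

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Distance across midnight versus distance across half a day.
cyc_midnight = euclidean(cyclic_hour(23), cyclic_hour(0))
cyc_half_day = euclidean(cyclic_hour(0), cyclic_hour(12))
hot_midnight = euclidean(one_hot_hour(23), one_hot_hour(0))
hot_half_day = euclidean(one_hot_hour(0), one_hot_hour(12))
```

The cyclic encoding also uses two inputs instead of 24, which keeps the input layer small.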

LG Nov 23, 2022
Robustness Analysis of Deep Learning Models for Population Synthesis

Daniel Opoku Mensah, Godwin Badu-Marfo, Bilal Farooq

Deep generative models have become useful for synthetic data generation, particularly population synthesis. The models implicitly learn the probability distribution of a dataset and can draw samples from a distribution. Several models have been proposed, but their performance is only tested on a single cross-sectional sample. The implementation of population synthesis on single datasets is seen as a drawback that needs further studies to explore the robustness of the models on multiple datasets. While comparing with the real data can increase trust and interpretability of the models, techniques to evaluate deep generative models' robustness for population synthesis remain underexplored. In this study, we present a bootstrap confidence interval for deep generative models, an approach that computes efficient confidence intervals for mean prediction errors to evaluate the robustness of the models to multiple datasets. Specifically, we adopt the tabular-based Composite Travel Generative Adversarial Network (CTGAN) and Variational Autoencoder (VAE) to estimate the distribution of the population by generating agents that have tabular data, using several samples over time from the same study area. The models are implemented on multiple travel diaries of the Montreal Origin-Destination Survey of 2008, 2013, and 2018, and their predictive performance is compared under varying sample sizes from multiple surveys. Results show that the predictive errors of CTGAN have narrower confidence intervals, indicating its robustness to multiple datasets of varying sample sizes when compared to VAE. Again, the evaluation of model robustness against varying sample size shows a minimal decrease in model performance with a decrease in sample size. This study directly supports agent-based modelling by enabling finer synthetic generation of populations in a reliable environment.
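
A percentile bootstrap for the mean of a set of errors can be sketched as below. The abstract does not state which bootstrap variant was used, so the percentile method is an assumption, and the error values, function name, and parameters are hypothetical.

```python
import random

def bootstrap_ci(errors, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for the mean of `errors`."""
    rng = random.Random(seed)
    n = len(errors)
    # Resample with replacement and record the mean of each resample.
    means = sorted(
        sum(rng.choices(errors, k=n)) / n for _ in range(n_resamples)
    )
    lower = means[int((alpha / 2) * n_resamples)]
    upper = means[int((1 - alpha / 2) * n_resamples) - 1]
    return lower, upper

# Hypothetical per-attribute errors between synthetic and observed marginals.
errors = [0.12, 0.08, 0.15, 0.10, 0.09, 0.20, 0.11, 0.07, 0.13, 0.10]
mean_error = sum(errors) / len(errors)
lower, upper = bootstrap_ci(errors)
```

Under this reading, a narrower interval for one model than for another, computed on the same samples, is the sense in which the first model is called more robust.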

SOC-PH Mar 17
Assessment of Latent Pedestrian-Vehicle Interaction Risk Profiles at Midblock Crossing in VR

Rulla Al-Haideri, Bilal Farooq, Elisabetta Cherchi

Pedestrian safety at midblock crossings is a critical concern in mixed traffic environments where autonomous vehicles (AVs) and human-driven vehicles (HDVs) share the road. Pedestrians often infer intent from vehicle motion in AV encounters, making them vulnerable to small shifts in conflict margins. This study investigates whether virtual reality (VR) crossing sessions separate into distinct interaction risk profiles and whether AV-only sessions shift profile prevalence compared to HDV-only sessions. Using large-scale immersive VR experiments from Toronto, Canada, and Newcastle, England, we compute surrogate safety measures (SSMs) and apply latent profile analysis (LPA) to identify distinct pedestrian crossing stances, ranging from risk-accepting to highly cautious. Key findings show that Newcastle exhibits a higher prevalence of high-urgency risk profiles in AV-only sessions, indicating that AVs contribute to higher-risk encounters. In contrast, Toronto shows no significant difference between AV-only and HDV-only sessions, suggesting that contextual factors influence the impact of AVs on pedestrian safety.

SOC-PH Mar 1
From GEV to ResLogit: Spatially Correlated Discrete Choice Models for Pedestrian Movement Prediction

Rulla Al-Haideri, Bilal Farooq

High frequency pedestrian motion forecasting when interacting with autonomous vehicles (AVs) can be enhanced through the use of behavioural frameworks, such as discrete choice models, that can explicitly account for correlation among similar movement alternatives. We formulate the pedestrian next step choice as a spatial discrete choice defined by a grid of speed adjustment and heading change. Using naturalistic pedestrian-AV encounters from nuScenes and Argoverse 2 (1 sec decision interval), we estimate a multinomial logit baseline and four spatial generalized extreme value (GEV) specifications (SCL, GSCL, SCNL, and GSCNL). We then compare them to a residual neural network logit (ResLogit) model that learns cross alternative effects while retaining an interpretable linear utility component. Across the evaluated data, spatial GEV structures yield only marginal improvements over multinomial logit, whereas ResLogit achieves a substantially better fit and produces behaviourally coherent errors concentrated among neighbouring grid cells. The results suggest that in dense, high frequency spatial choice sets, learning based residual corrections can capture proximity induced correlation more effectively than analyst specified GEV nesting structures, while maintaining interpretability.
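
The multinomial logit baseline over a grid of speed-adjustment and heading-change alternatives can be sketched as follows. This is illustrative only. The 3 x 3 grid, the two coefficients, and the utility specification are hypothetical and are not the paper's estimated model; the sketch only shows how a linear utility per grid cell turns into choice probabilities.

```python
import math

def mnl_probabilities(utilities):
    """Multinomial logit choice probabilities (numerically stable softmax)."""
    m = max(utilities)
    exps = [math.exp(u - m) for u in utilities]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical choice set: (speed adjustment, heading change) grid cells,
# where -1, 0, 1 denote decrease, keep, and increase.
alternatives = [(s, h) for s in (-1, 0, 1) for h in (-1, 0, 1)]

# Hypothetical coefficients that penalise any change in speed or heading.
beta_speed, beta_heading = -0.8, -1.2
utilities = [beta_speed * abs(s) + beta_heading * abs(h) for s, h in alternatives]

probs = mnl_probabilities(utilities)
most_likely = alternatives[probs.index(max(probs))]
```

Plain multinomial logit treats neighbouring cells as independent alternatives. The spatial GEV and ResLogit specifications described above are different ways of relaxing that assumption.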

AI Feb 17
Enhancing Diversity and Feasibility: Joint Population Synthesis from Multi-source Data Using Generative Models

Farbod Abbasi, Zachary Patterson, Bilal Farooq

Generating realistic synthetic populations is essential for agent-based models (ABM) in transportation and urban planning. Current methods face two major limitations. First, many rely on a single dataset or follow a sequential data fusion and generation process, which means they fail to capture the complex interplay between features. Second, these approaches struggle with sampling zeros (valid but unobserved attribute combinations) and structural zeros (infeasible combinations due to logical constraints), which reduce the diversity and feasibility of the generated data. This study proposes a novel method to simultaneously integrate and synthesize multi-source datasets using a Wasserstein Generative Adversarial Network (WGAN) with gradient penalty. This joint learning method improves both the diversity and feasibility of synthetic data by defining a regularization term (inverse gradient penalty) for the generator loss function. For the evaluation, we implement a unified evaluation metric for similarity, and place special emphasis on measuring diversity and feasibility through recall, precision, and the F1 score. Results show that the proposed joint approach outperforms the sequential baseline, with recall increasing by 7% and precision by 15%. Additionally, the regularization term further improves diversity and feasibility, reflected in a 10% increase in recall and 1% in precision. We assess similarity distributions using a five-metric score. The joint approach performs better overall, and reaches a score of 88.1 compared to 84.6 for the sequential method. Since synthetic populations serve as a key input for ABM, this multi-source generative approach has the potential to significantly enhance the accuracy and reliability of ABM.
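
One plausible way to read recall and precision as measures of diversity and feasibility is over sets of attribute combinations. The abstract does not give the exact definitions, so the following is an assumption: recall is the share of valid reference combinations that the generator reproduces (diversity), and precision is the share of generated combinations that are valid (feasibility). The attribute values and function name are hypothetical.

```python
def diversity_feasibility(synthetic, reference):
    """Precision, recall and F1 over unique attribute combinations."""
    syn, ref = set(synthetic), set(reference)
    hits = len(syn & ref)
    precision = hits / len(syn)   # generated combinations that are valid
    recall = hits / len(ref)      # valid combinations that were generated
    f1 = 2 * precision * recall / (precision + recall) if hits else 0.0
    return precision, recall, f1

# Hypothetical (age group, licence status) combinations.
reference = [
    ("child", "no licence"),
    ("adult", "licence"),
    ("adult", "no licence"),
    ("senior", "licence"),
]
synthetic = [
    ("adult", "licence"),
    ("adult", "licence"),
    ("senior", "licence"),
    ("child", "licence"),   # a structural zero: infeasible combination
]

precision, recall, f1 = diversity_feasibility(synthetic, reference)
```

In this toy case the infeasible combination lowers precision, and the two valid combinations that were never generated lower recall.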

CL Jul 29, 2025
Towards Locally Deployable Fine-Tuned Causal Large Language Models for Mode Choice Behaviour

Tareq Alsaleh, Bilal Farooq

This study investigates the adoption of open-access, locally deployable causal large language models (LLMs) for travel mode choice prediction and introduces LiTransMC, the first fine-tuned causal LLM developed for this task. We systematically benchmark eleven open-access LLMs (1-12B parameters) across three stated and revealed preference datasets, testing 396 configurations and generating over 79,000 mode choice decisions. Beyond predictive accuracy, we evaluate model-generated reasoning using BERTopic for topic modelling and a novel Explanation Strength Index, providing the first structured analysis of how LLMs articulate decision factors in alignment with behavioural theory. LiTransMC, fine-tuned using parameter-efficient and loss-masking strategies, achieved a weighted F1 score of 0.6845 and a Jensen-Shannon Divergence of 0.000245, surpassing both untuned local models and larger proprietary systems, including GPT-4o with advanced persona inference and embedding-based loading, while also outperforming classical mode choice methods such as discrete choice models and machine learning classifiers for the same dataset. This dual improvement, i.e., high instance-level accuracy and near-perfect distributional calibration, demonstrates the feasibility of creating specialist, locally deployable LLMs that integrate prediction and interpretability. Through combining structured behavioural prediction with natural language reasoning, this work unlocks the potential for conversational, multi-task transport models capable of supporting agent-based simulations, policy testing, and behavioural insight generation. These findings establish a pathway for transforming general purpose LLMs into specialized and explainable tools for transportation research and policy formulation, while maintaining privacy, reducing cost, and broadening access through local deployment.
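
The Jensen-Shannon divergence used here as a measure of distributional calibration compares predicted and observed mode shares. A minimal sketch follows. The logarithm base is not stated in the abstract, so base 2 is an assumption (it bounds the divergence by 1), and the mode shares are hypothetical.

```python
import math

def kl_divergence(p, q):
    """Kullback-Leibler divergence in bits; terms with p_i = 0 contribute 0."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def js_divergence(p, q):
    """Jensen-Shannon divergence: symmetrised KL against the mixture m."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)

# Hypothetical observed and predicted shares for (car, transit, walk, bike).
observed = [0.55, 0.25, 0.15, 0.05]
predicted = [0.53, 0.27, 0.14, 0.06]

jsd = js_divergence(observed, predicted)
```

A value near zero means the aggregate predicted shares almost match the observed ones, even when individual predictions are wrong. That is why the abstract reports it alongside the weighted F1 score.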

LG May 2, 2024
A deep causal inference model for fully-interpretable travel behaviour analysis

Kimia Kamal, Bilal Farooq

Transport policy assessment often involves causal questions, yet the causal inference capabilities of traditional travel behavioural models are at best limited. We present the deep CAusal infeRence mOdel for traveL behavIour aNAlysis (CAROLINA), a framework that explicitly models causality in travel behaviour, enhances predictive accuracy, and maintains interpretability by leveraging causal inference, deep learning, and traditional discrete choice modelling. Within this framework, we introduce a Generative Counterfactual model for forecasting human behaviour by adapting the Normalizing Flow method. Through the case studies of virtual reality-based pedestrian crossing behaviour, revealed preference travel behaviour from London, and synthetic data, we demonstrate the effectiveness of our proposed models in uncovering causal relationships, prediction accuracy, and assessing policy interventions. Our results show that intervention mechanisms that can reduce pedestrian stress levels lead to a 38.5% increase in individuals experiencing shorter waiting times. Reducing the travel distances in London results in a 47% increase in sustainable travel modes.

LG Mar 11
Copula-ResLogit: A Deep-Copula Framework for Unobserved Confounding Effects

Kimia Kamal, Bilal Farooq

A key challenge in travel demand analysis is the presence of unobserved factors that may generate non-causal dependencies, obscuring the true causal effects. To address the issue, the study introduces a novel deep learning based fully interpretable joint modelling framework, Copula-ResLogit, which integrates the flexibility of Residual Neural Network (ResNet) architectures with the dependence capturing capabilities of copula models. This hybrid structure enables us to first detect unobserved confounding through traditional copula function based joint modelling and then mitigate these hidden associations by incorporating deep learning components. The study applies this framework to two case studies, including the relationship between stress levels and wait time of pedestrians when crossing mid block in VR and the dependencies between travel mode choice and travel distance in London travel behaviour data. Results show that Copula-ResLogit substantially reduces or eliminates the dependencies, demonstrating the ability of residual layers to account for hidden confounding effects.

QUANT-PH Aug 7, 2025
Quantum-Efficient Reinforcement Learning Solutions for Last-Mile On-Demand Delivery

Farzan Moosavi, Bilal Farooq

Quantum computation has emerged as a promising alternative for solving NP-hard combinatorial problems. In optimization in particular, classical approaches become intractable for large-scale instances. We investigate quantum computing to solve the large-scale Capacitated Pickup and Delivery Problem with Time Windows (CPDPTW). In this regard, a Reinforcement Learning (RL) framework augmented with a Parametrized Quantum Circuit (PQC) is designed to minimize the travel time in a realistic last-mile on-demand delivery. A novel problem-specific encoding quantum circuit with an entangling and variational layer is proposed. Moreover, Proximal Policy Optimization (PPO) and Quantum Singular Value Transformation (QSVT) are designed for comparison through numerical experiments, highlighting the superiority of the proposed method in terms of the scale of the solution and training complexity while incorporating the real-world constraints.

RO Aug 5, 2025
Vision-based Perception System for Automated Delivery Robot-Pedestrians Interactions

Ergi Tushe, Bilal Farooq

The integration of Automated Delivery Robots (ADRs) into pedestrian-heavy urban spaces introduces unique challenges in terms of safe, efficient, and socially acceptable navigation. We develop the complete pipeline for a single vision sensor based multi-pedestrian detection and tracking, pose estimation, and monocular depth perception. Leveraging the real-world MOT17 dataset sequences, this study demonstrates how integrating human-pose estimation and depth cues enhances pedestrian trajectory prediction and identity maintenance, even under occlusions and dense crowds. Results show measurable improvements, including up to a 10% increase in identity preservation (IDF1), a 7% improvement in multiobject tracking accuracy (MOTA), and consistently high detection precision exceeding 85%, even in challenging scenarios. Notably, the system identifies vulnerable pedestrian groups supporting more socially aware and inclusive robot behaviour.
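
The two tracking metrics quoted above have standard definitions that can be written as short functions. The counts in the example are hypothetical and unrelated to the paper's results. MOTA penalises missed detections, false positives, and identity switches relative to the number of ground-truth objects, while IDF1 is the F1 score over identity-matched detections.

```python
def mota(false_negatives, false_positives, id_switches, ground_truth):
    """Multi-object tracking accuracy: 1 - (FN + FP + IDSW) / GT."""
    return 1.0 - (false_negatives + false_positives + id_switches) / ground_truth

def idf1(id_true_pos, id_false_pos, id_false_neg):
    """Identity F1: 2 * IDTP / (2 * IDTP + IDFP + IDFN)."""
    return 2 * id_true_pos / (2 * id_true_pos + id_false_pos + id_false_neg)

# Hypothetical counts accumulated over one sequence.
mota_score = mota(false_negatives=100, false_positives=50, id_switches=10,
                  ground_truth=1000)
idf1_score = idf1(id_true_pos=800, id_false_pos=100, id_false_neg=200)
```

MOTA can become negative when the tracker makes more errors than there are ground-truth objects, whereas IDF1 always lies between 0 and 1.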

LG Jul 7, 2025
Dynamic Campus Origin-Destination Mobility Prediction using Graph Convolutional Neural Network on WiFi Logs

Godwin Badu-Marfo, Bilal Farooq

We present an integrated graph-based neural network architecture for predicting campus building occupancy and inter-building movement at dynamic temporal resolution that learns traffic flow patterns from WiFi logs combined with the usage schedules within the buildings. The relative traffic flows are directly estimated from the WiFi data without assuming the occupant behaviour or preferences while maintaining individual privacy. We formulate the problem as a data-driven graph structure represented by a set of nodes (representing buildings), connected through a route of edges or links using a novel Graph Convolution plus LSTM Neural Network (GCLSTM) which has shown remarkable success in modelling complex patterns. We describe the formulation, model estimation, and interpretability, and examine the relative performance of our proposed model. We also present an illustrative architecture of the models and apply them to real-world WiFi logs collected at the Toronto Metropolitan University campus. The results of the experiments show that the integrated GCLSTM models significantly outperform traditional pedestrian flow estimators like the Multi Layer Perceptron (MLP) and Linear Regression.

LG Jul 1, 2025
Quantum Machine Learning in Transportation: A Case Study of Pedestrian Stress Modelling

Bara Rababah, Bilal Farooq

Quantum computing has opened new opportunities to tackle complex machine learning tasks, for instance, high-dimensional data representations commonly required in intelligent transportation systems. We explore quantum machine learning to model complex skin conductance response (SCR) events that reflect pedestrian stress in a virtual reality road crossing experiment. For this purpose, a Quantum Support Vector Machine (QSVM) with an eight-qubit ZZ feature map and a Quantum Neural Network (QNN) using a Tree Tensor Network ansatz and an eight-qubit ZZ feature map were developed on PennyLane. The dataset consists of SCR measurements along with features such as the response amplitude and elapsed time, which have been categorized into amplitude-based classes. The QSVM achieved good training accuracy, but had an overfitting problem, showing a low test accuracy of 45% and therefore impacting the reliability of the classification model. The QNN model reached a higher test accuracy of 55%, making it a better classification model than the QSVM and the classic versions.

LG Dec 23, 2024
A Coalition Game for On-demand Multi-modal 3D Automated Delivery System

Farzan Moosavi, Bilal Farooq

We introduce a multi-modal autonomous delivery optimization framework as a coalition game for a fleet of UAVs and ADRs operating in two overlaying networks to address last-mile delivery in urban environments, including high-density areas and time-critical applications. The problem is defined as multiple depot pickup and delivery with time windows constrained over operational restrictions, such as vehicle battery limitation, precedence time window, and building obstruction. Utilizing the coalition game theory, we investigate cooperation structures among the modes to capture how strategic collaboration can improve overall routing efficiency. To do so, a generalized reinforcement learning model is designed to evaluate the cost-sharing and allocation to different modes to learn the cooperative behaviour with respect to various realistic scenarios. Our methodology leverages an end-to-end deep multi-agent policy gradient method augmented by a novel spatio-temporal adjacency neighbourhood graph attention network using a heterogeneous edge-enhanced attention model and transformer architecture. Several numerical experiments on last-mile delivery applications have been conducted, showing the results from the case study in the city of Mississauga, which shows that despite the incorporation of an extensive network in the graph for two modes and a complex training structure, the model addresses realistic operational constraints and achieves high-quality solutions compared with the existing transformer-based and classical methods. It can perform well on non-homogeneous data distribution, generalizes well on different scales and configurations, and demonstrates a robust cooperative performance under stochastic scenarios across various tasks, which is effectively reflected by coalition analysis and cost allocation to signify the advantage of cooperation.

HC Nov 22, 2021
Analysis of pedestrian stress level using GSR sensor in virtual immersive reality

Mahwish Mudassar, Arash Kalatian, Bilal Farooq

The level of emotional arousal of one's body changes in response to external stimuli in an environment. Given the risks involved while crossing streets, particularly at unsignalized mid-block crosswalks, one can expect a change in the stress level of pedestrians. In this study, we investigate the levels and changes in pedestrian stress under different road crossing scenarios in immersive virtual reality. To measure the stress level of pedestrians, we used Galvanic Skin Response (GSR) sensors. To collect the required data for the model, the Virtual Immersive Reality Environment (VIRE) tool is used, which enables us to measure participants' stress levels in a controlled environment. The results suggested that the density of vehicles has a positive effect, meaning that as the density of vehicles increases, so does the stress level of pedestrians. It was noted that younger pedestrians experience less stress when crossing than older pedestrians. Geometric variables have an impact on the stress level of pedestrians. The greater the number of lanes, the greater the observed stress, which is due to the crossing distance increasing while the walking speed remains the same.

LG Oct 23, 2021
Multi-task Recurrent Neural Networks to Simultaneously Infer Mode and Purpose in GPS Trajectories

Ali Yazdizadeh, Arash Kalatian, Zachary Patterson et al.

Multi-task learning is assumed to be a powerful inference method: where there is a considerable correlation between multiple tasks, predicting them in a single framework may enhance prediction results. This research challenges this assumption by developing several single-task models and comparing their results against multi-task learners to infer mode and purpose of trip from data collected as part of a smartphone-based travel survey. GPS trajectory data along with socio-demographics and destination-related characteristics are fed into a multi-input neural network framework to predict two outputs: mode and purpose. We deployed Recurrent Neural Networks (RNN) that are fed by sequential GPS trajectories. To process the socio-demographics and destination-related characteristics, another neural network, with different embedding and dense layers, is used in parallel with RNN layers in a multi-input multi-output framework. The results are compared against the single-task learners that classify mode and purpose independently. We also investigate different RNN approaches such as Long-Short Term Memory (LSTM), Gated Recurrent Units (GRU) and Bi-directional Gated Recurrent Units (Bi-GRU). The best multi-task learner was a Bi-GRU model able to classify mode and purpose with F1-measures of 84.33% and 78.28%, while the best single-task learner to infer mode of transport was a GRU model that achieved an F1-measure of 86.50%, and the best single-task purpose detection model was a Bi-GRU that reached an F1-measure of 77.38%. While there is an assumption of higher performance of multi-task over single-task learners, the results of this study do not support such an assumption and show that, in the context of mode and trip purpose inference from GPS trajectory data, a multi-task learning approach does not bring any considerable advantage over single-task learners.

LG Sep 11, 2021
On the Initial Behavior Monitoring Issues in Federated Learning

Ranwa Al Mallah, Godwin Badu-Marfo, Bilal Farooq

In Federated Learning (FL), a group of workers participate to build a global model under the coordination of one node, the chief. Regarding the cybersecurity of FL, some attacks aim at injecting fabricated local model updates into the system. Some defenses are based on malicious worker detection and behavioral pattern analysis. In this context, without timely and dynamic monitoring methods, the chief cannot detect and remove the malicious or unreliable workers from the system. Our work emphasizes the urgency to prepare the federated learning process for monitoring and eventually behavioral pattern analysis. We study the information inside the learning process in the early stages of training, propose a monitoring process and evaluate the monitoring period required. The aim is to analyse at what time it is appropriate to start the detection algorithm in order to remove the malicious or unreliable workers from the system and optimise the defense mechanism deployment. We tested our strategy on a behavioral pattern analysis defense applied to the FL process of different benchmark systems for text and image classification. Our results show that the monitoring process lowers false positives and false negatives and consequently increases system efficiency by enabling the distributed learning system to achieve better performance in the early stage of training.

HC Apr 16, 2021
A context-aware pedestrian trajectory prediction framework for automated vehicles

Arash Kalatian, Bilal Farooq

With the unprecedented shift towards automated urban environments in recent years, a new paradigm is required to study pedestrian behaviour. Studying pedestrian behaviour in futuristic scenarios requires modern data sources that consider both the Automated Vehicle (AV) and pedestrian perspectives. Current open datasets on AVs predominantly fail to account for the latter, as they do not include an adequate number of events and associated details that involve pedestrian and vehicle interactions. To address this issue, we propose using Virtual Reality (VR) data as a complementary resource to current datasets, which can be designed to measure pedestrian behaviour under specific conditions. In this research, we focus on the context-aware pedestrian trajectory prediction framework for automated vehicles at mid-block unsignalized crossings. For this purpose, we develop a novel multi-input network of Long Short-Term Memory (LSTM) and fully connected dense layers. In addition to past trajectories, the proposed framework incorporates pedestrian head orientations and distance to the upcoming vehicles as sequential input data. By merging the sequential data with contextual information of the environment, we train a model to predict the future pedestrian trajectory. Our results show that the prediction error and overfitting to the training data are reduced by considering contextual information in the model. To analyze the application of the methods to real AV data, the proposed framework is trained and applied to pedestrian trajectory extracted from an open-access video dataset. Finally, by implementing a game theory-based model interpretability method, we provide detailed insights and propose recommendations to improve the current automated vehicle sensing systems from a pedestrian-oriented point of view.

CR Feb 26, 2021
Cybersecurity Threats in Connected and Automated Vehicles based Federated Learning Systems

Ranwa Al Mallah, Godwin Badu-Marfo, Bilal Farooq

Federated learning (FL) is a machine learning technique that aims at training an algorithm across decentralized entities that keep their local data private. Wireless mobile networks allow users to communicate with other fixed or mobile users. The road traffic network represents an infrastructure-based configuration of a wireless mobile network where the Connected and Automated Vehicles (CAV) represent the communicating entities. Applying FL in a wireless mobile network setting gives rise to a new threat in the mobile environment that is very different from the traditional fixed networks. The threat is due to the intrinsic characteristics of the wireless medium and is caused by the characteristics of the vehicular networks such as high node-mobility and rapidly changing topology. Most cyber defense techniques depend on highly reliable and connected networks. This paper explores falsified information attacks, which target the FL process that is ongoing at the RSU. We identified a number of attack strategies conducted by the malicious CAVs to disrupt the training of the global model in vehicular networks. We show that the attacks were able to increase the convergence time and decrease the accuracy of the model. We demonstrate that our attacks bypass FL defense strategies in their primary form and highlight the need for novel poisoning resilience defense mechanisms in the wireless mobile setting of the future road networks.

CRJan 24, 2021
Untargeted Poisoning Attack Detection in Federated Learning via Behavior Attestation

Ranwa Al Mallah, David Lopez, Godwin Badu-Marfo et al.

Federated Learning (FL) is a paradigm in Machine Learning (ML) that addresses data privacy, security, access rights and access to heterogeneous information issues by training a global model using distributed nodes. Despite its advantages, there is an increased potential for cyberattacks on FL-based ML techniques that can undermine the benefits. Model-poisoning attacks on FL target the availability of the model. The adversarial objective is to disrupt the training. We propose attestedFL, a defense mechanism that monitors the training of individual nodes through state persistence in order to detect a malicious worker. A fine-grained assessment of the history of the worker permits the evaluation of its behavior in time and results in innovative detection strategies. We present three lines of defense that aim at assessing if the worker is reliable by observing if the node is really training, advancing towards a goal. Our defense exposes an attacker's malicious behavior and removes unreliable nodes from the aggregation process so that the FL process converges faster. Through extensive evaluations and against various adversarial settings, attestedFL increased the accuracy of the model by 12% to 58% under different scenarios such as attacks performed at different stages of convergence, attackers colluding and continuous attacks.

LGDec 29, 2020
A Differentially Private Multi-Output Deep Generative Networks Approach For Activity Diary Synthesis

Godwin Badu-Marfo, Bilal Farooq, Zachary Patterson

In this work, we develop a privacy-by-design generative model for synthesizing the activity diary of the travel population using state-of-the-art deep learning approaches. This proposed approach extends literature on population synthesis by contributing novel deep learning to the development and application of synthetic travel data while guaranteeing privacy protection for members of the sample population on which the synthetic populations are based. First, we show a complete de-generalization of activity diaries to simulate the socioeconomic features and longitudinal sequences of geographically and temporally explicit activities. Second, we introduce a differential privacy approach to control the level of resolution disclosing the uniqueness of survey participants. Finally, we experiment using the Generative Adversarial Networks (GANs). We evaluate the statistical distributions, pairwise correlations and measure the level of privacy guaranteed on simulated datasets for varying noise. The results of the model show successes in simulating activity diaries composed of multiple outputs including structured socio-economic features and sequential tour activities in a differentially private manner.

AIDec 15, 2020
Smart Mobility Ontology: Current Trends and Future Directions

Ali Yazdizadeh, Bilal Farooq

Ontology is the explicit and formal representation of the concepts in a domain and relations among them. Transportation science is a wide domain dealing with mobility over various complex and interconnected transportation systems, such as land, aviation, and maritime transport, and can take considerable advantage of ontology development. While several studies can be found in the recent literature, there exists a large potential to improve and develop a comprehensive smart mobility ontology. The current chapter aims to present different aspects of ontology development in general, such as ontology development methods, languages, tools, and software. Subsequently, it presents the currently available mobility-related ontologies developed across different domains, such as transportation, smart cities, goods mobility, and sensors. Current gaps in the available ontologies are identified, and future directions regarding ontology development are proposed that can incorporate the forthcoming autonomous and connected vehicles, mobility as a service (MaaS), and other disruptive transportation technologies and services.

CRDec 4, 2020
Resilience-by-design in Adaptive Multi-Agent Traffic Control Systems

Ranwa Al Mallah, Talal Halabi, Bilal Farooq

Connected and Autonomous Vehicles (CAVs) with their evolving data gathering capabilities will play a significant role in road safety and efficiency applications supported by Intelligent Transport Systems (ITS), such as Traffic Signal Control (TSC) for urban traffic congestion management. However, their involvement will expand the space of security vulnerabilities and create larger threat vectors. In this paper, we perform the first detailed security analysis and implementation of a new cyber-physical attack category carried out by the network of CAVs against Adaptive Multi-Agent Traffic Signal Control (AMATSC), namely, coordinated Sybil attacks, where vehicles with forged or fake identities try to alter the data collected by the AMATSC algorithms to sabotage their decisions. Consequently, a novel, game-theoretic mitigation approach at the application layer is proposed to minimize the impact of such sophisticated data corruption attacks. The devised minimax game model enables the AMATSC algorithm to generate optimal decisions under a suspected attack, improving its resilience. Extensive experimentation is performed on a traffic dataset provided by the City of Montreal under real-world intersection settings to evaluate the attack impact. Our results improved time loss on attacked intersections by approximately 48.9%. Substantial benefits can be gained from the mitigation, yielding more robust adaptive control of traffic across networked intersections.

CYOct 27, 2020
Interpretable Data-Driven Demand Modelling for On-Demand Transit Services

Nael Alsaleh, Bilal Farooq

In recent years, with the advancements in information and communication technology, different emerging on-demand shared mobility services have been introduced as innovative solutions in the low-density areas, including on-demand transit (ODT), mobility on-demand (MOD) transit, and crowdsourced mobility services. However, due to their infancy, there is a strong need to understand and model the demand for these services. In this study, we developed trip production and distribution models for ODT services at the dissemination area (DA) level using four machine learning algorithms: Random Forest (RF), Bagging, Artificial Neural Network (ANN) and Deep Neural Network (DNN). The data used in the modelling process were acquired from Belleville's ODT operational data and 2016 census data. A Bayesian optimization approach was used to find the optimal architecture of the adopted algorithms. Moreover, a post-hoc model was employed to interpret the predictions and examine the importance of the explanatory variables. The results showed that the land-use type was the most important variable in the trip production model. On the other hand, the demographic characteristics of the trip destination were the most important variables in the trip distribution model. Moreover, the results revealed that higher trip distribution levels are expected between dissemination areas with commercial/industrial land-use type and dissemination areas with high-density residential land-use. Our findings suggest that the performance of ODT services can be further enhanced by (a) locating idle vehicles in the neighbourhoods with commercial/industrial land-use and (b) using the spatio-temporal demand models obtained in this work to continuously update the operating fleet size.

HCJul 18, 2020
Applications of brain imaging methods in driving behaviour research

Milad Haghani, Michiel C. J. Bliemer, Bilal Farooq et al.

Applications of neuroimaging methods have substantially contributed to the scientific understanding of human factors during driving by providing a deeper insight into the neuro-cognitive aspects of driver brain. This has been achieved by conducting simulated (and occasionally, field) driving experiments while collecting driver brain signals of certain types. Here, this sector of studies is comprehensively reviewed at both macro and micro scales. Different themes of neuroimaging driving behaviour research are identified and the findings within each theme are synthesised. The surveyed literature has reported on applications of four major brain imaging methods. These include Functional Magnetic Resonance Imaging (fMRI), Electroencephalography (EEG), Functional Near-Infrared Spectroscopy (fNIRS) and Magnetoencephalography (MEG), with the first two being the most common methods in this domain. While collecting driver fMRI signal has been particularly instrumental in studying neural correlates of intoxicated driving (e.g. alcohol or cannabis) or distracted driving, the EEG method has been predominantly utilised in relation to the efforts aiming at development of automatic fatigue/drowsiness detection systems, a topic to which the literature on neuro-ergonomics of driving particularly has shown a spike of interest within the last few years. The survey also reveals that topics such as driver brain activity in semi-automated settings or the brain activity of drivers with brain injuries or chronic neurological conditions have by contrast been investigated to a very limited extent. Further, potential topics in relation to driving behaviour are identified that could benefit from the adoption of neuroimaging methods in future studies.

CRJul 16, 2020
Actor-based Risk Analysis for Blockchains in Smart Mobility

Ranwa Al Mallah, Bilal Farooq

Blockchain technology is a crypto-based secure ledger for data storage and transfer through decentralized, trustless peer-to-peer systems. Despite its advantages, previous studies have shown that the technology is not completely secure against cyber attacks. Thus, it is crucial to perform domain specific risk analysis to measure how viable the attacks are on the system, their impact and consequently the risk exposure. Specifically, in this paper, we carry out an analysis in terms of quantifying the risk associated with an operational multi-layered Blockchain framework for Smart Mobility Data-markets (BSMD). We conduct an actor-based analysis to determine the impact of the attacks. The analysis identified five attack goals and five types of attackers that violate the security of the blockchain system. In the case study of the public permissioned BSMD, we highlight the highest risk factors according to their impact on the victims in terms of monetary, privacy, integrity and trust. Four attack goals represent a risk in terms of economic losses and one attack goal contains many threats that represent a risk that is either unacceptable or undesirable.

NIJul 10, 2020
Prediction of Traffic Flow via Connected Vehicles

Ranwa Al Mallah, Bilal Farooq, Alejandro Quintero

We propose a Short-term Traffic flow Prediction (STP) framework so that transportation authorities can take early action to control flow and prevent congestion. We anticipate flow at future time frames on a target road segment based on historical flow data and innovative features such as real time feeds and trajectory data provided by Connected Vehicles (CV) technology. To cope with the fact that existing approaches do not adapt to variation in traffic, we show how this novel approach allows advanced modelling by integrating into the forecasting of flow, the impact of the various events that CV realistically encountered on segments along their trajectory. We solve the STP problem with a Deep Neural Network (DNN) in a multitask learning setting augmented by input from CV. Results show that our approach, namely MTL-CV, with an average Root-Mean-Square Error (RMSE) of 0.052, outperforms state-of-the-art ARIMA time series (RMSE of 0.255) and baseline classifiers (RMSE of 0.122). Compared to single task learning with Artificial Neural Network (ANN), ANN had a lower performance, 0.113 for RMSE, than MTL-CV. MTL-CV learned historical similarities between segments, in contrast to using direct historical trends in the measure, because trends may not exist in the measure but do in the similarities.

OCJun 30, 2020
Deep Learning Based Proactive Multi-Objective Eco-Routing Strategies for Connected and Automated Vehicles

Lama Alfaseeh, Bilal Farooq

This study exploits the advancements in information and communication technology (ICT), connected and automated vehicles (CAVs), and sensing, to develop proactive multi-objective eco-routing strategies. For a robust application, several GHG costing approaches are examined. The predictive models for the link level traffic and emission states are developed using long short term memory deep network with exogenous predictors. It is found that proactive routing strategies outperformed the myopic strategies, regardless of the routing objective. Whether myopic or proactive, the multi-objective routing, with travel time and GHG minimization as objectives, outperformed the single objective routing strategies, causing a reduction in the average travel time (TT), average vehicle kilometre travelled (VKT), total GHG and total NOx by 17%, 21%, 18%, and 20%, respectively. Finally, the additional TT and VKT experienced by the vehicles in the network contributed adversely to the amount of GHG and NOx produced in the network.

SPApr 16, 2020
Greenhouse Gas Emission Prediction on Road Network using Deep Sequence Learning

Lama Alfaseeh, Ran Tu, Bilal Farooq et al.

Mitigating the substantial undesirable impact of transportation systems on the environment is paramount. Thus, predicting Greenhouse Gas (GHG) emissions is one of the profound topics, especially with the emergence of intelligent transportation systems (ITS). We develop a deep learning framework to predict link-level GHG emission rate (ER) (in CO2eq gram/second) based on the most representative predictors, such as speed, density, and the GHG ER of previous time steps. In particular, various specifications of the long-short term memory (LSTM) networks with exogenous variables are examined and compared with clustering and the autoregressive integrated moving average (ARIMA) model with exogenous variables. The downtown Toronto road network is used as the case study and highly detailed data are synthesized using a calibrated traffic microsimulation and MOVES. It is found that LSTM specification with speed, density, GHG ER, and in-links speed from three previous minutes performs the best while adopting 2 hidden layers and when the hyper-parameters are systematically tuned. Adopting a 30 second updating interval improves slightly the correlation between true and predicted GHG ERs, but contributes negatively to the prediction accuracy as reflected on the increased root mean square error (RMSE) value. Efficiently predicting GHG emissions at a higher frequency with lower data requirements will pave the way to non-myopic eco-routing on large-scale road networks to alleviate the adverse impact on global warming.

LGApr 15, 2020
Composite Travel Generative Adversarial Networks for Tabular and Sequential Population Synthesis

Godwin Badu-Marfo, Bilal Farooq, Zachary Patterson

Agent-based transportation modelling has become the standard to simulate travel behaviour, mobility choices and activity preferences using disaggregate travel demand data for entire populations, data that are not typically readily available. Various methods have been proposed to synthesize population data for this purpose. We present a Composite Travel Generative Adversarial Network (CTGAN), a novel deep generative model to estimate the underlying joint distribution of a population, that is capable of reconstructing composite synthetic agents having tabular (e.g. age and sex) as well as sequential mobility data (e.g. trip trajectory and sequence). The CTGAN model is compared with other recently proposed methods such as the Variational Autoencoders (VAE) method, which has shown success in high dimensional tabular population synthesis. We evaluate the performance of the synthesized outputs based on distribution similarity, multi-variate correlations and spatio-temporal metrics. The results show the consistent and accurate generation of synthetic populations and their tabular and spatially sequential attributes, generated over varying spatial scales and dimensions.

HCFeb 18, 2020
Decoding pedestrian and automated vehicle interactions using immersive virtual reality and interpretable deep learning

Arash Kalatian, Bilal Farooq

To ensure pedestrian friendly streets in the era of automated vehicles, reassessment of current policies, practices, design, rules and regulations of urban areas is of importance. This study investigates pedestrian crossing behaviour, as an important element of urban dynamics that is expected to be affected by the presence of automated vehicles. For this purpose, an interpretable machine learning framework is proposed to explore factors affecting pedestrians' wait time before crossing mid-block crosswalks in the presence of automated vehicles. To collect rich behavioural data, we developed a dynamic and immersive virtual reality experiment, with 180 participants from a heterogeneous population in 4 different locations in the Greater Toronto Area (GTA). Pedestrian wait time behaviour is then analyzed using a data-driven Cox Proportional Hazards (CPH) model, in which the linear combination of the covariates is replaced by a flexible non-linear deep neural network. The proposed model achieved a 5% improvement in goodness of fit, but more importantly, enabled us to incorporate a richer set of covariates. A game theoretic based interpretability method is used to understand the contribution of different covariates to the time pedestrians wait before crossing. Results show that the presence of automated vehicles on roads, wider lane widths, high density on roads, limited sight distance, and lack of walking habits are the main contributing factors to longer wait times. Our study suggested that, to move towards pedestrian-friendly urban areas, national level educational programs for children, enhanced safety measures for seniors, promotion of active modes of transportation, and revised traffic rules and regulations should be considered.

CRAug 9, 2019
Privacy-Aware Distributed Mobility Choice Modelling over Blockchain

David Lopez, Bilal Farooq

A generalized distributed tool for mobility choice modelling is presented, where participants do not share personal raw data, while all computations are done locally. Participants use Blockchain based Smart Mobility Data-market (BSMD), where all transactions are secure and private. Nodes in blockchain can transact information with other participants as long as both parties agree to the transaction rules issued by the owner of the data. A case study is presented where a mode choice model is distributed and estimated over BSMD. As an example, the parameter estimation problem is solved on a distributed version of simulated annealing. It is demonstrated that the estimated model parameters are consistent and reproducible.

EMJul 16, 2019
Information processing constraints in travel behaviour modelling: A generative learning approach

Melvin Wong, Bilal Farooq

Travel decisions tend to exhibit sensitivity to uncertainty and information processing constraints. These behavioural conditions can be characterized by a generative learning process. We propose a data-driven generative model version of rational inattention theory to emulate these behavioural representations. We outline the methodology of the generative model and the associated learning process as well as provide an intuitive explanation of how this process captures the value of prior information in the choice utility specification. We demonstrate the effects of information heterogeneity on a travel choice, analyze the econometric interpretation, and explore the properties of our generative model. Our findings indicate a strong correlation with rational inattention behaviour theory, which suggests that individuals may ignore certain exogenous variables and rely on prior information for evaluating decisions under uncertainty. Finally, the principles demonstrated in this study can be formulated as a generalized entropy and utility based multinomial logit model.

LGApr 18, 2019
Ensemble Convolutional Neural Networks for Mode Inference in Smartphone Travel Survey

Ali Yazdizadeh, Zachary Patterson, Bilal Farooq

We develop ensemble Convolutional Neural Networks (CNNs) to classify the transportation mode of trip data collected as part of a large-scale smartphone travel survey in Montreal, Canada. Our proposed ensemble library is composed of a series of CNN models with different hyper-parameter values and CNN architectures. In our final model, we combine the output of CNN models using "average voting", "majority voting" and "optimal weights" methods. Furthermore, we exploit the ensemble library by deploying a Random Forest model as a meta-learner. The ensemble method with random forest as meta-learner shows an accuracy of 91.8% which surpasses the other three ensemble combination methods, as well as other comparable models reported in the literature. The "majority voting" and "optimal weights" combination methods result in prediction accuracy rates around 89%, while "average voting" is able to achieve an accuracy of only 85%.
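The three combination rules named in this abstract can be illustrated with a minimal sketch. It assumes each CNN outputs class probabilities per trip. The function names, the toy probabilities and the fixed weights are hypothetical; in particular, the weights here are simply given, whereas the paper's "optimal weights" method presumably fits them to data.

```python
import numpy as np

def average_voting(probs):
    """probs has shape (n_models, n_samples, n_classes); average then pick the top class."""
    return probs.mean(axis=0).argmax(axis=1)

def majority_voting(probs):
    """Each model casts one vote per sample; ties go to the lowest class index."""
    votes = probs.argmax(axis=2)  # shape (n_models, n_samples)
    n_classes = probs.shape[2]
    counts = np.stack([(votes == c).sum(axis=0) for c in range(n_classes)], axis=1)
    return counts.argmax(axis=1)

def weighted_voting(probs, weights):
    """Weighted average of model probabilities (weights are assumed given, not fitted)."""
    combined = np.tensordot(np.asarray(weights), probs, axes=1)  # (n_samples, n_classes)
    return combined.argmax(axis=1)

# Toy example: 3 models, 2 samples, 2 classes (illustrative numbers only).
probs = np.array([
    [[0.9, 0.1], [0.2, 0.8]],
    [[0.4, 0.6], [0.3, 0.7]],
    [[0.4, 0.6], [0.1, 0.9]],
])
avg_pred = average_voting(probs)
maj_pred = majority_voting(probs)
wgt_pred = weighted_voting(probs, [0.1, 0.45, 0.45])
```

On the first toy sample, one very confident model outweighs two mildly confident ones under averaging but is outvoted under majority voting, which is one way the rules can end up with different accuracies.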

HCApr 16, 2019
DeepWait: Pedestrian Wait Time Estimation in Mixed Traffic Conditions Using Deep Survival Analysis

Arash Kalatian, Bilal Farooq

Pedestrians' road crossing behaviour is one of the important aspects of urban dynamics that will be affected by the introduction of autonomous vehicles. In this study we introduce DeepSurvival, a novel framework for estimating pedestrians' waiting time at unsignalized mid-block crosswalks in mixed traffic conditions. We exploit the strengths of deep learning in capturing the nonlinearities in the data and develop a Cox proportional hazard model with a deep neural network as the log-risk function. An embedded feature selection algorithm for reducing data dimensionality and enhancing the interpretability of the network is also developed. We test our framework on a dataset collected from 160 participants using an immersive virtual reality environment. Validation results showed that with a C-index of 0.64 our proposed framework outperformed the standard Cox proportional hazard-based model with a C-index of 0.58.
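The C-index reported in this abstract measures how often a model ranks pairs of observations correctly. A minimal sketch of Harrell's concordance index follows, assuming a higher risk score should correspond to a shorter wait. The function name and the toy data are hypothetical, and the paper's exact evaluation code is not shown in the source.

```python
def concordance_index(times, risks, events):
    """Harrell's C-index.

    A pair (i, j) is comparable when i experienced the event (not censored)
    and times[i] < times[j]. It is concordant when risks[i] > risks[j];
    tied risk scores count as half.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a censored observation cannot be the earlier member of a pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable

# Toy data: wait times in seconds, event flag (1 = crossed, 0 = censored), model risk scores.
times = [2, 4, 6, 8]
events = [1, 1, 0, 1]
risks = [0.9, 0.3, 0.5, 0.1]
c_index = concordance_index(times, risks, events)
```

A value of 0.5 corresponds to random ranking and 1.0 to a perfect ranking, so the reported 0.64 and 0.58 both sit between chance and perfect ordering.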

ROApr 15, 2019
Multi-Objective Autonomous Braking System using Naturalistic Dataset

Rafael Vasquez, Bilal Farooq

A deep reinforcement learning based multi-objective autonomous braking system is presented. The design of the system is formulated in a continuous action space and seeks to maximize both pedestrian safety and perception as well as passenger comfort. The vehicle agent is trained against a large naturalistic dataset containing pedestrian road-crossing trials in which respondents walked across a road under various traffic conditions within an interactive virtual reality environment. The policy for brake control is learned through computer simulation using two reinforcement learning methods, i.e. Proximal Policy Optimization and Deep Deterministic Policy Gradient, and the efficiency of each is compared. Results show that the system is able to reduce the negative influence on passenger comfort by half while maintaining safe braking operation.

HCMar 28, 2019
Analysis of distracted pedestrians' waiting time: Head-Mounted Immersive Virtual Reality application

Arash Kalatian, Anae Sobhani, Bilal Farooq

This paper analyzes distracted pedestrians' waiting time before crossing the road in three conditions: 1) not distracted, 2) distracted with a smartphone and 3) distracted with a smartphone in the presence of virtual flashing LED lights on the crosswalk as a safety measure. For data collection, we adapted an in-house developed virtual immersive reality environment (VIRE). A total of 42 volunteers participated in the experiment. Participants' positions and head movements were recorded and used to calculate walking speeds, acceleration and deceleration rates, surrogate safety measures, time spent playing smartphone game, etc. After a descriptive analysis on the data, the effects of these variables on pedestrians' waiting time are analyzed by employing a Cox proportional hazard model. Several factors were identified as having impact on waiting time. The results show that an increase in initial walk speed, percentage of time the head was oriented toward smartphone during crossing, bigger minimum missed gaps and unsafe crossings resulted in shorter waiting times. On the other hand, an increase in the percentage of time the head was oriented toward smartphone during waiting time, crossing time and maze solving time, means longer waiting times for participants.

LGFeb 27, 2019
Semi-supervised GANs to Infer Travel Modes in GPS Trajectories

Ali Yazdizadeh, Zachary Patterson, Bilal Farooq

Semi-supervised Generative Adversarial Networks (GANs) are developed in the context of travel mode inference with uni-dimensional smartphone trajectory data. We use data from a large-scale smartphone travel survey in Montreal, Canada. We convert GPS trajectories into fixed-sized segments with five channels (variables). We develop different GANs architectures and compare their prediction results with Convolutional Neural Networks (CNNs). The best semi-supervised GANs model led to a prediction accuracy of 83.4%, while the best CNN model was able to achieve the prediction accuracy of 81.3%. The results compare favorably with previous studies, especially when taking the large-scale real-world nature of the dataset into account.

LGFeb 17, 2019
A semi-supervised deep residual network for mode detection in Wi-Fi signals

Arash Kalatian, Bilal Farooq

Due to their ubiquitous and pervasive nature, Wi-Fi networks have the potential to collect large-scale, low-cost, and disaggregate data on multimodal transportation. In this study, we develop a semi-supervised deep residual network (ResNet) framework to utilize Wi-Fi communications obtained from smartphones for the purpose of transportation mode detection. This framework is evaluated on data collected by Wi-Fi sensors located in a congested urban area in downtown Toronto. To tackle the intrinsic difficulties and costs associated with labelled data collection, we utilize an ample amount of easily collected low-cost unlabelled data by implementing the semi-supervised part of the framework. By incorporating a ResNet architecture as the core of the framework, we take advantage of the high-level features not considered in the traditional machine learning frameworks. The proposed framework shows a promising performance on the collected data, with a prediction accuracy of 81.8% for walking, 82.5% for biking and 86.0% for the driving mode.

CRJan 22, 2019
Perturbation Methods for Protection of Sensitive Location Data: Smartphone Travel Survey Case Study

Godwin Badu-Marfo, Bilal Farooq, Zachary Patterson

Smartphone based travel data collection has become an important tool for the analysis of transportation systems. Interest in sharing travel survey data has gained popularity in recent years as "Open Data Initiatives" by governments seek to allow the public to use these data, and hopefully be able to contribute their findings and analysis to the public sphere. The public release of such precise information, particularly location data such as place of residence, opens the risk of privacy violation. At the same time, in order for such data to be useful, as much spatial resolution as possible is desirable for utility in transportation applications and travel demand modeling. This paper evaluates geographic random perturbation methods (i.e. Geo-indistinguishability and the Donut geomask) in protecting the privacy of respondents whose residential location may be published. We measure the performance of location privacy methods, preservation of utility and randomness in the distribution of perturbation distances with varying parameters. It is found that both methods produce distributions of spatial perturbations that conform closely to common probability distributions and as a result, that the original locations can be inferred with little information and a high degree of precision. It is also found that while Achieved K-estimate anonymity increases linearly with desired anonymity for the Donut geomask, Geo-Indistinguishability is highly dependent upon its privacy budget factor (epsilon) and is not very effective at assuring desired Achieved K-estimate anonymity.
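The Donut geomask evaluated in this abstract displaces each location in a random direction by a distance that lies between a minimum and a maximum radius. A minimal sketch follows, assuming planar coordinates in metres and sampling that is uniform over the area of the ring. The function name, the radii and the sampling rule are assumptions for illustration; the paper's parameterisation may differ.

```python
import math
import random

def donut_geomask(x, y, r_min, r_max, rng):
    """Return a perturbed copy of (x, y) displaced by a distance in [r_min, r_max].

    Sampling the squared radius uniformly makes the displaced point
    uniform over the ring's area (an assumption of this sketch).
    """
    if not 0 <= r_min < r_max:
        raise ValueError("require 0 <= r_min < r_max")
    theta = rng.uniform(0.0, 2.0 * math.pi)
    r = math.sqrt(rng.uniform(r_min ** 2, r_max ** 2))
    return x + r * math.cos(theta), y + r * math.sin(theta)

# Hypothetical usage: mask one home location with a 100 m to 500 m donut.
rng = random.Random(42)
home = (1000.0, 2000.0)
masked = donut_geomask(home[0], home[1], 100.0, 500.0, rng)
displacement = math.hypot(masked[0] - home[0], masked[1] - home[1])
```

Because the displacement distances follow a known distribution bounded by the two radii, an analyst who knows the method can narrow down where the original point was, which is consistent with the inference risk the abstract reports.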

HCJan 22, 2019
Virtual Immersive Reality based Analysis of Behavioral Responses in Connected and Autonomous Vehicle Environment

Shadi Djavadian, Bilal Farooq, Rafael Vasquez et al.

Recently, we developed a dynamic distributed end-to-end vehicle routing system (E2ECAV) using a network of intelligent intersections and level 5 CAVs (Djavadian & Farooq, 2018). The case study of the downtown Toronto Network showed that E2ECAV has the ability to maximize throughput and reduce travel time up to 40%. However, the efficiency of these new technologies relies on the acceptance of users in adapting to them and their willingness to give control fully or partially to CAVs. In this study a stated preference laboratory experiment is designed employing Virtual Reality Immersive Environment (VIRE) driving simulator to evaluate the behavioral response of drivers to E2ECAV. The aim is to investigate under what conditions drivers are more willing to adapt. The results show that factors such as locus of control, congestion level and ability to multi-task have significant impact.

MLJan 18, 2019
A bi-partite generative model framework for analyzing and simulating large scale multiple discrete-continuous travel behaviour data

Melvin Wong, Bilal Farooq

The emergence of data-driven demand analysis has led to the increased use of generative modelling to learn the probabilistic dependencies between random variables. Although their apparent use has mostly been limited to image recognition and classification in recent years, generative machine learning algorithms can be a powerful tool for travel behaviour research by replicating travel behaviour by the underlying properties of data structures. In this paper, we examine the use of a generative machine learning approach for analyzing multiple discrete-continuous (MDC) travel behaviour data. We provide a plausible perspective of how we can exploit the use of machine learning techniques to interpret the underlying heterogeneities in the data. We show that generative models are conceptually similar to the choice selection behaviour process through information entropy and variational Bayesian inference. Without loss of generality, we consider a restricted Boltzmann machine (RBM) based algorithm with multiple discrete-continuous layers, formulated as a variational Bayesian inference optimization problem. We systematically describe the proposed machine learning algorithm and develop a process of analyzing travel behaviour data from a generative learning perspective. We show parameter stability from model analysis and simulation tests on an open dataset with multiple discrete-continuous dimensions from a data size of 293,330 observations. For interpretability, we derive the conditional probabilities, elasticities and perform statistical analysis on the latent variables. We show that our model can generate statistically similar data distributions for travel forecasting and prediction and performs better than purely discriminative methods in validation. Our results indicate that latent constructs in generative models can accurately represent the joint distribution consistently on MDC data.

LGSep 16, 2018
Mobility Mode Detection Using WiFi Signals

Arash Kalatian, Bilal Farooq

We utilize Wi-Fi communications from smartphones to predict their mobility mode, i.e. walking, biking and driving. Wi-Fi sensors were deployed at four strategic locations in a closed loop on streets in downtown Toronto. A deep neural network (Multilayer Perceptron) and three decision tree based classifiers (Decision Tree, Bagged Decision Tree and Random Forest) are developed. Results show that the best prediction accuracy is achieved by Multilayer Perceptron, with 86.52% correct predictions of mobility modes.

LGSep 15, 2018
Modelling Latent Travel Behaviour Characteristics with Generative Machine Learning

Melvin Wong, Bilal Farooq

In this paper, we implement an information-theoretic approach to travel behaviour analysis by introducing a generative modelling framework to identify informative latent characteristics in travel decision making. It involves developing a joint tri-partite Bayesian graphical network model using a Restricted Boltzmann Machine (RBM) generative modelling framework. We apply this framework to mode choice survey data to identify abstract latent variables and compare its performance with a traditional latent variable model with specific latent preferences: safety, comfort, and environmental. Data collected from a joint stated and revealed preference mode choice survey in Quebec, Canada were used to calibrate the RBM model. Results show a significant impact on model likelihood statistics and suggest that machine learning tools are highly suitable for modelling complex networks of conditionally independent behaviour interactions.

HCJun 17, 2018
Impact of Smartphone Distraction on Pedestrians' Crossing Behaviour: An Application of Head-Mounted Immersive Virtual Reality

Anae Sobhani, Bilal Farooq

A novel head-mounted virtual immersive/interactive reality environment (VIRE) is used to evaluate the behaviour of participants in three pedestrian road crossing conditions: 1) not distracted, 2) distracted with a smartphone, and 3) distracted with a smartphone with a virtually implemented safety measure on the road. Forty-two volunteers participated, each completing thirty successful (complete crossing) trials in blocks of ten trials for each crossing condition. For the two distracted conditions, pedestrians are engaged in a maze-solving game on a virtual smartphone, while at the same time checking the traffic for a safe crossing gap. For the proposed safety measure, smart flashing and color-changing LED lights are simulated on the crosswalk to warn the distracted pedestrian who initiates crossing. Surrogate safety measures, speed information, and distraction attributes such as the direction and orientation of the participant's head were collected and evaluated using a Multinomial Logit (MNL) model. Results from the model indicate that females have more dangerous crossing behaviour, especially in distracted conditions; however, the smart LED treatment reduces this negative impact. Moreover, the number of times the head was facing the smartphone during a trial and the percentage of the waiting time it was facing the smartphone both increase the possibility of unsafe crossings; though, the proposed treatment reduces the safety crossing rate. Hence, our study shows that the smart LED light safety treatment indeed improves the safety of distracted pedestrians and enhances the successful crossing rate.

HCFeb 17, 2018
Virtual Immersive Reality for Stated Preference Travel Behaviour Experiments: A Case Study of Autonomous Vehicles on Urban Roads

Bilal Farooq, Elisabetta Cherchi, Anae Sobhani

Stated preference experiments are known to suffer from a lack of realism. This issue is particularly visible when the scenario lacks a well-understood prior reference, e.g. in scenarios involving autonomous vehicles. We present the Virtual Immersive Reality Environment (VIRE), which is capable of producing highly realistic, immersive, and interactive choice scenarios. We demonstrate the use of VIRE for studying pedestrian preferences related to autonomous vehicles and associated infrastructure changes on urban streets of Montréal. The results are compared with the predominantly used approaches, i.e. text-only and visual aid. We show that VIRE results in a better understanding of the scenario and consistent results.

LGJun 1, 2017
Discriminative conditional restricted Boltzmann machine for discrete choice and latent variable modelling

Melvin Wong, Bilal Farooq, Guillaume-Alexandre Bilodeau

Conventional methods of estimating latent behaviour generally use attitudinal questions, which are subjective, and these survey questions may not always be available. We hypothesize that an alternative approach to latent variable estimation is possible through undirected graphical models, for instance non-parametric artificial neural networks. In this study, we explore the use of generative non-parametric modelling methods to estimate latent variables from the prior choice distribution without the conventional use of measurement indicators. A restricted Boltzmann machine is used to represent latent behaviour factors by analyzing the relationship information between the observed choices and explanatory variables. The algorithm is adapted for latent behaviour analysis in a discrete choice scenario, and we use a graphical approach to evaluate and understand the semantic meaning of the estimated parameter vector values. We illustrate our methodology on a financial instrument choice dataset and perform statistical analysis on parameter sensitivity and stability. Our findings, based on non-parametric statistical tests, show that machine learning methods can extract useful information on the behaviour of latent constructs, and that these constructs have a strong and significant influence on the choice process. Furthermore, our modelling framework is robust to input variability under sampling and validation.
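
As a rough illustration of how a restricted Boltzmann machine maps observed inputs to latent units, the sketch below computes the hidden-unit activation probabilities of a textbook binary RBM, p(h_j = 1 | v) = sigmoid(b_j + sum_i v_i W_ij). This is the standard formulation only, not the discriminative conditional variant developed in the paper; the function names, weights, and biases are hypothetical values chosen for the example.

```python
import math


def sigmoid(x):
    """Logistic function used for RBM unit activations."""
    return 1.0 / (1.0 + math.exp(-x))


def hidden_activation_probs(visible, weights, hidden_bias):
    """Return p(h_j = 1 | v) for each hidden unit of a standard binary RBM.

    visible     -- list of observed inputs v_i
    weights     -- weights[i][j] links visible unit i to hidden unit j
    hidden_bias -- list of hidden biases b_j
    (All names and shapes are assumptions made for this sketch.)
    """
    probs = []
    for j, bias in enumerate(hidden_bias):
        pre_activation = bias + sum(
            v * weights[i][j] for i, v in enumerate(visible)
        )
        probs.append(sigmoid(pre_activation))
    return probs


# Hypothetical example: two observed variables, two latent units.
example = hidden_activation_probs(
    visible=[1, 0],
    weights=[[0.0, 1.0], [5.0, 5.0]],
    hidden_bias=[0.0, -1.0],
)
```

In a latent behaviour setting, each probability can be read as how strongly a latent factor is switched on by a given combination of observed choices and explanatory variables.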

LGMar 7, 2017
An investigation into machine learning approaches for forecasting spatio-temporal demand in ride-hailing service

Ismaïl Saadi, Melvin Wong, Bilal Farooq et al.

In this paper, we present machine learning approaches for characterizing and forecasting the short-term demand for on-demand ride-hailing services. We propose a spatio-temporal estimation of the demand as a function of variable effects related to traffic, pricing, and weather conditions. With respect to the methodology, a single decision tree, bootstrap-aggregated (bagged) decision trees, random forest, boosted decision trees, and an artificial neural network for regression have been adapted and systematically compared using various statistics, e.g. R-square, Root Mean Square Error (RMSE), and slope. To better assess the quality of the models, they have been tested on a real case study using data from DiDi Chuxing, the main on-demand ride-hailing service provider in China. In the current study, 199,584 time-slots describing the spatio-temporal ride-hailing demand have been extracted with an aggregated time interval of 10 minutes. All the methods are trained and validated on two independent samples from this dataset. The results reveal that boosted decision trees provide the best prediction accuracy (RMSE=16.41) while avoiding the risk of over-fitting, followed by artificial neural network (20.09), random forest (23.50), bagged decision trees (24.29), and single decision tree (33.55).
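
The three comparison statistics named above can be computed from observed and predicted demand with a few lines of code. The sketch below is a minimal illustration using only the Python standard library. It assumes that "slope" means the least-squares slope of predicted against observed values, which the abstract does not specify, and the function name is hypothetical.

```python
import math


def regression_metrics(observed, predicted):
    """Return (rmse, r_square, slope) for paired observed/predicted values.

    slope is the least-squares slope of predicted regressed on observed;
    this reading of "slope" is an assumption made for this sketch.
    """
    n = len(observed)
    if n == 0 or n != len(predicted):
        raise ValueError("inputs must be non-empty and of equal length")

    mean_obs = sum(observed) / n
    mean_pred = sum(predicted) / n

    # Residual and total sums of squares.
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    if ss_tot == 0:
        raise ValueError("observed values must not all be identical")

    rmse = math.sqrt(ss_res / n)
    r_square = 1.0 - ss_res / ss_tot
    covariance = sum(
        (o - mean_obs) * (p - mean_pred) for o, p in zip(observed, predicted)
    )
    slope = covariance / ss_tot
    return rmse, r_square, slope


# Hypothetical demand counts for four time-slots, not data from the study.
rmse, r_square, slope = regression_metrics([1, 2, 3, 4], [2, 3, 4, 5])
```

A lower RMSE and an R-square and slope closer to 1 indicate predictions that track the observed demand more closely.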