SE May 2, 2022
PSI Draft Specification
Mark Reid, James Montgomery, Barry Drake et al.
This document presents the draft specification for delivering machine learning services over HTTP, developed as part of the Protocols and Structures for Inference project, which concluded in 2013. It presents the motivation for providing machine learning as a service, followed by a description of the essential and optional components of such a service.
LG Jan 21
Early predicting of hospital admission using machine learning algorithms: Priority queues approach
Jakub Antczak, James Montgomery, Małgorzata O'Reilly et al.
Emergency Department overcrowding is a critical issue that compromises patient safety and operational efficiency, necessitating accurate demand forecasting for effective resource allocation. This study evaluates and compares three distinct predictive models: Seasonal AutoRegressive Integrated Moving Average with eXogenous regressors (SARIMAX), EXtreme Gradient Boosting (XGBoost) and Long Short-Term Memory (LSTM) networks for forecasting daily ED arrivals over a seven-day horizon. Utilizing data from an Australian tertiary referral hospital spanning January 2017 to December 2021, this research distinguishes itself by decomposing demand into eight specific ward categories and stratifying patients by clinical complexity. To address data distortions caused by the COVID-19 pandemic, the study employs the Prophet model to generate synthetic counterfactual values for the anomalous period. Experimental results demonstrate that all three proposed models consistently outperform a seasonal naive baseline. XGBoost demonstrated the highest accuracy for predicting total daily admissions with a Mean Absolute Error of 6.63, while the statistical SARIMAX model proved marginally superior for forecasting major complexity cases with an MAE of 3.77. The study concludes that while these techniques successfully reproduce regular day-to-day patterns, they share a common limitation in underestimating sudden, infrequent surges in patient volume.
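The seasonal naive baseline and the Mean Absolute Error mentioned in the abstract above can be sketched as follows. This is a minimal illustration only: the abstract does not state the baseline's season length, so a weekly season (7 days) is an assumption, and the function names and arrival counts are hypothetical.

```python
def seasonal_naive_forecast(history, horizon=7, season=7):
    """Forecast by repeating the last full season of observations.

    Assumes a weekly season (season=7); the abstract does not give this value.
    """
    if len(history) < season:
        raise ValueError("need at least one full season of history")
    last_season = history[-season:]
    return [last_season[i % season] for i in range(horizon)]


def mean_absolute_error(actual, predicted):
    """Average absolute difference between observed and forecast values."""
    if len(actual) != len(predicted):
        raise ValueError("series must have the same length")
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


# Made-up daily arrival counts, for illustration only.
history = [120, 115, 110, 112, 118, 130, 135,
           122, 117, 111, 114, 119, 128, 137]
forecast = seasonal_naive_forecast(history, horizon=7)
observed = [125, 116, 113, 110, 121, 131, 140]  # hypothetical following week
print(forecast)
print(mean_absolute_error(observed, forecast))
```

A forecasting model is useful only if its error on held-out days is lower than the error this baseline achieves on the same days.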
LG Aug 6, 2025
Leveraging Deep Learning for Physical Model Bias of Global Air Quality Estimates
Kelsey Doerksen, Yuliya Marchetti, Kevin Bowman et al.
Air pollution is the world's largest environmental risk factor for human disease and premature death, resulting in more than 6 million premature deaths in 2019. Modeling surface ozone, one of the most important air pollutants, remains a challenge, particularly at scales relevant to human health impacts: the drivers of global ozone trends at these scales are largely unknown, which limits the practical use of physics-based models. We employ a 2D Convolutional Neural Network-based architecture that estimates the surface ozone residuals of the MOMO-Chem model, referred to as model bias. We demonstrate the potential of this technique in North America and Europe, highlighting its ability to better capture physical model residuals compared to a traditional machine learning method. We assess the impact of incorporating land use information from high-resolution satellite imagery to improve model estimates. Importantly, we discuss how our results can improve our scientific understanding of the factors impacting ozone bias at urban scales, which can be used to improve environmental policy.
LG Aug 6, 2025
Uncertainty Quantification for Surface Ozone Emulators using Deep Learning
Kelsey Doerksen, Yuliya Marchetti, Steven Lu et al.
Air pollution is a global hazard, and as of 2023, 94% of the world's population is exposed to unsafe pollution levels. Surface Ozone (O3), an important pollutant, and the drivers of its trends are difficult to model, and traditional physics-based models fall short in their practical use for scales relevant to human-health impacts. Deep Learning-based emulators have shown promise in capturing complex climate patterns, but overall lack the interpretability necessary to support critical decision making for policy changes and public health measures. We implement an uncertainty-aware U-Net architecture to predict the Multi-mOdel Multi-cOnstituent Chemical data assimilation (MOMO-Chem) model's surface ozone residuals (bias) using Bayesian and quantile regression methods. We demonstrate the capability of our techniques in regional estimation of bias in North America and Europe for June 2019. We highlight the uncertainty quantification (UQ) scores between our two UQ methodologies and discern which ground stations are optimal and sub-optimal candidates for MOMO-Chem bias correction, and evaluate the impact of land-use information in surface ozone residual modeling.
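Quantile regression, one of the two uncertainty quantification methods named above, is commonly trained by minimising the pinball (quantile) loss. The sketch below shows that standard loss in plain Python. Whether the authors use exactly this form is an assumption, and the function name and residual values are hypothetical.

```python
def pinball_loss(y_true, y_pred, q):
    """Mean pinball loss for quantile level q (0 < q < 1).

    Under-prediction is weighted by q and over-prediction by (1 - q),
    so minimising it pushes y_pred towards the q-th quantile.
    """
    if not 0.0 < q < 1.0:
        raise ValueError("q must lie strictly between 0 and 1")
    total = 0.0
    for y, p in zip(y_true, y_pred):
        diff = y - p
        total += q * diff if diff >= 0 else (q - 1.0) * diff
    return total / len(y_true)


# Made-up ozone residuals (bias) and one quantile prediction, illustration only.
observed_bias = [4.0, -2.0, 7.5]
predicted_q90 = [6.0, 1.0, 6.5]
print(pinball_loss(observed_bias, predicted_q90, q=0.9))
```

Predicting a low and a high quantile (for example 0.05 and 0.95) with this loss yields an interval whose width serves as a per-location uncertainty estimate.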
CR Jun 20, 2018
User's Privacy in Recommendation Systems Applying Online Social Network Data, A Survey and Taxonomy
Erfan Aghasian, Saurabh Garg, James Montgomery
Recommender systems have become an integral part of many social networks and extract knowledge from a user's personal and sensitive data both explicitly, with the user's knowledge, and implicitly. This trend has created major privacy concerns, as users are mostly unaware of what data and how much data is being used and how securely it is used. In this context, several works have addressed privacy concerns about the use of online social network data by recommender systems. This paper surveys the main privacy concerns, measurements and privacy-preserving techniques used in large-scale online social networks and recommender systems. It is based on historical works on security, privacy preservation, statistical modeling, and datasets to provide an overview of the technical difficulties and problems associated with privacy preservation in online social networks.
SD Apr 16, 2018
Automatic Rain and Cicada Chorus Filtering of Bird Acoustic Data
Alexander Brown, Saurabh Garg, James Montgomery
Recording and analysing environmental audio has become a common approach for monitoring the environment. A current problem with performing analyses of environmental recordings is interference from noise that can mask sounds of interest. This makes detecting these sounds more difficult and can require additional resources. While some work has been done to remove stationary noise from environmental recordings, there has been little effort to remove noise from non-stationary sources, such as rain, wind, engines, and animal vocalisations that are not of interest. In this paper, we address the challenge of filtering noise from rain and cicada choruses from recordings containing bird sound. We improve upon previously established classification approaches using acoustic indices and Mel Frequency Cepstral Coefficients (MFCCs) as acoustic features to detect these noise sources, approaching the problem with the motivation of removing these sounds. We investigate the use of acoustic indices and machine learning classifiers to find the most effective filters. The approach we use enables users to set thresholds to increase or decrease the sensitivity of classification, based on the prediction probability output by classifiers. We also propose a novel approach to remove cicada choruses using band-pass filters. Our threshold-based approach (Random Forest with acoustic indices and MFCCs) for rain detection achieves an AUC of 0.9881 and is more accurate than existing approaches when set to the same sensitivities. We also detect cicada choruses in our training set with 100% accuracy using 10-fold cross-validation. Our cicada filtering approach greatly increased the median signal-to-noise ratios of affected recordings from 0.53 for unfiltered audio to 1.86 for audio filtered by both the cicada filter and a stationary noise filter.
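The user-adjustable threshold described above can be illustrated with a short sketch. It assumes a classifier has already produced one rain probability per audio segment. The function name, default threshold, and probabilities are hypothetical and are not taken from the paper.

```python
def flag_noise(probabilities, threshold=0.5):
    """Mark each segment as noise when its predicted probability meets the threshold.

    Lowering the threshold flags more segments (higher sensitivity);
    raising it flags fewer.
    """
    if not 0.0 <= threshold <= 1.0:
        raise ValueError("threshold must be between 0 and 1")
    return [p >= threshold for p in probabilities]


# Made-up per-segment rain probabilities, for illustration only.
rain_probabilities = [0.05, 0.40, 0.62, 0.91]
print(flag_noise(rain_probabilities, threshold=0.5))
print(flag_noise(rain_probabilities, threshold=0.3))
```

Segments flagged this way could then be dropped or passed to a separate filtering step before bird calls are analysed.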
DC Feb 2, 2018
Scalable Preprocessing of High Volume Bird Acoustic Data
Alexander Brown, Saurabh Garg, James Montgomery
In this work, we examine the problem of efficiently preprocessing high volume bird acoustic data. We combine several existing preprocessing steps including noise reduction approaches into a single efficient pipeline by examining each process individually. We then utilise a distributed computing architecture to improve execution time. Using a master-slave model with data parallelisation, we developed a near-linear automated scalable system, capable of preprocessing bird acoustic recordings 21.76 times faster with 32 cores over 8 virtual machines, compared to a serial process. This work contributes to the research area of bioacoustic analysis, which is currently very active because of its potential to monitor animals quickly at low cost. Overcoming noise interference is a significant challenge in many bioacoustic studies, and the volume of data in these studies is increasing. Our work makes large scale bird acoustic analyses more feasible by parallelising important bird acoustic processing tasks to significantly reduce execution times.
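To put the reported speedup in context, parallel efficiency is the speedup divided by the number of cores. The sketch below applies that standard definition to the figures quoted in the abstract (21.76 times faster on 32 cores). The function name is hypothetical.

```python
def parallel_efficiency(speedup, workers):
    """Fraction of ideal linear speedup achieved: speedup / number of workers."""
    if workers <= 0:
        raise ValueError("workers must be positive")
    return speedup / workers


# Figures quoted in the abstract: 21.76x speedup on 32 cores.
efficiency = parallel_efficiency(21.76, 32)
print(round(efficiency, 2))  # 0.68
```

An efficiency of 1.0 would correspond to perfectly linear scaling.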