LG, May 5, 2023
Open problems in causal structure learning: A case study of COVID-19 in the UK
Anthony Constantinou, Neville K. Kitson, Yang Liu et al.
Causal machine learning (ML) algorithms recover graphical structures that tell us something about cause-and-effect relationships. The causal representation provided by these algorithms enables transparency and explainability, which are necessary for decision making in critical real-world problems. Yet, causal ML has had limited impact in practice compared to associational ML. This paper investigates the challenges of causal ML with application to UK COVID-19 pandemic data. We collate data from various public sources and investigate what the various structure learning algorithms learn from these data. We explore the impact of different data formats on algorithms spanning different classes of learning, and assess the results produced by each algorithm, and by groups of algorithms, in terms of graphical structure, model dimensionality, sensitivity analysis, confounding variables, and predictive and interventional inference. We use these results to highlight open problems in causal structure learning and directions for future research. To facilitate future work, we make all graphs, models, data sets, and source code publicly available online.
LG, Apr 9, 2020
Learning Bayesian Networks that enable full propagation of evidence
Anthony Constantinou
This paper builds on recent developments in Bayesian network (BN) structure learning under the controversial assumption that the input variables are dependent. This assumption can be viewed as a learning constraint geared towards cases where the input variables are known or assumed to be dependent. It addresses the problem of learning multiple disjoint subgraphs that do not enable full propagation of evidence. This problem is highly prevalent in cases where the sample size of the input data is low with respect to the dimensionality of the model, which is often the case when working with real data. The paper presents a novel hybrid structure learning algorithm, called SaiyanH, that addresses this issue. The results show that this constraint helps the algorithm to estimate the number of true edges with higher accuracy compared to the state-of-the-art. Out of the 13 algorithms investigated, the results rank SaiyanH 4th in reconstructing the true DAG, with accuracy scores lower by 8.1% (F1), 10.2% (BSF), and 19.5% (SHD) compared to the top ranked algorithm, and higher by 75.5% (F1), 118% (BSF), and 4.3% (SHD) compared to the bottom ranked algorithm. Overall, the results suggest that the proposed algorithm discovers satisfactorily accurate connected DAGs in cases where other algorithms produce multiple disjoint subgraphs that often underfit the true graph.
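The disjoint-subgraph problem described above can be made concrete with a small check: count the weakly connected components of a learned graph, since evidence cannot propagate between components. The following is a minimal sketch under stated assumptions, not the SaiyanH algorithm; the node names, edge lists, and the helper `count_components` are hypothetical and chosen only for illustration.

```python
def count_components(nodes, edges):
    """Count weakly connected components, ignoring edge direction (union-find)."""
    parent = {n: n for n in nodes}

    def find(n):
        # Follow parent pointers to the root, halving the path as we go.
        while parent[n] != n:
            parent[n] = parent[parent[n]]
            n = parent[n]
        return n

    for u, v in edges:
        # Merge the components containing the two endpoints of each edge.
        parent[find(u)] = find(v)

    return len({find(n) for n in nodes})


# Hypothetical five-variable example (not taken from the paper).
nodes = ["A", "B", "C", "D", "E"]

# A learned graph split into {A, B}, {C, D}, and the isolated node E.
fragmented_edges = [("A", "B"), ("C", "D")]

# A learned graph in which every variable is reachable from every other.
connected_edges = [("A", "B"), ("B", "C"), ("C", "D"), ("D", "E")]

n_fragmented = count_components(nodes, fragmented_edges)
n_connected = count_components(nodes, connected_edges)
```

A graph that enables full propagation of evidence, in the sense used above, has exactly one component; any count above one means some variables cannot be updated by evidence entered elsewhere in the model.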
AP, Mar 10, 2020
Investigating the efficiency of the Asian handicap football betting market with ratings and Bayesian networks
Anthony Constantinou
Despite the massive popularity of the Asian Handicap (AH) football (soccer) betting market, its efficiency has not been adequately studied in the relevant literature. This paper combines rating systems with Bayesian networks and presents the first published model specifically developed for prediction and assessment of the efficiency of the AH betting market. The results are based on 13 English Premier League seasons and are compared to the traditional market, where the bets are for win, lose or draw. Different betting situations have been examined, including a) both average and maximum (best available) market odds, b) all possible betting decision thresholds between predicted and published odds, c) optimisations for both return-on-investment and profit, and d) simple stake adjustments to investigate how the variance of returns changes when targeting equivalent profit in both traditional and AH markets. While the AH market is found to share the inefficiencies of the traditional market, the findings reveal both interesting differences and similarities between the two.
ME, Dec 3, 2019
Simpson's Paradox and the implications for medical trials
Norman Fenton, Martin Neil, Anthony Constantinou
This paper describes Simpson's paradox, and explains its serious implications for randomised controlled trials. In particular, we show that for any number of variables we can simulate the result of a controlled trial which uniformly points to one conclusion (such as 'drug is effective') for every possible combination of the variable states, but when a previously unobserved confounding variable is included every possible combination of the variable states points to the opposite conclusion ('drug is not effective'). In other words, no matter how many variables are considered, and no matter how 'conclusive' the result, one cannot conclude the result is truly 'valid' since there is theoretically an unobserved confounding variable that could completely reverse the result.
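The reversal described above can be illustrated with a single confounder. The following is a minimal sketch using hypothetical recovery counts (they are not data or simulations from the paper): pooled over the unobserved variable, the drug arm has the higher recovery rate, yet within each stratum of that variable the control arm does. The stratum labels `z0` and `z1` and all helper names are assumptions made for this example.

```python
from fractions import Fraction

# Hypothetical (recovered, total) counts, keyed by (confounder stratum, trial arm).
counts = {
    ("z0", "drug"): (234, 270),
    ("z0", "control"): (81, 87),
    ("z1", "drug"): (55, 80),
    ("z1", "control"): (192, 263),
}
strata = ("z0", "z1")


def stratum_rate(stratum, arm):
    """Recovery rate for one arm within one stratum of the confounder."""
    recovered, total = counts[(stratum, arm)]
    return Fraction(recovered, total)


def pooled_rate(arm):
    """Recovery rate for one arm when the confounder is ignored."""
    recovered = sum(counts[(z, arm)][0] for z in strata)
    total = sum(counts[(z, arm)][1] for z in strata)
    return Fraction(recovered, total)


# Without the confounder: does the drug arm look better than control?
pooled_favours_drug = pooled_rate("drug") > pooled_rate("control")

# With the confounder: does the drug arm look better within each stratum?
strata_favour_drug = {
    z: stratum_rate(z, "drug") > stratum_rate(z, "control") for z in strata
}
```

With these counts the pooled comparison favours the drug (289/350 against 273/350), while both strata favour the control. The reversal arises because the drug arm is concentrated in the stratum with the higher baseline recovery rate, which is the kind of imbalance an unobserved confounder can introduce.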