LGAug 25, 2023
A Bayesian Active Learning Approach to Comparative JudgementAndy Gray, Alma Rahat, Tom Crick et al.
Assessment is a crucial part of education. Traditional marking is a source of inconsistencies and unconscious bias, placing a high cognitive load on the assessors. An approach to address these issues is comparative judgement (CJ). In CJ, the assessor is presented with a pair of items and is asked to select the better one. Following a series of comparisons, a rank is derived using a ranking model, for example, the BTM, based on the results. While CJ is considered a reliable method for marking, there are concerns around transparency, and the ideal number of pairwise comparisons to generate a reliable estimation of the rank order is not known. Additionally, there have been attempts to generate a method of selecting pairs that should be compared next in an informative manner, but some existing methods are known to have created their own bias within results inflating the reliability metric used. As a result, a random selection approach is usually deployed. We propose a novel Bayesian approach to CJ (BCJ) for determining the ranks of compared items alongside a new way to select the pairs to present to the marker(s) using active learning (AL), addressing the key shortcomings of traditional CJ. Furthermore, we demonstrate how the entire approach may provide transparency by providing the user insights into how it is making its decisions and, at the same time, being more efficient. Results from our experiments confirm that the proposed BCJ combined with entropy-driven AL pair-selection method is superior to other alternatives. We also find that the more comparisons done, the more accurate BCJ becomes, which solves the issue the current method has of the model deteriorating if too many comparisons are performed. As our approach can generate the complete predicted rank distribution for an item, we also show how this can be utilised in devising a predicted grade, guided by the assessor.
LGMar 1, 2025
Bayesian Active Learning for Multi-Criteria Comparative Judgement in Educational AssessmentAndy Gray, Alma Rahat, Tom Crick et al.
Comparative Judgement (CJ) provides an alternative assessment approach by evaluating work holistically rather than breaking it into discrete criteria. This method leverages human ability to make nuanced comparisons, yielding more reliable and valid assessments. CJ aligns with real-world evaluations, where overall quality emerges from the interplay of various elements. However, rubrics remain widely used in education, offering structured criteria for grading and detailed feedback. This creates a gap between CJ's holistic ranking and the need for criterion-based performance breakdowns. This paper addresses this gap using a Bayesian approach. We build on Bayesian CJ (BCJ) by Gray et al., which directly models preferences instead of using likelihoods over total scores, allowing for expected ranks with uncertainty estimation. Their entropy-based active learning method selects the most informative pairwise comparisons for assessors. We extend BCJ to handle multiple independent learning outcome (LO) components, defined by a rubric, enabling both holistic and component-wise predictive rankings with uncertainty estimates. Additionally, we propose a method to aggregate entropies and identify the most informative comparison for assessors. Experiments on synthetic and real data demonstrate our method's effectiveness. Finally, we address a key limitation of BCJ, which is the inability to quantify assessor agreement. We show how to derive agreement levels, enhancing transparency in assessment.
CYMar 17, 2025
Rendering Transparency to Ranking in Educational Assessment via Bayesian Comparative JudgementAndy Gray, Alma Rahat, Stephen Lindsay et al.
Ensuring transparency in educational assessment is increasingly critical, particularly post-pandemic, as demand grows for fairer and more reliable evaluation methods. Comparative Judgement (CJ) offers a promising alternative to traditional assessments, yet concerns remain about its perceived opacity. This paper examines how Bayesian Comparative Judgement (BCJ) enhances transparency by integrating prior information into the judgement process, providing a structured, data-driven approach that improves interpretability and accountability. BCJ assigns probabilities to judgement outcomes, offering quantifiable measures of uncertainty and deeper insights into decision confidence. By systematically tracking how prior data and successive judgements inform final rankings, BCJ clarifies the assessment process and helps identify assessor disagreements. Multi-criteria BCJ extends this by evaluating multiple learning outcomes (LOs) independently, preserving the richness of CJ while producing transparent, granular rankings aligned with specific assessment goals. It also enables a holistic ranking derived from individual LOs, ensuring comprehensive evaluations without compromising detailed feedback. Using a real higher education dataset with professional markers in the UK, we demonstrate BCJ's quantitative rigour and ability to clarify ranking rationales. Through qualitative analysis and discussions with experienced CJ practitioners, we explore its effectiveness in contexts where transparency is crucial, such as high-stakes national assessments. We highlight the benefits and limitations of BCJ, offering insights into its real-world application across various educational settings.
CRJun 23, 2019
A UK Case Study on Cybersecurity Education and AccreditationTom Crick, James H. Davenport, Alastair Irons et al.
This paper presents a national case study-based analysis of the numerous dimensions to cybersecurity education and how they are prioritised, implemented and accredited; from understanding the interaction of hardware and software, moving from theory to practice (and vice versa), to human factors, policy and politics (as well as various other important facets). A multitude of model curricula and recommendations have been presented and discussed in international fora in recent years, with varying levels of impact on education, policy and practice. This paper address three key questions: i) what is taught and what should be taught for cybersecurity to general computer science students; ii) should cybersecurity be taught stand-alone or in an integrated manner to general computer science students; and iii) can accreditation by national professional, statutory and regulatory bodies enhance the provision of cybersecurity within a body's jurisdiction? Evaluating how cybersecurity is taught in all aspects of computer science is clearly a task of considerable size, one that is beyond the scope of this paper. Instead a case study-based research approach -- primarily focusing on the UK -- has been adopted to evaluate the evidence of the teaching of cybersecurity within general computer science to university-level students. Thus, in the context of widespread international computer science/engineering curriculum reform, what does this need to embed cybersecurity knowledge and skills mean more generally for institutions and educators, and how can we teach this subject more effectively? Through this UK case study, and by contrasting with related initiatives in the US, we demonstrate the positive effect that national accreditation requirements can have, and offer some recommendations both for future research and curriculum developments.
HCAug 10, 2016
Incorporating Emotion and Personality-Based Analysis in User-Centered ModellingMohamed Mostafa, Tom Crick, Ana C. Calderon et al.
Understanding complex user behaviour under various conditions, scenarios and journeys can be fundamental to the improvement of the user-experience for a given system. Predictive models of user reactions, responses -- and in particular, emotions -- can aid in the design of more intuitive and usable systems. Building on this theme, the preliminary research presented in this paper correlates events and interactions in an online social network against user behaviour, focusing on personality traits. Emotional context and tone is analysed and modelled based on varying types of sentiments that users express in their language using the IBM Watson Developer Cloud tools. The data collected in this study thus provides further evidence towards supporting the hypothesis that analysing and modelling emotions, sentiments and personality traits provides valuable insight into improving the user experience of complex social computer systems.
SEMar 9, 2015
Reproducibility in Research: Systems, Infrastructure, CultureTom Crick, Benjamin A. Hall, Samin Ishtiaq
The reproduction and replication of research results has become a major issue for a number of scientific disciplines. In computer science and related computational disciplines such as systems biology, the challenges closely revolve around the ability to implement (and exploit) novel algorithms and models. Taking a new approach from the literature and applying it to a new codebase frequently requires local knowledge missing from the published manuscripts and transient project websites. Alongside this issue, benchmarking, and the lack of open, transparent and fair benchmark sets present another barrier to the verification and validation of claimed results. In this paper, we outline several recommendations to address these issues, driven by specific examples from a range of scientific domains. Based on these recommendations, we propose a high-level prototype open automated platform for scientific software development which effectively abstracts specific dependencies from the individual researcher and their workstation, allowing easy sharing and reproduction of results. This new e-infrastructure for reproducible computational science offers the potential to incentivise a culture change and drive the adoption of new techniques to improve the quality and efficiency -- and thus reproducibility -- of scientific exploration.
LOFeb 9, 2015
Dear CAV, We Need to Talk About ReproducibilityTom Crick, Benjamin A. Hall, Samin Ishtiaq
How many times have you tried to re-implement a past CAV tool paper, and failed? Reliably reproducing published scientific discoveries has been acknowledged as a barrier to scientific progress for some time but there remains only a small subset of software available to support the specific needs of the research community (i.e. beyond generic tools such as source code repositories). In this paper we propose an infrastructure for enabling reproducibility in our community, by automating the build, unit testing and benchmarking of research software.
SEJul 22, 2014
"Can I Implement Your Algorithm?": A Model for Reproducible Research SoftwareTom Crick, Benjamin A. Hall, Samin Ishtiaq
The reproduction and replication of novel results has become a major issue for a number of scientific disciplines. In computer science and related computational disciplines such as systems biology, the issues closely revolve around the ability to implement novel algorithms and approaches. Taking an approach from the literature and applying it to a new codebase frequently requires local knowledge missing from the published manuscripts and project websites. Alongside this issue, benchmarking, and the development of fair --- and widely available --- benchmark sets present another barrier. In this paper, we outline several suggestions to address these issues, driven by specific examples from a range of scientific domains. Finally, based on these suggestions, we propose a new open platform for scientific software development which effectively isolates specific dependencies from the individual researcher and their workstation and allows faster, more powerful sharing of the results of scientific software engineering.