SE May 12
Fine-Tuning Models for Automated Code Review Feedback
Smitha S Kumar, Michael A Lones, Manuel Maarek et al.
Large Language Models have introduced new possibilities for programming education through personalized support, content creation, and automated feedback. While recent studies have demonstrated the potential for feedback generation, many techniques rely on proprietary models, raising concerns about cost, computational demands, and the ethical implications of sharing student code. Open LLMs provide an alternative approach, but they do not currently have the capabilities of proprietary models. To address this problem, we investigate whether parameter-efficient fine-tuning (PEFT) and prompt engineering, both of which distil knowledge from a dataset derived from a large, more capable model, can be used to adapt and enhance the quality of feedback generated by the open LLM Code Llama. Feedback quality on buggy Java code was assessed using a combination of student evaluation, manual annotation and the automated metrics BLEU, ROUGE, and BERTScore. Our findings indicate that PEFT leads to notable improvements in feedback quality and significantly outperforms prompt engineering, providing an avenue for developing freely deployable feedback tools that can be effectively used to guide student learning. Student evaluation indicates that learners value the PEFT model's feedback and see it as being equally effective as the proprietary ChatGPT model. Participants suggested that incorporating additional explanation for technical terms in the PEFT model's feedback could be more beneficial. This study demonstrates that fine-tuned models can effectively support critical thinking and guide the design of scalable pedagogical systems.
CY May 23, 2025
Navigating Pitfalls: Evaluating LLMs in Machine Learning Programming Education
Smitha Kumar, Michael A. Lones, Manuel Maarek et al.
The rapid advancement of Large Language Models (LLMs) has opened new avenues in education. This study examines the use of LLMs in supporting learning in machine learning education; in particular, it focuses on the ability of LLMs to identify common errors of practice (pitfalls) in machine learning code, and their ability to provide feedback that can guide learning. Using a portfolio of code samples, we consider four different LLMs: one closed model and three open models. Whilst the most basic pitfalls are readily identified by all models, many common pitfalls are not. The models particularly struggle to identify pitfalls in the early stages of the ML pipeline, especially those which can lead to information leaks, a major source of failure within applied ML projects. They also exhibit limited success at identifying pitfalls around model selection, which is a concept that students often struggle with when first transitioning from theory to practice. This calls into question the use of current LLMs to support machine learning education, and also raises important questions about their use by novice practitioners. Nevertheless, when LLMs successfully identify pitfalls in code, they do provide feedback that includes advice on how to proceed, emphasising their potential role in guiding learners. We also compare the capability of closed and open LLMs, and find that the gap is relatively small given the large difference in model sizes. This presents an opportunity to deploy, and potentially customise, smaller, more efficient LLMs within education, avoiding risks around cost and data sharing associated with commercial models.
CR Sep 29, 2020
Tracking Mixed Bitcoins
Tin Tironsakkul, Manuel Maarek, Andrea Eross et al.
Mixer services purportedly remove all connections between the input (deposited) Bitcoins and the output (withdrawn) mixed Bitcoins, seemingly rendering taint analysis tracking ineffectual. In this paper, we introduce and explore a novel tracking strategy, called Address Taint Analysis, that adapts existing transaction-based taint analysis techniques to track Bitcoins that have passed through a mixer service. We also investigate the potential of combining address taint analysis with address clustering and backward tainting. We further introduce a set of filtering criteria that reduce the number of false-positive results based on the characteristics of withdrawn transactions, and evaluate our solution with verifiable mixing transactions of nine mixer services from previous reverse-engineering studies. Our findings show that it is possible to track the mixed Bitcoins from the deposited Bitcoins using address taint analysis, and that the number of potential transaction outputs can be significantly reduced with the filtering criteria.
SE Aug 13, 2020
Development of a Web Platform for Code Peer-Testing
Manuel Maarek, Léon McGregor
As part of formative and summative assessments in programming courses, students develop programming artifacts following a given specification. These artifacts are evaluated by the teachers. At the end of this evaluation, the students receive feedback and marks. Providing feedback on programming artifacts is time-consuming, and can cause feedback to arrive too late to be effective for the students' learning. We propose to combine software testing with peer feedback, which has been praised for offering a timely and effective learning activity. In this paper we report on the development of a Web platform for peer feedback on programming artifacts through program testing. We discuss the development process of our peer-testing platform, which was informed by teachers and students.
CR Jun 13, 2019
Probing the Mystery of Cryptocurrency Theft: An Investigation into Methods for Taint Analysis
Tin Tironsakkul, Manuel Maarek, Andrea Eross et al.
Since the creation of Bitcoin, transaction tracking has been one of the prominent means of following the movement of Bitcoins involved in illegal activities. Although every Bitcoin transaction is recorded in the blockchain database, which is transparent for anyone to observe and analyse, Bitcoin's pseudonymity system and transaction obscuring techniques still allow criminals to disguise their transaction trail. While there have been a few attempts to develop tracking methods, there is no accepted evaluation method to measure their accuracy. Therefore, this paper investigates strategies for transaction tracking by introducing two new tainting methods, and proposes an address profiling approach with a metrics-based evaluation framework. We use our approach and framework to compare the accuracy of our new tainting methods with that of previous tainting techniques, using data from two real Bitcoin theft transactions and several related control transactions.
SE Apr 26, 2014
Experience in using a typed functional language for the development of a security application
Damien Doligez, Christèle Faure, Thérèse Hardin et al.
In this paper we present our experience in developing a security application using a typed functional language. We describe how the formal grounding of its semantics and compiler has allowed for a trustworthy development and has facilitated the fulfillment of the security specification.