CRApr 25
Training Machine Learning Models on Encrypted Data: A Privacy-Preserving Framework using Homomorphic EncryptionAlexandre Marques, Beatriz Sá, Rui Botelho et al.
The use of Machine Learning (ML) for data-driven decision-making often relies on access to sensitive datasets, which introduces privacy challenges. Traditional encryption methods protect data at rest or in transit but fail to secure it during processing, exposing it to unauthorized access. Homomorphic encryption emerges as a transformative solution, enabling computations on encrypted data without decryption, thus preserving confidentiality throughout the ML pipeline. This paper addresses the challenge of training ML models on encrypted data while maintaining accuracy and efficiency by proposing a proof-of-concept for a privacy-preserving framework that leverages Cheon-Kim-Kim-Song (CKKS) for approximate real-number arithmetic. Also, it demonstrates the feasibility of training K-Nearest Neighbors (KNN) and linear regression models on encrypted data, and evaluates encrypted inference for a basic Multilayer Perceptron (MLP) architecture. Experimental results show that models trained under Homomorphic encryption achieve performance metrics comparable to plaintext-trained models, validating the approach. However, challenges such as computational overhead, noise management, and limited support for non-polynomial operations persist. This work lays the groundwork for broader adoption of privacy-preserving ML in real-world applications, balancing security with computational feasibility.
CLJun 18, 2025
From Model to Classroom: Evaluating Generated MCQs for Portuguese with Narrative and Difficulty ConcernsBernardo Leite, Henrique Lopes Cardoso, Pedro Pinto et al.
While MCQs are valuable for learning and evaluation, manually creating them with varying difficulty levels and targeted reading skills remains a time-consuming and costly task. Recent advances in generative AI provide an opportunity to automate MCQ generation efficiently. However, assessing the actual quality and reliability of generated MCQs has received limited attention -- particularly regarding cases where generation fails. This aspect becomes particularly important when the generated MCQs are meant to be applied in real-world settings. Additionally, most MCQ generation studies focus on English, leaving other languages underexplored. This paper investigates the capabilities of current generative models in producing MCQs for reading comprehension in Portuguese, a morphologically rich language. Our study focuses on generating MCQs that align with curriculum-relevant narrative elements and span different difficulty levels. We evaluate these MCQs through expert review and by analyzing the psychometric properties extracted from student responses to assess their suitability for elementary school students. Our results show that current models can generate MCQs of comparable quality to human-authored ones. However, we identify issues related to semantic clarity and answerability. Also, challenges remain in generating distractors that engage students and meet established criteria for high-quality MCQ option design.
SEMay 7, 2015
Fault Detection in C Programs using Monitoring of Range Values: Preliminary ResultsPedro Pinto, Rui Abreu, João M. P. Cardoso
This technical report presents the work done as part of the AutoSeer project. Our work in this project was to develop a source-to-source compiler, MANET, for the C language that could be used for instrumentation of critical parts of applications under testing. The intention was to guide the compilation flow and define instrumentation strategies using the Aspect-Oriented Approach provided by LARA. This allows a separation of the original target application and the instrumentation secondary concerns. One of the goals of this work was the development of a source-to-source C compiler that modifies code according to an input strategy. These modifications could provide code transformations that target performance and instrumentation for debugging, but in this work they are used to inject code that collects information about the values that certain variables take during runtime. This compiler is supported by an AOP approach that enables the definition of instrumentation strategies. We decided to extend an existing source-to-source compiler, Cetus, and couple it with LARA, an AOP language that is partially abstracted from the target programming language. We propose and evaluate an approach to detect faults in C programs by monitoring the range values of variables. We consider various monitoring strategies and use two real-life applications, the GZIP file compressor and ABS, a program provided by an industrial partner. The different strategies were specified in LARA and automatically applied using MANET. The experimental results show that our approach has potential but is hindered by not accounting for values in arrays and control variables. We achieve prediction accuracies of around 54% for ABS and 83% for GZIP, when comparing our approach to a more traditional one, where the outputs are compared to an expected result.