PL · Oct 10, 2022
Data types as a more ergonomic frontend for Grammar-Guided Genetic Programming
Guilherme Espada, Leon Ingelse, Paulo Canelas et al. · cmu
Genetic Programming (GP) is a heuristic method that can be applied to many Machine Learning, Optimization and Engineering problems. In particular, it has been widely used in Software Engineering for Test-case generation, Program Synthesis and Improvement of Software (GI). Grammar-Guided Genetic Programming (GGGP) approaches allow the user to refine the domain of valid program solutions. Backus Normal Form (BNF) is the most popular interface for describing Context-Free Grammars (CFG) for GGGP. BNF and its derivatives have the disadvantage of interleaving the grammar language and the target language of the program. We propose to embed the grammar as an internal Domain-Specific Language in the host language of the framework. This approach has the same expressive power as BNF and EBNF while using the host language type-system to take advantage of all the existing tooling: linters, formatters, type-checkers, autocomplete, and legacy code support. These tools have a practical utility in designing software in general, and GP systems in particular. We also present Meta-Handlers, user-defined overrides of the tree-generation system. This technique extends our object-oriented encoding with more practicability and expressive power than existing CFG approaches, achieving the same expressive power as Attribute Grammars, but without the grammar vs target language duality. Furthermore, we demonstrate that this approach is feasible by presenting an example Python implementation. We also compare our approach against textual BNF-representations w.r.t. expressive power and ergonomics. These advantages do not come at the cost of performance, as shown by our empirical evaluation on 5 benchmarks of our example implementation against PonyGE2. We conclude that our approach has better ergonomics with the same expressive power and performance as textual BNF-based grammar encodings.
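To make the idea of "grammar as host-language types" concrete, here is a minimal Python sketch. It is not the authors' implementation or API: the class names (`Expr`, `Lit`, `Var`, `Add`, `Mul`) and the helpers `generate` and `evaluate` are hypothetical, chosen only for illustration. An abstract class plays the role of a non-terminal, each dataclass subclass is one production, and a generator builds random trees by reading the field annotations.

```python
import random
from abc import ABC
from dataclasses import dataclass, fields


class Expr(ABC):
    """Non-terminal: each direct subclass is one production of Expr."""


@dataclass
class Lit(Expr):
    value: int  # terminal: integer literal


@dataclass
class Var(Expr):
    name: str  # terminal: variable name


@dataclass
class Add(Expr):
    left: Expr
    right: Expr


@dataclass
class Mul(Expr):
    left: Expr
    right: Expr


def generate(symbol, rng, depth):
    """Build a random tree for `symbol`, guided only by type annotations."""
    if symbol is int:
        return rng.randint(0, 9)
    if symbol is str:
        return rng.choice(["x", "y"])
    if symbol is Expr:
        options = Expr.__subclasses__()
        if depth <= 0:
            # Out of depth budget: keep only productions without Expr fields.
            options = [c for c in options
                       if all(f.type is not Expr for f in fields(c))]
        return generate(rng.choice(options), rng, depth)
    # Concrete production: generate one child per annotated field.
    return symbol(**{f.name: generate(f.type, rng, depth - 1)
                     for f in fields(symbol)})


def evaluate(e, env):
    """Interpret a generated tree under a variable environment."""
    if isinstance(e, Lit):
        return e.value
    if isinstance(e, Var):
        return env[e.name]
    if isinstance(e, Add):
        return evaluate(e.left, env) + evaluate(e.right, env)
    if isinstance(e, Mul):
        return evaluate(e.left, env) * evaluate(e.right, env)
    raise TypeError(f"unknown node: {e!r}")


tree = generate(Expr, random.Random(0), depth=3)
print(tree, "=>", evaluate(tree, {"x": 2, "y": 3}))
```

Because the grammar is ordinary Python, a type-checker or editor autocomplete sees `Add.left` as an `Expr`, which is the kind of tooling benefit the abstract describes. A textual BNF grammar would encode the same four productions as strings outside the reach of those tools.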
LG · Jul 3, 2024
Semantically Rich Local Dataset Generation for Explainable AI in Genomics
Pedro Barbosa, Rosina Savisaar, Alcides Fonseca
Black box deep learning models trained on genomic sequences excel at predicting the outcomes of different gene regulatory mechanisms. Therefore, interpreting these models may provide novel insights into the underlying biology, supporting downstream biomedical applications. Due to the complexity of these models, interpretable surrogate models can only be built for local explanations (e.g., a single instance). However, accomplishing this requires generating a dataset in the neighborhood of the input, which must maintain syntactic similarity to the original data while introducing semantic variability in the model's predictions. This task is challenging due to the complex sequence-to-function relationship of DNA. We propose using Genetic Programming to generate datasets by evolving perturbations in sequences that contribute to their semantic diversity. Our custom, domain-guided individual representation effectively constrains syntactic similarity, and we provide two alternative fitness functions that promote diversity with no computational effort. Applied to the RNA splicing domain, our approach quickly achieves good diversity and significantly outperforms a random baseline in exploring the search space, as shown on a short proof-of-concept RNA sequence. Furthermore, we assess its generalizability and demonstrate scalability to larger sequences, resulting in a ~30% improvement over the baseline.
CR · May 3, 2023
Data Privacy with Homomorphic Encryption in Neural Networks Training and Inference
Ivone Amorim, Eva Maia, Pedro Barbosa et al.
The use of Neural Networks (NNs) for sensitive data processing is becoming increasingly popular, raising concerns about data privacy and security. Homomorphic Encryption (HE) has the potential to be used as a solution to preserve data privacy in NNs. This study provides a comprehensive analysis of the use of HE for NN training and classification, focusing on the techniques and strategies used to enhance data privacy and security. The current state-of-the-art in HE for NNs is analysed, and the challenges and limitations that need to be addressed to make it a reliable and efficient approach for privacy preservation are identified. Also, the different categories of HE schemes and their suitability for NNs are discussed, as well as the techniques used to optimize the accuracy and efficiency of encrypted models. The review reveals that HE has the potential to provide strong data privacy guarantees for NNs, but several challenges need to be addressed, such as limited support for advanced NN operations, scalability issues, and performance trade-offs.
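As a rough illustration of what "computing on encrypted data" means for a single NN layer, the sketch below uses a toy Paillier cryptosystem, a standard additively homomorphic scheme. It is not taken from the survey: the function names (`keygen`, `encrypt`, `decrypt`, `encrypted_linear`) are hypothetical, and the primes 61 and 53 are far too small to be secure. The server evaluates a linear neuron `w·x + b` on ciphertexts without ever seeing `x`.

```python
import math
import secrets


def keygen(p, q):
    """Toy Paillier key generation with generator g = n + 1."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p-1, q-1)
    mu = pow(lam, -1, n)  # modular inverse (Python 3.8+)
    return n, (lam, mu)


def encrypt(n, m):
    """Encrypt integer m (0 <= m < n) under public modulus n."""
    n2 = n * n
    while True:
        r = secrets.randbelow(n - 1) + 1  # random r in 1..n-1
        if math.gcd(r, n) == 1:
            break
    return (pow(n + 1, m, n2) * pow(r, n, n2)) % n2


def decrypt(n, priv, c):
    """Recover the plaintext from ciphertext c."""
    lam, mu = priv
    return ((pow(c, lam, n * n) - 1) // n * mu) % n


def encrypted_linear(n, enc_x, weights, bias):
    """Compute Enc(w.x + b) from encrypted inputs and plaintext weights.

    Multiplying ciphertexts adds plaintexts; raising a ciphertext to a
    plaintext power multiplies its plaintext by that constant.
    """
    n2 = n * n
    acc = encrypt(n, bias)
    for c, w in zip(enc_x, weights):
        acc = (acc * pow(c, w, n2)) % n2
    return acc


n, priv = keygen(61, 53)                       # insecure toy primes
enc_x = [encrypt(n, x) for x in [3, 1, 4]]     # done by the data owner
enc_out = encrypted_linear(n, enc_x, [2, 5, 7], 10)  # done by the server
print(decrypt(n, priv, enc_out))  # 2*3 + 5*1 + 7*4 + 10 = 49
```

Only additions and multiplications by plaintext constants are available here, so a non-linear activation cannot be applied to `enc_out` directly. This is one concrete instance of the "limited support for advanced NN operations" that the abstract lists among the open challenges.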
CR · Oct 31, 2017
DynSGX: A Privacy Preserving Toolset for Dynamically Loading Functions into Intel(R) SGX Enclaves
Rodolfo Silva, Pedro Barbosa, Andrey Brito
Intel(R) Software Guard eXtensions (SGX) is a hardware-based technology for protecting sensitive data from disclosure or modification. It enables user-level applications to allocate protected areas of memory called enclaves. Such memory areas are cryptographically protected even from code running with higher privilege levels. This memory protection can be used to develop secure and dependable applications, but the technology has some limitations: (i) the code of an enclave is visible at load time, (ii) libraries used by the code must be statically linked, and (iii) the protected memory size is limited, so pages must be swapped when this limit is exceeded. We present DynSGX, a privacy preserving tool that enables users and developers to dynamically load and unload code to be executed inside SGX enclaves. Such a technology makes it possible for developers to use public cloud infrastructures to run applications based on sensitive code and data. Moreover, we present a series of experiments that assess how applications dynamically loaded by DynSGX perform in comparison to statically linked applications that disregard privacy of the enclave code at load time.
HC · Aug 19, 2017
Designing for Pragmatists and Fundamentalists: Privacy Concerns and Attitudes on the Internet of Things
Lesandro Ponciano, Pedro Barbosa, Francisco Brasileiro et al.
Internet of Things (IoT) systems have aroused enthusiasm and concerns. Enthusiasm comes from their utility in people's daily lives, and concerns may be associated with privacy issues. By using two IoT systems as case-studies, we examine users' privacy beliefs, concerns and attitudes. We focus on four major dimensions: the collection of personal data, the inference of new information, the exchange of information with third parties, and the risk-utility trade-off posed by the features of the system. Altogether, 113 Brazilian individuals answered a survey about such dimensions. Although their perceptions seem to be dependent on the context, there are recurrent patterns. Our results suggest that IoT users can be classified into unconcerned, fundamentalists and pragmatists. Most of them exhibit a pragmatist profile and believe in privacy as a right guaranteed by law. One of the most privacy-concerning aspects is the exchange of personal information with third parties. Individuals' perceived risk is negatively correlated with their perceived utility in the features of the system. We discuss practical implications of these results and suggest heuristics to cope with privacy concerns when designing IoT systems.
CR · Aug 19, 2017
NIZKCTF: A Non-Interactive Zero-Knowledge Capture the Flag Platform
Paulo Matias, Pedro Barbosa, Thiago Cardoso et al.
Capture the Flag (CTF) competitions are increasingly important for the Brazilian cybersecurity community as educational and professional tools. Unfortunately, CTF platforms may suffer from security issues, giving an unfair advantage to competitors. To mitigate this, we propose NIZKCTF, the first open-audit CTF platform based on non-interactive zero-knowledge proofs.
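The abstract does not spell out the proof construction, so the following is only a sketch of one standard way to build a non-interactive zero-knowledge proof of knowledge: a Schnorr proof made non-interactive with the Fiat-Shamir transform. It should not be read as NIZKCTF's actual protocol. All names (`secret_from_flag`, `prove`, `verify`) and the tiny group parameters are assumptions for illustration. A secret is derived from the flag, its public counterpart is published with the challenge, and a team proves it knows the secret without revealing the flag, with the proof bound to the team name.

```python
import hashlib
import secrets

# Toy group (insecure sizes): P = 2*Q + 1 with P, Q prime; G has order Q.
P, Q, G = 2039, 1019, 4


def _hash_to_int(*parts):
    data = "|".join(str(p) for p in parts).encode()
    return int.from_bytes(hashlib.sha256(data).digest(), "big")


def secret_from_flag(flag):
    """Derive a non-zero secret exponent x in 1..Q-1 from the flag."""
    return _hash_to_int("flag", flag) % (Q - 1) + 1


def public_key(flag):
    """Value published by the organisers alongside the challenge."""
    return pow(G, secret_from_flag(flag), P)


def challenge(y, t, team):
    """Fiat-Shamir: the hash replaces the verifier's random challenge."""
    return _hash_to_int(G, y, t, team) % Q


def prove(flag, team):
    """Prove knowledge of x with y = G^x, bound to `team`."""
    x = secret_from_flag(flag)
    y = pow(G, x, P)
    k = secrets.randbelow(Q - 1) + 1   # one-time nonce
    t = pow(G, k, P)                   # commitment
    c = challenge(y, t, team)
    s = (k + c * x) % Q                # response
    return t, s


def verify(y, team, proof):
    """Anyone can check the proof using public data only."""
    t, s = proof
    c = challenge(y, t, team)
    return pow(G, s, P) == (t * pow(y, c, P)) % P


y = public_key("CTF{example-flag}")          # published by organisers
proof = prove("CTF{example-flag}", "team-a")  # submitted publicly by the team
print(verify(y, "team-a", proof))
```

Since verification needs only `y`, the team name and the proof, submissions can be stored in a public log and audited by anyone, which is in the spirit of the "open-audit" property the abstract mentions. In this toy group a proof replayed under another team name is rejected except with probability of roughly 1/Q; a real deployment would use a cryptographically sized group or an established signature scheme.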