DCDec 22, 2018
Bioinformatics Computational Cluster Batch Task Profiling with Machine Learning for Failure PredictionChristopher Harrison, Christine R. Kirkpatrick, Inês Dutra
Motivation: Traditional computational cluster schedulers are based on user inputs and run time needs request for memory and CPU, not IO. Heavily IO bound task run times, like ones seen in many big data and bioinformatics problems, are dependent on the IO subsystems scheduling and are problematic for cluster resource scheduling. The problematic rescheduling of IO intensive and errant tasks is a lost resource. Understanding the conditions in both successful and failed tasks and differentiating them could provide knowledge to enhancing cluster scheduling and intelligent resource optimization. Results: We analyze a production computational cluster contributing 6.7 thousand CPU hours to research over two years. Through this analysis we develop a machine learning task profiling agent for clusters that attempts to predict failures between identically provision requested tasks.
CVJul 17, 2024
Instance-wise Uncertainty for Class Imbalance in Semantic SegmentationLuís Almeida, Inês Dutra, Francesco Renna
Semantic segmentation is a fundamental computer vision task with a vast number of applications. State of the art methods increasingly rely on deep learning models, known to incorrectly estimate uncertainty and being overconfident in predictions, especially in data not seen during training. This is particularly problematic in semantic segmentation due to inherent class imbalance. Popular uncertainty quantification approaches are task-agnostic and fail to leverage spatial pixel correlations in uncertainty estimates, crucial in this task. In this work, a novel training methodology specifically designed for semantic segmentation is presented. Training samples are weighted by instance-wise uncertainty masks computed by an ensemble. This is shown to increase performance on minority classes, boost model generalization and robustness to domain-shift when compared to using the inverse of class proportions or no class weights at all. This method addresses the challenges of class imbalance and uncertainty estimation in semantic segmentation, potentially enhancing model performance and reliability across various applications.
LGMay 10, 2024
Program Synthesis using Inductive Logic Programming for the Abstraction and Reasoning CorpusFilipe Marinho Rocha, Inês Dutra, Vítor Santos Costa
The Abstraction and Reasoning Corpus (ARC) is a general artificial intelligence benchmark that is currently unsolvable by any Machine Learning method, including Large Language Models (LLMs). It demands strong generalization and reasoning capabilities which are known to be weaknesses of Neural Network based systems. In this work, we propose a Program Synthesis system that uses Inductive Logic Programming (ILP), a branch of Symbolic AI, to solve ARC. We have manually defined a simple Domain Specific Language (DSL) that corresponds to a small set of object-centric abstractions relevant to ARC. This is the Background Knowledge used by ILP to create Logic Programs that provide reasoning capabilities to our system. The full system is capable of generalize to unseen tasks, since ILP can create Logic Program(s) from few examples, in the case of ARC: pairs of Input-Output grids examples for each task. These Logic Programs are able to generate Objects present in the Output grid and the combination of these can form a complete program that transforms an Input grid into an Output grid. We randomly chose some tasks from ARC that dont require more than the small number of the Object primitives we implemented and show that given only these, our system can solve tasks that require each, such different reasoning.
CRAug 18, 2025
A Risk Manager for Intrusion Tolerant Systems: Enhancing HAL 9000 with New Scoring and Data SourcesTadeu Freitas, Carlos Novo, Inês Dutra et al.
Intrusion Tolerant Systems (ITSs) have become increasingly critical due to the rise of multi-domain adversaries exploiting diverse attack surfaces. ITS architectures aim to tolerate intrusions, ensuring system compromise is prevented or mitigated even with adversary presence. Existing ITS solutions often employ Risk Managers leveraging public security intelligence to adjust system defenses dynamically against emerging threats. However, these approaches rely heavily on databases like NVD and ExploitDB, which require manual analysis for newly discovered vulnerabilities. This dependency limits the system's responsiveness to rapidly evolving threats. HAL 9000, an ITS Risk Manager introduced in our prior work, addressed these challenges through machine learning. By analyzing descriptions of known vulnerabilities, HAL 9000 predicts and assesses new vulnerabilities automatically. To calculate the risk of a system, it also incorporates the Exploitability Probability Scoring system to estimate the likelihood of exploitation within 30 days, enhancing proactive defense capabilities. Despite its success, HAL 9000's reliance on NVD and ExploitDB knowledge is a limitation, considering the availability of other sources of information. This extended work introduces a custom-built scraper that continuously mines diverse threat sources, including security advisories, research forums, and real-time exploit proofs-of-concept. This significantly expands HAL 9000's intelligence base, enabling earlier detection and assessment of unverified vulnerabilities. Our evaluation demonstrates that integrating scraper-derived intelligence with HAL 9000's risk management framework substantially improves its ability to address emerging threats. This paper details the scraper's integration into the architecture, its role in providing additional information on new threats, and the effects on HAL 9000's management.
ROMay 30, 2023
Data and Knowledge for Overtaking Scenarios in Autonomous DrivingMariana Pinto, Inês Dutra, Joaquim Fonseca
Autonomous driving has become one of the most popular research topics within Artificial Intelligence. An autonomous vehicle is understood as a system that combines perception, decision-making, planning, and control. All of those tasks require that the vehicle collects surrounding data in order to make a good decision and action. In particular, the overtaking maneuver is one of the most critical actions of driving. The process involves lane changes, acceleration and deceleration actions, and estimation of the speed and distance of the vehicle in front or in the lane in which it is moving. Despite the amount of work available in the literature, just a few handle overtaking maneuvers and, because overtaking can be risky, no real-world dataset is available. This work contributes in this area by presenting a new synthetic dataset whose focus is the overtaking maneuver. We start by performing a thorough review of the state of the art in autonomous driving and then explore the main datasets found in the literature (public and private, synthetic and real), highlighting their limitations, and suggesting a new set of features whose focus is the overtaking maneuver.
IRNov 8, 2015
Accelerating Recommender Systems using GPUsAndré Valente Rodrigues, Alípio Jorge, Inês Dutra
We describe GPU implementations of the matrix recommender algorithms CCD++ and ALS. We compare the processing time and predictive ability of the GPU implementations with existing multi-core versions of the same algorithms. Results on the GPU are better than the results of the multi-core versions (maximum speedup of 14.8).
AIJun 2, 2015
SkILL - a Stochastic Inductive Logic LearnerJoana Côrte-Real, Theofrastos Mantadelis, Inês Dutra et al.
Probabilistic Inductive Logic Programming (PILP) is a rel- atively unexplored area of Statistical Relational Learning which extends classic Inductive Logic Programming (ILP). This work introduces SkILL, a Stochastic Inductive Logic Learner, which takes probabilistic annotated data and produces First Order Logic theories. Data in several domains such as medicine and bioinformatics have an inherent degree of uncer- tainty, that can be used to produce models closer to reality. SkILL can not only use this type of probabilistic data to extract non-trivial knowl- edge from databases, but it also addresses efficiency issues by introducing a novel, efficient and effective search strategy to guide the search in PILP environments. The capabilities of SkILL are demonstrated in three dif- ferent datasets: (i) a synthetic toy example used to validate the system, (ii) a probabilistic adaptation of a well-known biological metabolism ap- plication, and (iii) a real world medical dataset in the breast cancer domain. Results show that SkILL can perform as well as a deterministic ILP learner, while also being able to incorporate probabilistic knowledge that would otherwise not be considered.
AIJun 10, 2014
ExpertBayes: Automatically refining manually built Bayesian networksEzilda Almeida, Pedro Ferreira, Tiago Vinhoza et al.
Bayesian network structures are usually built using only the data and starting from an empty network or from a naive Bayes structure. Very often, in some domains, like medicine, a prior structure knowledge is already known. This structure can be automatically or manually refined in search for better performance models. In this work, we take Bayesian networks built by specialists and show that minor perturbations to this original network can yield better classifiers with a very small computational cost, while maintaining most of the intended meaning of the original model.