LGNov 18, 2022
Understanding the double descent curve in Machine LearningLuis Sa-Couto, Jose Miguel Ramos, Miguel Almeida et al.
The theory of bias-variance used to serve as a guide for model selection when applying Machine Learning algorithms. However, modern practice has shown success with over-parameterized models that were expected to overfit but did not. This led to the proposal of the double descent curve of performance by Belkin et al. Although it seems to describe a real, representative phenomenon, the field is lacking a fundamental theoretical understanding of what is happening, what are the consequences for model selection and when is double descent expected to occur. In this paper we develop a principled understanding of the phenomenon, and sketch answers to these important questions. Furthermore, we report real experimental results that are correctly predicted by our proposed hypothesis.
DCAug 23, 2019Code
Towards Secure and Decentralized Sharing of IoT DataHien Thi Thu Truong, Miguel Almeida, Ghassan Karame et al.
The Internet of Things (IoT) bears unprecedented security and scalability challenges due to the magnitude of data produced and exchanged by IoT devices and platforms. Some of those challenges are currently being addressed by coupling IoT applications with blockchains. However, current blockchain-backed IoT systems simply use the blockchain to store access control policies, thereby underutilizing the power of blockchain technology. In this paper, we propose a new framework named Sash that couples IoT platforms with blockchain that provides a number of advantages compared to state of the art. In Sash, the blockchain is used to store access control policies and take access control decisions. Therefore, both changes to policies and access requests are correctly enforced and publicly auditable. Further, we devise a ``data marketplace'' by leveraging the ability of blockchains to handle financial transaction and providing ``by design'' remuneration to data producers. Finally, we exploit a special flavor of identity-based encryption to cater for cryptography-enforced access control while minimizing the overhead to distribute decryption keys. We prototype Sash by using the FIWARE open source IoT platform and the Hyperledger Fabric framework as the blockchain back-end. We also evaluate the performance of our prototype and show that it incurs tolerable overhead in realistic deployment settings.
CRSep 24, 2020
BreachRadar: Automatic Detection of Points-of-CompromiseMiguel Araujo, Miguel Almeida, Jaime Ferreira et al.
Bank transaction fraud results in over $13B annual losses for banks, merchants, and card holders worldwide. Much of this fraud starts with a Point-of-Compromise (a data breach or a skimming operation) where credit and debit card digital information is stolen, resold, and later used to perform fraud. We introduce this problem and present an automatic Points-of-Compromise (POC) detection procedure. BreachRadar is a distributed alternating algorithm that assigns a probability of being compromised to the different possible locations. We implement this method using Apache Spark and show its linear scalability in the number of machines and transactions. BreachRadar is applied to two datasets with billions of real transaction records and fraud labels where we provide multiple examples of real Points-of-Compromise we are able to detect. We further show the effectiveness of our method when injecting Points-of-Compromise in one of these datasets, simultaneously achieving over 90% precision and recall when only 10% of the cards have been victims of fraud.