Pedro Casas

h-index31

8papers

201citations

Novelty37%

AI Score40

Ranked #75,279 of 194,257 authors (top 39%)#4,651 in AI (top 37%)

8 Papers

3.3LGSep 23, 2022

Smart Active Sampling to enhance Quality Assurance Efficiency

Clemens Heistracher, Stefan Stricker, Pedro Casas et al.

We propose a new sampling strategy, called smart active sapling, for quality inspections outside the production line. Based on the principles of active learning a machine learning model decides which samples are sent to quality inspection. On the one hand, this minimizes the production of scrap parts due to earlier detection of quality violations. On the other hand, quality inspection costs are reduced for smooth operation.

3.0CRMar 2

Phishing the Phishers with SpecularNet: Hierarchical Graph Autoencoding for Reference-Free Web Phishing Detection

Tailai Song, Pedro Casas, Michela Meo

Phishing remains the most pervasive threat to the Web, enabling large-scale credential theft and financial fraud through deceptive webpages. While recent reference-based and generative-AI-driven phishing detectors achieve strong accuracy, their reliance on external knowledge bases, cloud services, and complex multimodal pipelines fundamentally limits practicality, scalability, and reproducibility. In contrast, conventional deep learning approaches often fail to generalize to evolving phishing campaigns. We introduce SpecularNet, a novel lightweight framework for reference-free web phishing detection that demonstrates how carefully designed compact architectures can rival heavyweight systems. SpecularNet operates solely on the domain name and HTML structure, modeling the Document Object Model (DOM) as a tree and leveraging a hierarchical graph autoencoding architecture with directional, level-wise message passing. This design captures higher-order structural invariants of phishing webpages while enabling fast, end-to-end inference on standard CPUs. Extensive evaluation against 13 state of the art phishing detectors, including leading reference-based systems, shows that SpecularNet achieves competitive detection performance with dramatically lower computational cost. On benchmark datasets, it reaches an F1 score of 93.9%, trailing the best reference-based method slightly while reducing inference time from several seconds to approximately 20 milliseconds per webpage. Field and robustness evaluations further validate SpecularNet in real-world deployments, on a newly collected 2026 open-world dataset, and against adversarial attacks.

7.1LGJul 2, 2025

Towards Foundation Auto-Encoders for Time-Series Anomaly Detection

Gastón García González, Pedro Casas, Emilio Martínez et al.

We investigate a novel approach to time-series modeling, inspired by the successes of large pretrained foundation models. We introduce FAE (Foundation Auto-Encoders), a foundation generative-AI model for anomaly detection in time-series data, based on Variational Auto-Encoders (VAEs). By foundation, we mean a model pretrained on massive amounts of time-series data which can learn complex temporal patterns useful for accurate modeling, forecasting, and detection of anomalies on previously unseen datasets. FAE leverages VAEs and Dilated Convolutional Neural Networks (DCNNs) to build a generic model for univariate time-series modeling, which could eventually perform properly in out-of-the-box, zero-shot anomaly detection applications. We introduce the main concepts of FAE, and present preliminary results in different multi-dimensional time-series datasets from various domains, including a real dataset from an operational mobile ISP, and the well known KDD 2021 Anomaly Detection dataset.

5.7AIOct 16, 2020

On the Usage of Generative Models for Network Anomaly Detection in Multivariate Time-Series

Gastón García González, Pedro Casas, Alicia Fernández et al.

Despite the many attempts and approaches for anomaly detection explored over the years, the automatic detection of rare events in data communication networks remains a complex problem. In this paper we introduce Net-GAN, a novel approach to network anomaly detection in time-series, using recurrent neural networks (RNNs) and generative adversarial networks (GAN). Different from the state of the art, which traditionally focuses on univariate measurements, Net-GAN detects anomalies in multivariate time-series, exploiting temporal dependencies through RNNs. Net-GAN discovers the underlying distribution of the baseline, multivariate data, without making any assumptions on its nature, offering a powerful approach to detect anomalies in complex, difficult to model network monitoring data. We further exploit the concepts behind generative models to conceive Net-VAE, a complementary approach to Net-GAN for network anomaly detection, based on variational auto-encoders (VAE). We evaluate Net-GAN and Net-VAE in different monitoring scenarios, including anomaly detection in IoT sensor data, and intrusion detection in network measurements. Generative models represent a promising approach for network anomaly detection, especially when considering the complexity and ever-growing number of time-series to monitor in operational networks.

10.5AIMar 3, 2020

EXPLAIN-IT: Towards Explainable AI for Unsupervised Network Traffic Analysis

Andrea Morichetta, Pedro Casas, Marco Mellia

The application of unsupervised learning approaches, and in particular of clustering techniques, represents a powerful exploration means for the analysis of network measurements. Discovering underlying data characteristics, grouping similar measurements together, and identifying eventual patterns of interest are some of the applications which can be tackled through clustering. Being unsupervised, clustering does not always provide precise and clear insight into the produced output, especially when the input data structure and distribution are complex and difficult to grasp. In this paper we introduce EXPLAIN-IT, a methodology which deals with unlabeled data, creates meaningful clusters, and suggests an explanation to the clustering results for the end-user. EXPLAIN-IT relies on a novel explainable Artificial Intelligence (AI) approach, which allows to understand the reasons leading to a particular decision of a supervised learning-based model, additionally extending its application to the unsupervised learning domain. We apply EXPLAIN-IT to the problem of YouTube video quality classification under encrypted traffic scenarios, showing promising results.

14.6CRMar 3, 2020

DeepMAL -- Deep Learning Models for Malware Traffic Detection and Classification

Gonzalo Marín, Pedro Casas, Germán Capdehourat

Robust network security systems are essential to prevent and mitigate the harming effects of the ever-growing occurrence of network attacks. In recent years, machine learning-based systems have gain popularity for network security applications, usually considering the application of shallow models, which rely on the careful engineering of expert, handcrafted input features. The main limitation of this approach is that handcrafted features can fail to perform well under different scenarios and types of attacks. Deep Learning (DL) models can solve this limitation using their ability to learn feature representations from raw, non-processed data. In this paper we explore the power of DL models on the specific problem of detection and classification of malware network traffic. As a major advantage with respect to the state of the art, we consider raw measurements coming directly from the stream of monitored bytes as input to the proposed models, and evaluate different raw-traffic feature representations, including packet and flow-level ones. We introduce DeepMAL, a DL model which is able to capture the underlying statistics of malicious traffic, without any sort of expert handcrafted features. Using publicly available traffic traces containing different families of malware traffic, we show that DeepMAL can detect and classify malware flows with high accuracy, outperforming traditional, shallow-like models.

2.3AIMar 3, 2020

Two Decades of AI4NETS-AI/ML for Data Networks: Challenges & Research Directions

Pedro Casas

The popularity of Artificial Intelligence (AI) -- and of Machine Learning (ML) as an approach to AI, has dramatically increased in the last few years, due to its outstanding performance in various domains, notably in image, audio, and natural language processing. In these domains, AI success-stories are boosting the applied field. When it comes to AI/ML for data communication Networks (AI4NETS), and despite the many attempts to turn networks into learning agents, the successful application of AI/ML in networking is limited. There is a strong resistance against AI/ML-based solutions, and a striking gap between the extensive academic research and the actual deployments of such AI/ML-based systems in operational environments. The truth is, there are still many unsolved complex challenges associated to the analysis of networking data through AI/ML, which hinders its acceptability and adoption in the practice. In this positioning paper I elaborate on the most important show-stoppers in AI4NETS, and present a research agenda to tackle some of these challenges, enabling a natural adoption of AI/ML for networking. In particular, I focus the future research in AI4NETS around three major pillars: (i) to make AI/ML immediately applicable in networking problems through the concepts of effective learning, turning it into a useful and reliable way to deal with complex data-driven networking problems; (ii) to boost the adoption of AI/ML at the large scale by learning from the Internet-paradigm itself, conceiving novel distributed and hierarchical learning approaches mimicking the distributed topological principles and operation of the Internet itself; and (iii) to exploit the softwarization and distribution of networks to conceive AI/ML-defined Networks (AIDN), relying on the distributed generation and re-usage of knowledge through novel Knowledge Delivery Networks (KDNs).

4.3NIJan 24, 2020

All that Glitters is not Bitcoin -- Unveiling the Centralized Nature of the BTC (IP) Network

Sami Ben Mariem, Pedro Casas, Matteo Romiti et al.

Blockchains are typically managed by peer-to-peer (P2P) networks providing the support and substrate to the so-called distributed ledger (DLT), a replicated, shared, and synchronized data structure, geographically spread across multiple nodes. The Bitcoin (BTC) blockchain is by far the most well known DLT, used to record transactions among peers, based on the BTC digital currency. In this paper, we focus on the network side of the BTC P2P network, analyzing its nodes from a purely network measurements-based approach. We present a BTC crawler able to discover and track the BTC P2P network through active measurements, and use it to analyze its main properties. Through the combined analysis of multiple snapshots of the BTC network as well as by using other publicly available data sources on the BTC network and DLT, we unveil the BTC P2P network, locate its active nodes, study their performance, and track the evolution of the network over the past two years. Among other relevant findings, we show that (i) the size of the BTC network has remained almost constant during the last 12 months - since the major BTC price drop in early 2018, (ii) most of the BTC P2P network resides in US and EU countries, and (iii) despite this western network locality, most of the mining activity and corresponding revenue is controlled by major mining pools located in China. By additionally analyzing the distribution of BTC coins among independent BTC entities (i.e., single BTC addresses or groups of BTC addresses controlled by the same actor), we also conclude that (iv) BTC is very far from being the decentralized and uncontrolled system it is so much advertised to be, with only 4.5% of all the BTC entities holding about 85% of all circulating BTC coins.