OSNov 3, 2022Code
MUSTACHE: Multi-Step-Ahead Predictions for Cache EvictionGabriele Tolomei, Lorenzo Takanen, Fabio Pinelli
In this work, we propose MUSTACHE, a new page cache replacement algorithm whose logic is learned from observed memory access requests rather than fixed like existing policies. We formulate the page request prediction problem as a categorical time series forecasting task. Then, our method queries the learned page request forecaster to obtain the next $k$ predicted page memory references to better approximate the optimal Bélády's replacement algorithm. We implement several forecasting techniques using advanced deep learning architectures and integrate the best-performing one into an existing open-source cache simulator. Experiments run on benchmark datasets show that MUSTACHE outperforms the best page replacement heuristic (i.e., exact LRU), improving the cache hit ratio by 1.9% and reducing the number of reads/writes required to handle cache misses by 18.4% and 10.3%.
GTMay 4
A Unified Framework and Comparative Study of Decentralized Finance Derivatives ProtocolsLuca Pennella, Pietro Saggese, Fabio Pinelli et al.
Decentralized Finance (DeFi) applications introduce novel financial instruments replicating and extending traditional ones through blockchain-based smart contracts. Among these applications, DeFi derivatives protocols enable the creation and trading of decentralized derivative instruments whose value depends on underlying cryptoassets, indices, or other reference variables. Despite their growing significance, however, they remain relatively understudied compared to other DeFi protocols, such as lending protocols and decentralized exchanges. This paper systematically analyzes DeFi derivatives protocols, categorized into perpetuals, options, and synthetics, with the aim of comparing their instrument structures, protocol mechanisms, operational dynamics, and economic agents. We provide a formal characterization of the main classes of decentralized derivative instruments and develop a protocol-agnostic framework that connects instrument-level specifications, market-state variables, and protocol-level mechanisms. We complement the analytical framework with numerical simulations that evaluate how derivative positions evolve under varying economic conditions, including changes in underlying asset prices, volatility, protocol-specific fees, and leverage. Overall, this study provides a structured analytical framework for understanding and comparing the design and functioning of decentralized finance derivatives protocols.
CRJan 12, 2023
Explainable Ponzi Schemes Detection on EthereumLetterio Galletta, Fabio Pinelli
Blockchain technology has been successfully exploited for deploying new economic applications. However, it has started arousing the interest of malicious actors who deliver scams to deceive honest users and to gain economic advantages. Ponzi schemes are one of the most common scams. Here, we present a classifier for detecting smart Ponzi contracts on Ethereum, which can be used as the backbone for developing detection tools. First, we release a labelled data set with 4422 unique real-world smart contracts to address the problem of the unavailability of labelled data. Then, we show that our classifier outperforms the ones proposed in the literature when considering the AUC as a metric. Finally, we identify a small and effective set of features that ensures a good classification quality and investigate their impacts on the classification using eXplainable AI techniques.
CLSep 26, 2025Code
Human Mobility Datasets Enriched With Contextual and Social DimensionsChiara Pugliese, Francesco Lettich, Guido Rocchietti et al.
In this resource paper, we present two publicly available datasets of semantically enriched human trajectories, together with the pipeline to build them. The trajectories are publicly available GPS traces retrieved from OpenStreetMap. Each dataset includes contextual layers such as stops, moves, points of interest (POIs), inferred transportation modes, and weather data. A novel semantic feature is the inclusion of synthetic, realistic social media posts generated by Large Language Models (LLMs), enabling multimodal and semantic mobility analysis. The datasets are available in both tabular and Resource Description Framework (RDF) formats, supporting semantic reasoning and FAIR data practices. They cover two structurally distinct, large cities: Paris and New York. Our open source reproducible pipeline allows for dataset customization, while the datasets support research tasks such as behavior modeling, mobility prediction, knowledge graph construction, and LLM-based applications. To our knowledge, our resource is the first to combine real-world movement, structured semantic enrichment, LLM-generated text, and semantic web compatibility in a reusable framework.
LGMar 6, 2025
Explainable AI in Time-Sensitive Scenarios: Prefetched Offline Explanation ModelFabio Michele Russo, Carlo Metta, Anna Monreale et al.
As predictive machine learning models become increasingly adopted and advanced, their role has evolved from merely predicting outcomes to actively shaping them. This evolution has underscored the importance of Trustworthy AI, highlighting the necessity to extend our focus beyond mere accuracy and toward a comprehensive understanding of these models' behaviors within the specific contexts of their applications. To further progress in explainability, we introduce Poem, Prefetched Offline Explanation Model, a model-agnostic, local explainability algorithm for image data. The algorithm generates exemplars, counterexemplars and saliency maps to provide quick and effective explanations suitable for time-sensitive scenarios. Leveraging an existing local algorithm, \poem{} infers factual and counterfactual rules from data to create illustrative examples and opposite scenarios with an enhanced stability by design. A novel mechanism then matches incoming test points with an explanation base and produces diverse exemplars, informative saliency maps and believable counterexemplars. Experimental results indicate that Poem outperforms its predecessor Abele in speed and ability to generate more nuanced and varied exemplars alongside more insightful saliency maps and valuable counterexemplars.
LGNov 20, 2024
Urban Region Embeddings from Service-Specific Mobile Traffic DataGiulio Loddi, Chiara Pugliese, Francesco Lettich et al.
With the advent of advanced 4G/5G mobile networks, mobile phone data collected by operators now includes detailed, service-specific traffic information with high spatio-temporal resolution. In this paper, we leverage this type of data to explore its potential for generating high-quality representations of urban regions. To achieve this, we present a methodology for creating urban region embeddings from service-specific mobile traffic data, employing a temporal convolutional network-based autoencoder, transformers, and learnable weighted sum models to capture key urban features. In the extensive experimental evaluation conducted using a real-world dataset, we demonstrate that the embeddings generated by our methodology effectively capture urban characteristics. Specifically, our embeddings are compared against those of a state-of-the-art competitor across two downstream tasks. Additionally, through clustering techniques, we investigate how well the embeddings produced by our methodology capture the temporal dynamics and characteristics of the underlying urban regions. Overall, this work highlights the potential of service-specific mobile traffic data for urban research and emphasizes the importance of making such data accessible to support public innovation.
CRApr 21, 2021
Turning Federated Learning Systems Into Covert ChannelsGabriele Costa, Fabio Pinelli, Simone Soderi et al.
Federated learning (FL) goes beyond traditional, centralized machine learning by distributing model training among a large collection of edge clients. These clients cooperatively train a global, e.g., cloud-hosted, model without disclosing their local, private training data. The global model is then shared among all the participants which use it for local predictions. In this paper, we put forward a novel attacker model aiming at turning FL systems into covert channels to implement a stealth communication infrastructure. The main intuition is that, during federated training, a malicious sender can poison the global model by submitting purposely crafted examples. Although the effect of the model poisoning is negligible to other participants, and does not alter the overall model performance, it can be observed by a malicious receiver and used to transmit a single bit.
CYAug 12, 2015
Towards Real-time Customer Experience Prediction for Telecommunication OperatorsErnesto Diaz-Aviles, Fabio Pinelli, Karol Lynch et al.
Telecommunications operators (telcos) traditional sources of income, voice and SMS, are shrinking due to customers using over-the-top (OTT) applications such as WhatsApp or Viber. In this challenging environment it is critical for telcos to maintain or grow their market share, by providing users with as good an experience as possible on their network. But the task of extracting customer insights from the vast amounts of data collected by telcos is growing in complexity and scale everey day. How can we measure and predict the quality of a user's experience on a telco network in real-time? That is the problem that we address in this paper. We present an approach to capture, in (near) real-time, the mobile customer experience in order to assess which conditions lead the user to place a call to a telco's customer care center. To this end, we follow a supervised learning approach for prediction and train our 'Restricted Random Forest' model using, as a proxy for bad experience, the observed customer transactions in the telco data feed before the user places a call to a customer care center. We evaluate our approach using a rich dataset provided by a major African telecommunication's company and a novel big data architecture for both the training and scoring of predictive models. Our empirical study shows our solution to be effective at predicting user experience by inferring if a customer will place a call based on his current context. These promising results open new possibilities for improved customer service, which will help telcos to reduce churn rates and improve customer experience, both factors that directly impact their revenue growth.
IRDec 26, 2014
Predicting User Engagement in Twitter with Collaborative RankingErnesto Diaz-Aviles, Hoang Thanh Lam, Fabio Pinelli et al.
Collaborative Filtering (CF) is a core component of popular web-based services such as Amazon, YouTube, Netflix, and Twitter. Most applications use CF to recommend a small set of items to the user. For instance, YouTube presents to a user a list of top-n videos she would likely watch next based on her rating and viewing history. Current methods of CF evaluation have been focused on assessing the quality of a predicted rating or the ranking performance for top-n recommended items. However, restricting the recommender system evaluation to these two aspects is rather limiting and neglects other dimensions that could better characterize a well-perceived recommendation. In this paper, instead of optimizing rating or top-n recommendation, we focus on the task of predicting which items generate the highest user engagement. In particular, we use Twitter as our testbed and cast the problem as a Collaborative Ranking task where the rich features extracted from the metadata of the tweets help to complement the transaction information limited to user ids, item ids, ratings and timestamps. We learn a scoring function that directly optimizes the user engagement in terms of nDCG@10 on the predicted ranking. Experiments conducted on an extended version of the MovieTweetings dataset, released as part of the RecSys Challenge 2014, show the effectiveness of our approach.