CR Feb 5, 2021
Frontrunner Jones and the Raiders of the Dark Forest: An Empirical Study of Frontrunning on the Ethereum Blockchain
Christof Ferreira Torres, Ramiro Camino, Radu State
Ethereum fostered the creation of a plethora of smart contract applications, ranging from gambling games to decentralized finance. However, Ethereum is also considered a highly adversarial environment, where vulnerable smart contracts will eventually be exploited. Recently, Ethereum's pool of pending transactions has become a far more aggressive environment. In the hope of making some profit, attackers continuously monitor the transaction pool and try to frontrun their victims' transactions by either displacing or suppressing them, or by strategically inserting their own transactions. This paper aims to shed some light on what is known as a dark forest and uncover these predators' actions. We present a methodology to efficiently measure the three types of frontrunning: displacement, insertion, and suppression. We perform a large-scale analysis on more than 11M blocks and identify almost 200K attacks with an accumulated profit of 18.41M USD for the attackers, providing evidence that frontrunning is both lucrative and prevalent.
LG May 7, 2020
Minority Class Oversampling for Tabular Data with Deep Generative Models
Ramiro Camino, Christian Hammerschmidt, Radu State
In practice, machine learning experts are often confronted with imbalanced data. Without accounting for the imbalance, common classifiers perform poorly and standard evaluation metrics mislead practitioners about the model's performance. A common method to treat imbalanced datasets is under- and oversampling. In this process, samples are either removed from the majority class or synthetic samples are added to the minority class. In this paper, we follow up on recent developments in deep learning. We take proposals of deep generative models, including our own, and study the ability of these approaches to provide realistic samples that improve performance on imbalanced classification tasks via oversampling. Across 160K+ experiments, we show that all of the new methods tend to perform better than simple baseline methods such as SMOTE, but require different under- and oversampling ratios to do so. Our experiments show that the sampling method does not affect quality, but runtime varies widely. We also observe that the improvements in terms of performance metrics, while shown to be significant when ranking the methods, are often minor in absolute terms, especially compared to the required effort. Furthermore, we notice that a large part of the improvement is due to undersampling, not oversampling. We make our code and testing framework available.
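The under- and oversampling process described in this abstract can be illustrated with a minimal sketch. It uses plain random removal and duplication, not SMOTE and not the deep generative models studied in the paper. The function name `rebalance` and its parameters `keep_majority` and `extra_minority` are hypothetical names chosen for illustration.

```python
import random


def rebalance(rows, labels, minority, keep_majority=0.5, extra_minority=1.0, seed=0):
    """Randomly undersample the majority class and oversample the minority class.

    keep_majority: fraction of majority rows that is kept (hypothetical parameter).
    extra_minority: number of duplicated minority rows to add, expressed as a
        fraction of the original minority count (hypothetical parameter).
    """
    rng = random.Random(seed)
    minority_rows = [r for r, y in zip(rows, labels) if y == minority]
    majority = [(r, y) for r, y in zip(rows, labels) if y != minority]

    # Undersampling: keep a random subset of the majority rows.
    kept = rng.sample(majority, round(len(majority) * keep_majority))

    # Oversampling: duplicate randomly chosen minority rows.
    n_extra = round(len(minority_rows) * extra_minority)
    extra = [rng.choice(minority_rows) for _ in range(n_extra)]

    combined = kept + [(r, minority) for r in minority_rows + extra]
    rng.shuffle(combined)
    new_rows = [r for r, _ in combined]
    new_labels = [y for _, y in combined]
    return new_rows, new_labels
```

A generative oversampler would replace the duplication step with samples drawn from a trained model; the two ratios would still need to be tuned per method, as the abstract notes.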
CR Oct 3, 2019
A Data Science Approach for Honeypot Detection in Ethereum
Ramiro Camino, Christof Ferreira Torres, Mathis Baden et al.
Ethereum smart contracts have recently drawn a considerable amount of attention from the media, the financial industry and academia. With the increase in popularity, malicious users found new opportunities to profit by deceiving newcomers. Consequently, attackers started luring other attackers into contracts that seem to have exploitable flaws, but that actually contain a complex hidden trap that in the end benefits the contract creator. In the blockchain community, these contracts are known as honeypots. A recent study presented a tool called HONEYBADGER that uses symbolic execution to detect honeypots by analyzing contract bytecode. In this paper, we present a data science detection approach based foremost on the contract transaction behavior. We create a partition of all the possible cases of fund movements between the contract creator, the contract, the transaction sender and other participants. To this end, we add transaction aggregated features, such as the number of transactions and the corresponding mean value and other contract features, for example compilation information and source code length. We find that all aforementioned categories of features contain useful information for the detection of honeypots. Moreover, our approach allows us to detect new, previously undetected honeypots of already known techniques. We furthermore employ our method to test the detection of unknown honeypot techniques by sequentially removing one technique from the training set. We show that our method is capable of discovering the removed honeypot techniques. Finally, we discovered two new techniques that were previously not known.
ML Jul 3, 2018
Generating Multi-Categorical Samples with Generative Adversarial Networks
Ramiro Camino, Christian Hammerschmidt, Radu State
We propose a method to train generative adversarial networks on multivariate feature vectors representing multiple categorical values. In contrast to the continuous domain, where GAN-based methods have delivered considerable results, GANs struggle to perform equally well on discrete data. We propose and compare several architectures based on multiple (Gumbel) softmax output layers taking into account the structure of the data. We evaluate the performance of our architecture on datasets with different sparsity, number of features, ranges of categorical values, and dependencies among the features. Our proposed architecture and method outperform existing models.
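The idea of one (Gumbel) softmax output per categorical variable can be sketched in a few lines. This is only an illustration of the general Gumbel-softmax relaxation applied block by block, not the authors' architecture. The function names, the `sizes` argument, and the default temperature are assumptions made for this example.

```python
import math
import random


def gumbel_softmax(logits, temperature, rng):
    """Return a relaxed one-hot sample for a single categorical variable."""
    noisy = []
    for logit in logits:
        u = max(rng.random(), 1e-12)  # guard against log(0)
        gumbel = -math.log(-math.log(u))  # Gumbel(0, 1) noise
        noisy.append((logit + gumbel) / temperature)
    peak = max(noisy)  # subtract the maximum for numerical stability
    exps = [math.exp(v - peak) for v in noisy]
    total = sum(exps)
    return [e / total for e in exps]


def sample_multi_categorical(flat_logits, sizes, temperature=0.5, seed=0):
    """Apply one Gumbel-softmax per variable and concatenate the results.

    sizes: number of categories of each variable (hypothetical argument);
    flat_logits holds the logits of all variables back to back.
    """
    rng = random.Random(seed)
    out, start = [], 0
    for size in sizes:
        block = flat_logits[start:start + size]
        out.extend(gumbel_softmax(block, temperature, rng))
        start += size
    return out
```

For example, `sample_multi_categorical([1.0, 2.0, 0.5, 0.0, 3.0], sizes=[3, 2])` returns five values in which the first three and the last two each sum to one, so every variable keeps its own probability simplex.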