DC DB SE · Sep 28, 2012

Testing MapReduce-Based Systems

João Eugenio Marynowski, Michel Albonico, Eduardo Cunha de Almeida, Gerson Sunyé

arXiv:1209.6580v2 · 13 citations · Has Code

Originality: Incremental advance

AI Analysis

The paper addresses the problem of testing distributed data-processing systems, which matters to developers of such systems. The advance is incremental, since it builds on existing MapReduce frameworks.

The paper tackles the difficulty of testing MapReduce-based systems on large clusters by introducing HadoopTest, a novel testing solution that uses distributed testers to inject failures and monitor workers, demonstrating promising results in finding bugs for Hadoop applications.

MapReduce (MR) is the most popular solution for building large-scale data processing applications. These applications are often deployed on large clusters of commodity machines, where failures happen constantly due to bugs, hardware problems, and outages. Testing MR-based systems is hard, since a great deal of test-harness effort is needed to execute distributed test cases in the presence of failures. In this paper, we present HadoopTest, a novel testing solution that tackles this issue. The solution is based on a scalable harness approach, where distributed tester components are attached to each map and reduce worker (i.e., node). Testers can stimulate each worker, inject failures into it, monitor its behavior, and validate test results. HadoopTest was used to test two applications bundled with Hadoop, the Apache open source MapReduce implementation. Our initial implementation demonstrates promising results, with HadoopTest coordinating test cases across distributed MapReduce workers and finding bugs.
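The harness idea described in the abstract (one tester per worker that injects failures, monitors behavior, and feeds a validation step) can be illustrated with a small in-process simulation. This is only a sketch under stated assumptions, not HadoopTest's actual design or API: the names `Worker`, `Tester`, `WorkerFailure`, and `run_test_case` are hypothetical, the "cluster" is a list of Python objects, and the job is a word count chosen for illustration.

```python
from collections import Counter
from dataclasses import dataclass, field


class WorkerFailure(Exception):
    """Raised by a simulated worker that has been told to fail."""


@dataclass
class Worker:
    """Hypothetical stand-in for a map worker; counts words in one split."""
    name: str
    fail_next: bool = False

    def run_map(self, split: str) -> Counter:
        if self.fail_next:
            self.fail_next = False  # fail once, then recover
            raise WorkerFailure(self.name)
        return Counter(split.split())


@dataclass
class Tester:
    """Hypothetical tester attached to one worker: injects failures, logs behavior."""
    worker: Worker
    log: list = field(default_factory=list)

    def inject_failure(self) -> None:
        self.worker.fail_next = True
        self.log.append("injected")

    def run(self, split: str):
        try:
            result = self.worker.run_map(split)
            self.log.append("ok")
            return result
        except WorkerFailure:
            self.log.append("failed")
            return None


def run_test_case(splits, testers, fail_on, expected):
    """Coordinator: inject one failure, run all splits, validate the merged result."""
    testers[fail_on].inject_failure()
    total = Counter()
    for i, split in enumerate(splits):
        result = testers[i % len(testers)].run(split)
        if result is None:
            # Assumption: a fault-tolerant job reassigns the split to another worker.
            result = testers[(i + 1) % len(testers)].run(split)
        if result is None:
            return "fail"
        total.update(result)
    return "pass" if total == expected else "fail"
```

A test case then consists of the input splits, the worker to disturb, and the expected output. For example, `run_test_case(["a b a", "b c", "a"], testers, 0, Counter(a=3, b=2, c=1))` with two testers checks that the job still produces the correct counts after the first worker fails once, and each tester's `log` records what its worker did.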

