Exhaustive Exploration of the Failure-oblivious Computing Search Space
This work addresses the need for deeper insights into failure-oblivious computing to improve software reliability, but it is incremental as it builds on existing techniques without introducing a new method.
The paper tackles the problem of understanding failure-oblivious computing for high-availability software by exhaustively exploring the search space of 16 field failures in Java software, yielding a better understanding of failure-oblivious behaviors and opening new research directions.
High availability of software systems requires automated handling of crashes in the presence of errors. Failure-oblivious computing is one technique that aims to achieve high availability. We note that failure-obliviousness has not yet been studied in depth, and there are very few studies that help explain why failure-oblivious techniques work. For failure-oblivious computing to have an impact in practice, we need to deeply understand failure-oblivious behaviors in software. In this paper, we design and perform an experiment that analyzes the size and the diversity of failure-oblivious behaviors. Our experiment consists of exhaustively computing the search space of 16 field failures of large-scale open-source Java software. The outcome of this experiment is a much better understanding of what really happens when failure-oblivious computing is used, and this opens promising new research directions.
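
To make the notion of a search space of failure-oblivious behaviors concrete, the following is a minimal illustrative sketch and not the tool or protocol used in the paper. It assumes a toy task in which every `None` value is a failure point, and three hypothetical recovery strategies (`skip`, `default`, `reuse_last`) that are named here for illustration only. It enumerates every combination of decisions and records the outcome of each, which gives the size of the space (number of decision sequences) and a crude measure of its diversity (number of distinct outcomes). As a simplification, the number of failure points is assumed to be fixed regardless of earlier decisions.

```python
from itertools import product

# Hypothetical strategies applied when a value is unexpectedly None.
STRATEGIES = ("skip", "default", "reuse_last")


def run(items, decisions):
    """Toy task: sum len() of each item; None items are failure points.

    `decisions` gives one strategy per None item, in order of occurrence.
    Returns the computed total.
    """
    total = 0
    last = ""                      # most recent value used, for "reuse_last"
    pending = iter(decisions)
    for item in items:
        if item is None:           # failure point: len(None) would raise
            strategy = next(pending)
            if strategy == "skip":
                continue           # drop the failing operation
            elif strategy == "default":
                item = ""          # substitute a fresh default value
            else:                  # "reuse_last"
                item = last        # substitute a previously used value
        last = item
        total += len(item)
    return total


def explore(items):
    """Enumerate every combination of decisions and record each outcome."""
    n_failures = sum(1 for item in items if item is None)
    return {d: run(items, d) for d in product(STRATEGIES, repeat=n_failures)}


outcomes = explore(["ab", None, "cde", None])
print(len(outcomes), "decision sequences,",
      len(set(outcomes.values())), "distinct outcomes")
```

With two failure points and three strategies, the sketch explores 3^2 = 9 decision sequences. Several of them lead to the same result, which is the kind of redundancy an exhaustive exploration makes visible.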