Relational Equivalence Proofs Between Imperative and MapReduce Algorithms
The work addresses a critical issue for developers and researchers, namely preventing costly errors in distributed computing. It is incremental, however, since it builds on existing verification techniques.
The paper tackles the problem of verifying correctness when imperative algorithms are translated into MapReduce frameworks. Its approach splits an equivalence proof into smaller steps that use uniform and context-dependent transformations. Feasibility is demonstrated by proving equivalence for the k-means and PageRank algorithms in the Coq theorem prover, with partial automation.
MapReduce frameworks are widely used for the implementation of distributed algorithms. However, translating imperative algorithms into these frameworks requires significant structural changes to the algorithm. As the costs of running faulty algorithms at scale can be severe, it is highly desirable to verify the correctness of the translation, i.e., to prove that the MapReduce version is equivalent to the imperative original. We present a novel approach for proving equivalence between imperative and MapReduce algorithms based on partitioning the equivalence proof into a sequence of equivalence proofs between intermediate programs with smaller differences. Our approach is based on the insight that two kinds of sub-proofs are required: (1) uniform transformations changing the control-flow structure that are mostly independent of the particular context in which they are applied; and (2) context-dependent transformations that are not uniform but that preserve the overall structure and can be proved correct using coupling invariants. We demonstrate the feasibility of our approach by evaluating it on two prototypical algorithms commonly used as examples in MapReduce frameworks: k-means and PageRank. To carry out the proofs, we use the interactive theorem prover Coq with partial proof automation. The results show that our approach and its prototypical implementation based on Coq enable equivalence proofs of non-trivial algorithms and could be automated to a large degree.
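The idea of a chain of intermediate programs can be illustrated with a small, hypothetical Python sketch. This is not the paper's Coq development. It writes one k-means centroid-update step in three forms:

- **P0:** an imperative loop.
- **P1:** an intermediate program obtained by loop fission. Loop fission is assumed here as an example of a uniform, control-flow-changing transformation.
- **P2:** a MapReduce-style program.

The step from P1 to P2 is the context-dependent one. Its coupling invariant (stated in a comment) relates P1's per-centroid arrays to P2's per-key groups.

All function names (`nearest`, `step_imperative`, `step_split`, `step_mapreduce`, `map_reduce`) and the sequential MapReduce model are illustrative assumptions. Running the programs on sample inputs only tests equivalence; it does not prove it.

```python
import random
from collections import defaultdict
from fractions import Fraction

# Hypothetical illustration only: one k-means centroid-update step written three ways.
# Exact Fraction arithmetic keeps the comparison free of floating-point rounding.

def nearest(p, centroids):
    """Index of the closest centroid (squared Euclidean distance; ties -> lowest index)."""
    return min(range(len(centroids)),
               key=lambda j: sum((a - b) ** 2 for a, b in zip(p, centroids[j])))

def _finish(sums, counts, centroids):
    """Mean per centroid; a centroid with no assigned points keeps its old position."""
    return [[s / counts[j] for s in sums[j]] if counts[j] else list(centroids[j])
            for j in range(len(centroids))]

def step_imperative(points, centroids):
    """P0: a single loop that assigns each point and accumulates in place."""
    dim = len(centroids[0])
    sums = [[Fraction(0)] * dim for _ in centroids]
    counts = [0] * len(centroids)
    for p in points:
        j = nearest(p, centroids)
        counts[j] += 1
        for d in range(dim):
            sums[j][d] += p[d]
    return _finish(sums, counts, centroids)

def step_split(points, centroids):
    """P1: loop fission (assumed uniform step). First emit (key, value) pairs,
    then aggregate them in a second loop. The computation is otherwise unchanged."""
    pairs = [(nearest(p, centroids), p) for p in points]  # map-like phase
    dim = len(centroids[0])
    sums = [[Fraction(0)] * dim for _ in centroids]
    counts = [0] * len(centroids)
    for j, p in pairs:                                     # aggregation phase
        counts[j] += 1
        for d in range(dim):
            sums[j][d] += p[d]
    return _finish(sums, counts, centroids)

def map_reduce(mapper, reducer, inputs):
    """Minimal sequential MapReduce model: map, group values by key, reduce per key."""
    groups = defaultdict(list)
    for x in inputs:
        for key, value in mapper(x):
            groups[key].append(value)
    return {key: reducer(key, values) for key, values in groups.items()}

def step_mapreduce(points, centroids):
    """P2: the same step as a mapper/reducer pair.
    Coupling invariant relating P1 and P2: for every key j, P1's counts[j] equals
    len(groups[j]), and P1's sums[j] equals the componentwise sum of groups[j]."""
    dim = len(centroids[0])

    def mapper(p):
        yield nearest(p, centroids), p

    def reducer(_key, ps):
        return [sum((Fraction(p[d]) for p in ps), Fraction(0)) / len(ps)
                for d in range(dim)]

    means = map_reduce(mapper, reducer, points)
    return [means.get(j, list(centroids[j])) for j in range(len(centroids))]

# Usage: the three programs agree on a fixed example.
pts = [(0, 0), (1, 0), (10, 10), (11, 9), (0, 1)]
init = [(0, 0), (10, 10), (50, 50)]
r0 = step_imperative(pts, init)
assert r0 == step_split(pts, init) == step_mapreduce(pts, init)

# They also agree on random inputs. This is differential testing, not a proof.
rng = random.Random(0)
for _ in range(200):
    ps = [(rng.randint(-5, 5), rng.randint(-5, 5)) for _ in range(rng.randint(0, 12))]
    cs = [(rng.randint(-5, 5), rng.randint(-5, 5)) for _ in range(3)]
    assert step_imperative(ps, cs) == step_split(ps, cs) == step_mapreduce(ps, cs)
```

Splitting the translation this way keeps each comparison small. P0 and P1 differ only in loop structure, and P1 and P2 differ only in how the accumulated values are stored. A machine-checked proof such as the paper's Coq proofs would establish each link for all inputs rather than for sampled ones.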