DC, SY, SY · Dec 22, 2018

Bioinformatics Computational Cluster Batch Task Profiling with Machine Learning for Failure Prediction

arXiv:1812.09537 · 1 citation · h-index: 15
Originality: Synthesis-oriented
AI Analysis

For bioinformatics researchers and cluster administrators, this work addresses IO-bound task failures in computational clusters, but the contribution is incremental: it focuses on a single cluster and does not report quantitative prediction performance.

The paper analyzes a production computational cluster that contributed 6.7 thousand CPU hours over two years and develops a machine learning task profiling agent to predict failures among identically provisioned tasks, with the aim of improving cluster scheduling and resource optimization.

Motivation: Traditional computational cluster schedulers rely on user inputs, namely requested run time, memory, and CPU, not IO. The run times of heavily IO-bound tasks, such as those common in big data and bioinformatics, depend on how the IO subsystem schedules requests, which makes these tasks problematic for cluster resource scheduling. Rescheduling IO-intensive and errant tasks wastes cluster resources. Understanding the conditions under which tasks succeed or fail, and differentiating between the two, could inform better cluster scheduling and intelligent resource optimization.

Results: We analyze a production computational cluster that contributed 6.7 thousand CPU hours to research over two years. From this analysis we develop a machine learning task profiling agent for clusters that attempts to predict failures among tasks with identical provisioning requests.
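The abstract does not specify the agent's model or features. As a hedged illustration only, the sketch below shows the core idea of predicting failures among identically provisioned tasks: tasks are grouped by their scheduler-visible request (CPUs, memory), which by construction cannot tell them apart, and a classifier is then fit within each group on observed IO-profile features. The feature names (`io_wait_frac`, `shared_fs_frac`), the plain logistic regression, and the toy records are all hypothetical assumptions, not the authors' agent or data.

```python
import math
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class TaskRecord:
    # Scheduler-visible request: identical for every task in a group.
    cpus: int
    mem_gb: int
    # HYPOTHETICAL observed IO-profile features, both scaled to [0, 1].
    io_wait_frac: float    # fraction of wall time spent blocked on IO
    shared_fs_frac: float  # fraction of IO that went to the shared filesystem
    failed: bool


def group_by_request(tasks):
    """Bucket tasks by their (cpus, mem_gb) request, the only inputs the scheduler sees."""
    groups = defaultdict(list)
    for t in tasks:
        groups[(t.cpus, t.mem_gb)].append(t)
    return dict(groups)


def sigmoid(z):
    # Numerically stable logistic function (avoids overflow in math.exp).
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logistic(tasks, epochs=2000, lr=1.0):
    """Fit a two-feature logistic regression with batch gradient descent."""
    w = [0.0, 0.0]
    b = 0.0
    n = len(tasks)
    for _ in range(epochs):
        gw = [0.0, 0.0]
        gb = 0.0
        for t in tasks:
            x = (t.io_wait_frac, t.shared_fs_frac)
            p = sigmoid(w[0] * x[0] + w[1] * x[1] + b)
            err = p - (1.0 if t.failed else 0.0)  # gradient of log-loss w.r.t. the logit
            gw[0] += err * x[0]
            gw[1] += err * x[1]
            gb += err
        w[0] -= lr * gw[0] / n
        w[1] -= lr * gw[1] / n
        b -= lr * gb / n
    return w, b


def train_per_group(tasks):
    """Train one failure model per identical-request group."""
    return {key: train_logistic(group) for key, group in group_by_request(tasks).items()}


def predict_failure_prob(model, io_wait_frac, shared_fs_frac):
    w, b = model
    return sigmoid(w[0] * io_wait_frac + w[1] * shared_fs_frac + b)


# Toy, made-up records: all request 4 CPUs / 16 GB, so the scheduler cannot tell them apart.
example_tasks = [
    TaskRecord(4, 16, 0.05, 0.10, False),
    TaskRecord(4, 16, 0.10, 0.20, False),
    TaskRecord(4, 16, 0.15, 0.10, False),
    TaskRecord(4, 16, 0.70, 0.80, True),
    TaskRecord(4, 16, 0.80, 0.90, True),
    TaskRecord(4, 16, 0.85, 0.70, True),
]

if __name__ == "__main__":
    models = train_per_group(example_tasks)
    model = models[(4, 16)]
    print(f"high-IO-wait task: {predict_failure_prob(model, 0.90, 0.90):.2f}")
    print(f"low-IO-wait task:  {predict_failure_prob(model, 0.05, 0.10):.2f}")
```

Training one model per request group mirrors the framing above: because tasks in a group look identical to the scheduler, any signal that separates failures must come from runtime profile data. A realistic agent would need richer features and held-out evaluation, which this toy example does not attempt.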
