DC | Aug 24, 2022
AI-coupled HPC Workflows
Shantenu Jha, Vincent R. Pascuzzi, Matteo Turilli
Increasingly, scientific discovery requires sophisticated and scalable workflows. Workflows have become the "new applications," wherein multi-scale computing campaigns comprise multiple heterogeneous executable tasks. In particular, the introduction of AI/ML models into traditional HPC workflows has enabled highly accurate modeling, typically with lower computational needs than traditional methods. This chapter discusses various modes of integrating AI/ML models with HPC computations, which result in diverse types of AI-coupled HPC workflows. The increasing need for coupling AI/ML and HPC across scientific domains is motivated and then exemplified by a number of production-grade use cases for each mode. We additionally discuss the primary challenges of extreme-scale AI-coupled HPC campaigns (task heterogeneity, adaptivity, and performance) and several framework and middleware solutions that aim to address them. While the HPC workflow and AI/ML computing paradigms are each effective independently, we highlight how their integration, and ultimate convergence, is leading to significant improvements in scientific performance across a range of domains, ultimately enabling scientific explorations that would otherwise be unattainable.
DC | Aug 23, 2022
Asynchronous Execution of Heterogeneous Tasks in ML-driven HPC Workflows
Vincent R. Pascuzzi, Ozgur O. Kilic, Matteo Turilli et al.
Heterogeneous scientific workflows consist of numerous types of tasks that must execute on heterogeneous resources. Asynchronous execution of those tasks is crucial to improving resource utilization and task throughput and to reducing workflow makespan. Therefore, middleware capable of scheduling and executing different task types across heterogeneous resources must enable asynchronous execution of tasks. In this paper, we investigate the requirements and properties of asynchronous task execution in machine learning (ML)-driven high performance computing (HPC) workflows. We model the degree of asynchronicity permitted for arbitrary workflows and propose key metrics that can be used to determine qualitative benefits when employing asynchronous execution. Our experiments represent relevant scientific drivers and are performed at scale on Summit; they show that the performance enhancements due to asynchronous execution are consistent with our model.
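To make the makespan argument concrete, the following is a minimal sketch and not the model proposed in the paper. It assumes independent tasks, two hypothetical resource kinds (`cpu` and `gpu`), and greedy list scheduling within each kind. All function names, task durations, and worker counts are illustrative assumptions.

```python
import heapq
from collections import defaultdict


def stage_makespan(durations, slots):
    """Time to run tasks of the given durations on `slots` identical workers,
    assigning each task to the worker that becomes free first."""
    if not durations:
        return 0.0
    free_at = [0.0] * slots  # min-heap of times at which each worker is free
    heapq.heapify(free_at)
    for d in durations:
        start = heapq.heappop(free_at)
        heapq.heappush(free_at, start + d)
    return max(free_at)


def makespans(tasks, slots):
    """tasks: list of (resource_kind, duration); slots: kind -> worker count.

    Returns (sequential, asynchronous) makespans under the assumption that
    tasks on different resource kinds do not depend on each other.
    """
    by_kind = defaultdict(list)
    for kind, duration in tasks:
        by_kind[kind].append(duration)
    per_kind = {k: stage_makespan(v, slots[k]) for k, v in by_kind.items()}
    sequential = sum(per_kind.values())    # one kind runs while the others idle
    asynchronous = max(per_kind.values())  # all kinds run concurrently
    return sequential, asynchronous


# Hypothetical workload: four 4-unit CPU tasks and two 3-unit GPU tasks.
tasks = [("cpu", 4.0)] * 4 + [("gpu", 3.0)] * 2
seq, asy = makespans(tasks, {"cpu": 2, "gpu": 1})
print(seq, asy)
```

In this toy setting the asynchronous makespan equals that of the busiest resource kind, while the sequential makespan is the sum over kinds. Real workflows have task dependencies that limit how much overlap is permitted.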
41.2 | QUANT-PH | Apr 22
Distributed Quantum Optimization for Large-Scale Higher-Order Problems with Dense Interactions
Seongmin Kim, Vincent R. Pascuzzi, Travis S. Humble et al.
Many real-world problems are naturally formulated as higher-order unconstrained binary optimization (HUBO) tasks involving dense, multi-variable interactions, which are challenging to solve with classical methods. Quantum optimization offers a promising route, but hardware constraints and limitations to quadratic formulations have hampered its practicality. Here, we develop a distributed quantum optimization framework (DQOF) for dense, large-scale HUBO problems. DQOF assigns quantum circuits a central role in directly capturing higher-order interactions, while high-performance computing orchestrates large-scale parallelism and coordination. A clustering strategy enables wide quantum circuits without increasing depth, allowing efficient execution on near-term quantum hardware. We demonstrate high-quality solutions for HUBOs with up to 500 variables within 170 seconds, significantly outperforming conventional approaches in solution quality and scalability. Applied to optical metamaterial design, DQOF efficiently discovers high-performance structures and shows that higher-order interactions are important for practical optimization problems. These results establish DQOF as a practical and scalable computational paradigm for large-scale scientific optimization.
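As a small illustration of what a higher-order term means here, the sketch below evaluates a toy HUBO objective with one cubic interaction and minimizes it by classical brute force. It is not DQOF and involves no quantum circuits. The coefficients and function names are hypothetical, and exhaustive search is only feasible for a handful of variables.

```python
from itertools import product


def hubo_energy(x, terms):
    """Energy of binary assignment x: the sum over terms of
    coeff * product of x[i] for every index i in the term."""
    total = 0.0
    for indices, coeff in terms.items():
        prod = 1
        for i in indices:
            prod *= x[i]
        total += coeff * prod
    return total


def brute_force_minimum(n, terms):
    """Try all 2**n binary assignments and return one with minimal energy."""
    return min(product((0, 1), repeat=n), key=lambda x: hubo_energy(x, terms))


# Hypothetical 3-variable problem: each variable is rewarded individually,
# but the cubic term penalizes setting all three at once.
terms = {(0,): -1.0, (1,): -1.0, (2,): -1.0, (0, 1, 2): 2.5}
best = brute_force_minimum(3, terms)
print(best, hubo_energy(best, terms))
```

The key `(0, 1, 2)` is a single third-order interaction. Expressing it in a quadratic (QUBO) form would require auxiliary variables, which is the kind of overhead that motivates handling higher-order terms directly.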