QUANT-PH Aug 8, 2023
Application-Oriented Benchmarking of Quantum Generative Learning Using QUARK
Florian J. Kiwit, Marwa Marso, Philipp Ross et al.
Benchmarking of quantum machine learning (QML) algorithms is challenging due to the complexity and variability of QML systems, e.g., regarding model ansatzes, data sets, training techniques, and hyperparameter selection. The QUantum computing Application benchmaRK (QUARK) framework simplifies and standardizes benchmarking studies for quantum computing applications. Here, we propose several extensions of QUARK to include the ability to evaluate the training and deployment of quantum generative models. We describe the updated software architecture and illustrate its flexibility through several example applications: (1) We trained different quantum generative models using several circuit ansatzes, data sets, and data transformations. (2) We evaluated our models on GPU and real quantum hardware. (3) We assessed the generalization capabilities of our generative models using a broad set of metrics that capture, e.g., the novelty and validity of the generated data.
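To make the idea of novelty and validity metrics concrete, here is a minimal sketch in Python. It is not QUARK's API, and the definitions are illustrative assumptions rather than the paper's exact metrics: validity is taken as the fraction of generated samples that satisfy a user-supplied constraint, and novelty as the fraction of generated samples that do not appear in the training set. The function name and the parity constraint are hypothetical.

```python
from typing import Callable, Dict, Sequence


def novelty_and_validity(
    generated: Sequence[str],
    training: Sequence[str],
    is_valid: Callable[[str], bool],
) -> Dict[str, float]:
    """Illustrative metrics (assumed definitions, not taken from the paper).

    validity: share of generated samples that satisfy `is_valid`
    novelty:  share of generated samples not seen in the training set
    """
    if not generated:
        raise ValueError("need at least one generated sample")
    seen = set(training)
    n = len(generated)
    n_valid = sum(1 for s in generated if is_valid(s))
    n_novel = sum(1 for s in generated if s not in seen)
    return {"validity": n_valid / n, "novelty": n_novel / n}


# Toy example: bitstrings count as valid when they have an even number of ones.
training = ["0000", "0011"]
generated = ["0011", "0101", "0111", "1100"]
metrics = novelty_and_validity(
    generated, training, is_valid=lambda s: s.count("1") % 2 == 0
)
print(metrics)
```

A model that only memorizes its training data would score high on validity but zero on novelty, which is why the two are usually reported together.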
48.7 QUANT-PH Apr 3
Hybrid Quantum-HPC Middleware Systems for Adaptive Resource, Workload and Task Management
Pradeep Mantha, Florian J. Kiwit, Nishant Saurabh et al.
Hybrid quantum-classical applications pose significant resource management challenges due to heterogeneity and dynamism in both infrastructure and workloads. Quantum-HPC environments integrate quantum processing units (QPUs) with diverse classical resources (CPUs, GPUs), while applications span coupling patterns from tightly coupled execution to loosely coupled task parallelism with varying resource requirements. Traditional HPC schedulers lack visibility into application semantics and cannot respond to fluctuating resource availability at runtime. This paper presents a middleware-based approach for adaptive resource, workload, and task management in hybrid quantum-HPC systems. We make four contributions: (i) a conceptual four-layer middleware architecture that decomposes management across workflow, workload, task, and resource levels, enabling application-aware scheduling over heterogeneous quantum-HPC resources; (ii) a set of execution motifs capturing interaction and coupling characteristics of hybrid applications, realized as quantum mini-apps for systematic workload characterization; (iii) Pilot-Quantum, a middleware framework built on the pilot abstraction that enables late binding and dynamic resource allocation, adapting to resource and workload dynamics at runtime; and (iv) Q-Dreamer, a performance modeling toolkit providing reusable components for informed workload partitioning, including a circuit-cutting optimizer that analytically derives optimal partitioning strategies. Evaluation on heterogeneous HPC platforms (Perlmutter, NVIDIA DGX with H100/B200 GPUs) demonstrates efficient multi-backend orchestration across CPUs, GPUs, and QPUs for diverse execution motifs. Q-Dreamer predicts optimal circuit cutting configurations with up to 82% accuracy.
56.5 CL May 10 Code
Assessment of RAG and Fine-Tuning for Industrial Question-Answering-Applications
Jakob Sturm, Josef Pichlmeier, Christian Bernhard et al.
Large Language Models (LLMs) are increasingly employed in enterprise question-answering (QA) systems, requiring adaptation to domain-specific knowledge. Among the most prevalent methods for incorporating such knowledge are Retrieval-Augmented Generation (RAG) and fine-tuning (FT). Yet, from a cost-accuracy trade-off perspective, it remains unclear which approach best suits industry scenarios. This study examines the impact of RAG and FT on two closed datasets specific to the automotive industry, assessing answer quality and operational costs. We extend the Cost-of-Pass framework proposed by Erol et al. (arXiv:2504.13359) to jointly assess output quality, generation cost, and user interaction cost. Our findings reveal that while premium models perform best out of the box, open-source models can achieve comparable quality when enhanced with RAG. Overall, RAG emerges as the most effective and cost-efficient adaptation method for both closed- and open-source models.
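As a rough illustration of the cost-accuracy trade-off, the sketch below computes a cost-of-pass value in Python. It assumes the base definition from Erol et al. as we understand it: the expected cost of obtaining one correct answer, i.e. the cost of a single attempt divided by the success rate. The extension described in this abstract (adding user interaction cost) is not reproduced here, and all numbers and names are hypothetical.

```python
def cost_of_pass(cost_per_attempt: float, success_rate: float) -> float:
    """Expected cost to obtain one correct answer (assumed base definition).

    cost_per_attempt: average cost of one generation, in any currency unit
    success_rate:     probability that a single attempt is correct, in [0, 1]
    """
    if cost_per_attempt < 0:
        raise ValueError("cost_per_attempt must be non-negative")
    if not 0.0 <= success_rate <= 1.0:
        raise ValueError("success_rate must lie in [0, 1]")
    if success_rate == 0.0:
        # A configuration that never succeeds has unbounded expected cost.
        return float("inf")
    return cost_per_attempt / success_rate


# Hypothetical numbers: a cheap model that is right half the time versus a
# pricier model that is right 90% of the time.
cheap = cost_of_pass(cost_per_attempt=0.02, success_rate=0.5)
premium = cost_of_pass(cost_per_attempt=0.045, success_rate=0.9)
print(cheap, premium)
```

Under this definition, a cheaper model can still be the better choice if an adaptation method such as RAG raises its success rate enough.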
CL Apr 22, 2024
Performance Characterization of Expert Router for Scalable LLM Inference
Josef Pichlmeier, Philipp Ross, Andre Luckow
Large Language Models (LLMs) have experienced widespread adoption across scientific and industrial domains due to their versatility and utility for diverse tasks. Nevertheless, deploying and serving these models at scale with optimal throughput and latency remains a significant challenge, primarily because of LLMs' high computational and memory demands. Specialized models optimized for specific tasks can be combined through a routing mechanism to address these challenges, creating a modular inference system. This paper introduces Expert Router, a scalable routing architecture that directs prompts to specialized expert models. We characterize multiple Expert Router configurations, including different Llama 3 models with quantized and non-quantized weights, under up to 1,000 concurrent users. Our findings reveal that Expert Router introduces minimal latency overhead, with the configuration of expert models being a dominating factor in performance outcomes. High-parameter expert models deliver stable throughput and latency under moderate concurrency levels. In contrast, smaller expert models maintain competitive performance across a wider range of concurrent users compared to tensor-parallelized baseline models. This highlights the potential of Expert Router for efficient and scalable LLM deployment.
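The routing idea can be sketched in a few lines of Python. This is not the paper's implementation: the keyword classifier is a hypothetical stand-in for whatever routing model Expert Router actually uses, and the experts are stub functions rather than served LLMs. The sketch only shows the control flow of classifying a prompt and dispatching it to a matching expert with a fallback.

```python
from typing import Callable, Dict, Tuple

# An "expert" is any callable mapping a prompt to a response. In a real
# deployment it would wrap a request to a served model; here it is a stub.
Expert = Callable[[str], str]


def keyword_classifier(prompt: str) -> str:
    """Hypothetical stand-in for the routing model: picks a domain by keyword."""
    text = prompt.lower()
    if any(word in text for word in ("def ", "compile", "bug")):
        return "code"
    if any(word in text for word in ("integral", "equation", "prove")):
        return "math"
    return "general"


class ExpertRouter:
    """Dispatches each prompt to the expert chosen by the classifier."""

    def __init__(
        self,
        classify: Callable[[str], str],
        experts: Dict[str, Expert],
        fallback: str,
    ) -> None:
        if fallback not in experts:
            raise ValueError("fallback must name a registered expert")
        self.classify = classify
        self.experts = experts
        self.fallback = fallback

    def route(self, prompt: str) -> Tuple[str, str]:
        label = self.classify(prompt)
        if label not in self.experts:
            # Unknown label: use the fallback expert instead of failing.
            label = self.fallback
        return label, self.experts[label](prompt)


experts: Dict[str, Expert] = {
    "code": lambda p: f"[code expert] {p}",
    "math": lambda p: f"[math expert] {p}",
    "general": lambda p: f"[general expert] {p}",
}
router = ExpertRouter(keyword_classifier, experts, fallback="general")
label, reply = router.route("Why does this bug appear when I compile?")
print(label, reply)
```

Because the router only adds a classification step in front of the experts, its overhead is small relative to generation, which is consistent with the abstract's observation that expert configuration dominates performance.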
CR Jul 25, 2021
Revealing the Landscape of Privacy-Enhancing Technologies in the Context of Data Markets for the IoT: A Systematic Literature Review
Gonzalo Munilla Garrido, Johannes Sedlmeir, Ömer Uludağ et al.
IoT data markets in public and private institutions have become increasingly relevant in recent years because of their potential to improve data availability and unlock new business models. However, exchanging data in markets bears considerable challenges related to disclosing sensitive information. Despite considerable research focused on different aspects of privacy-enhancing data markets for the IoT, none of the solutions proposed so far seems to have found practical adoption. Thus, this study aims to organize the state-of-the-art solutions, analyze and scope the technologies that have been suggested in this context, and structure the remaining challenges to determine areas where future research is required. To accomplish this goal, we conducted a systematic literature review on privacy enhancement in data markets for the IoT, covering 50 publications dated up to July 2020, and provided updates with 24 publications dated up to May 2022. Our results indicate that most research in this area has emerged only recently, and no IoT data market architecture has established itself as canonical. Existing solutions frequently lack the required combination of anonymization and secure computation technologies. Furthermore, there is no consensus on the appropriate use of blockchain technology for IoT data markets and a low degree of leveraging existing libraries or reusing generic data market architectures. We also identified significant remaining challenges, such as the copy problem and the recursive enforcement problem, which, while solutions have been suggested to some extent, are often not sufficiently addressed in proposed designs. We conclude that privacy-enhancing technologies need further improvements to positively impact data markets so that, ultimately, the value of data is preserved through data scarcity and users' privacy and business-critical information are protected.
DC Feb 20, 2020
Methods and Experiences for Developing Abstractions for Data-intensive, Scientific Applications
Andre Luckow, Shantenu Jha
Developing software for scientific applications that require the integration of diverse types of computing, instruments, and data presents challenges that are distinct from those of commercial software. These applications require scale and the integration of various programming and computational models with evolving and heterogeneous infrastructure. Pervasive and effective abstractions for distributed infrastructures are thus critical; however, the process of developing abstractions for scientific applications and infrastructures is not well understood. While theory-based approaches for system development are suited for well-defined, closed environments, they have severe limitations for designing abstractions for scientific systems and applications. The design science research (DSR) method provides the basis for designing practical systems that can handle real-world complexities at all levels. In contrast to theory-centric approaches, DSR emphasizes both practical relevance and knowledge creation by building and rigorously evaluating all artifacts. We show how DSR provides a well-defined framework for developing abstractions and middleware systems for distributed systems. Specifically, we address the critical problem of distributed resource management on heterogeneous infrastructure over a dynamic range of scales, a challenge that currently limits many scientific applications. We use the pilot-abstraction, a widely used resource management abstraction for high-performance, high-throughput, big data, and streaming applications, as a case study for evaluating the DSR activities. For this purpose, we analyze the research process and artifacts produced during the design and evaluation of the pilot-abstraction. We find that DSR provides a concise framework for iteratively designing and evaluating systems. Finally, we capture our experiences and formulate different lessons learned.
LG Apr 30, 2017
Deep Learning in the Automotive Industry: Applications and Tools
Andre Luckow, Matthew Cook, Nathan Ashcraft et al.
Deep Learning refers to a set of machine learning techniques that utilize neural networks with many hidden layers for tasks such as image classification, speech recognition, and language understanding. Deep learning has proven very effective in these domains and is pervasively used by many Internet services. In this paper, we describe different automotive use cases for deep learning, in particular in the domain of computer vision. We survey the current state of the art in libraries, tools, and infrastructures (e.g., GPUs and clouds) for implementing, training, and deploying deep neural networks. We particularly focus on convolutional neural networks and computer vision use cases, such as the visual inspection process in manufacturing plants and the analysis of social media data. To train neural networks, curated and labeled datasets are essential. However, both the availability and the scope of such datasets are typically very limited. A main contribution of this paper is the creation of an automotive dataset that allows us to learn and automatically recognize different vehicle properties. We describe an end-to-end deep learning application utilizing a mobile app for data collection and process support, and an Amazon-based cloud backend for storage and training. For training, we evaluate the use of cloud and on-premises infrastructures (including multiple GPUs) in conjunction with different neural network architectures and frameworks. We assess both the training times and the accuracy of the classifier. Finally, we demonstrate the effectiveness of the trained classifier in a real-world setting during the manufacturing process.
ML Nov 16, 2016
Algebraic multigrid support vector machines
Ehsan Sadrfaridpour, Sandeep Jeereddy, Ken Kennedy et al.
The support vector machine is a flexible optimization-based technique widely used for classification problems. In practice, its training becomes computationally expensive on large-scale data sets for reasons such as the complexity and number of iterations in parameter fitting methods, the underlying optimization solvers, and the nonlinearity of kernels. We introduce a fast multilevel framework for solving support vector machine models that is inspired by algebraic multigrid. A significant improvement in running time has been achieved without any loss in quality. The proposed technique is highly beneficial on imbalanced data sets. We demonstrate computational results on publicly available and industrial data sets.