Dan Gunter

SE
4 papers
48 citations
Novelty 31%
AI Score 34

4 Papers

3.6 LG Apr 1
An Online Machine Learning Multi-resolution Optimization Framework for Energy System Design Limit of Performance Analysis

Oluwamayowa O. Amusat, Luka Grbcic, Remi Patureau et al.

Designing reliable integrated energy systems for industrial processes requires optimization and verification models across multiple fidelities, from architecture-level sizing to high-fidelity dynamic operation. However, model mismatch across fidelities obscures the sources of performance loss and complicates the quantification of architecture-to-operation performance gaps. We propose an online, machine-learning-accelerated multi-resolution optimization framework that estimates an architecture-specific upper bound on achievable performance while minimizing expensive high-fidelity model evaluations. We demonstrate the approach on a pilot energy system supplying a 1 MW industrial heat load. First, we solve a multi-objective architecture optimization to select the system configuration and component capacities. We then develop a machine learning (ML)-accelerated multi-resolution, receding-horizon optimal control strategy that approaches the achievable-performance bound for the specified architecture, given the additional controls and dynamics not captured by the architectural optimization model. The ML-guided controller adaptively schedules the optimization resolution based on predictive uncertainty and warm-starts high-fidelity solves using elite low-fidelity solutions. Our results on the pilot case study show that the proposed multi-resolution strategy reduces the architecture-to-operation performance gap by up to 42% relative to a rule-based controller, while reducing required high-fidelity model evaluations by 34% relative to the same multi-fidelity approach without ML guidance, enabling faster and more reliable design verification. Together, these gains make high-fidelity verification tractable, providing a practical upper bound on achievable operational performance.

IR Nov 8, 2023
Automated Annotation of Scientific Texts for ML-based Keyphrase Extraction and Validation

Oluwamayowa O. Amusat, Harshad Hegde, Christopher J. Mungall et al.

Advanced omics technologies and facilities generate a wealth of valuable data daily; however, the data often lack the essential metadata required for researchers to find and search them effectively. The lack of metadata poses a significant challenge in the utilization of these datasets. Machine learning-based metadata extraction techniques have emerged as a potentially viable approach to automatically annotating scientific datasets with the metadata necessary for enabling effective search. Text labeling, usually performed manually, plays a crucial role in validating machine-extracted metadata. However, manual labeling is time-consuming; thus, there is a need to develop automated text labeling techniques in order to accelerate the process of scientific innovation. This need is particularly urgent in fields such as environmental genomics and microbiome science, which have historically received less attention in terms of metadata curation and creation of gold-standard text mining datasets. In this paper, we present two novel automated text labeling approaches for the validation of ML-generated metadata for unlabeled texts, with specific applications in environmental genomics. Our techniques show the potential of two new ways to leverage existing information about the unlabeled texts and the scientific domain. The first technique exploits relationships between different types of data sources related to the same research study, such as publications and proposals. The second technique takes advantage of domain-specific controlled vocabularies or ontologies. In this paper, we detail applying these approaches for ML-generated metadata validation. Our results show that the proposed label assignment approaches can generate both generic and highly specific text labels for the unlabeled texts, with up to 44% of the labels matching those suggested by an ML keyword extraction algorithm.

SE Feb 6, 2016
Report on the Third Workshop on Sustainable Software for Science: Practice and Experiences (WSSSPE3)

Daniel S. Katz, Sou-Cheng T. Choi, Kyle E. Niemeyer et al.

This report records and discusses the Third Workshop on Sustainable Software for Science: Practice and Experiences (WSSSPE3). The report includes a description of the keynote presentation of the workshop, which served as an overview of sustainable scientific software. It also summarizes a set of lightning talks in which speakers highlighted to-the-point lessons and challenges pertaining to sustaining scientific software. The final and main contribution of the report is a summary of the discussions, future steps, and future organization for a set of self-organized working groups on topics including developing pathways to funding scientific software; constructing useful common metrics for crediting software stakeholders; identifying principles for sustainable software engineering design; reaching out to research software organizations around the world; and building communities for software sustainability. For each group, we include a point of contact and a landing page that can be used by those who want to join that group's future activities. The main challenge left by the workshop is to see if the groups will execute these activities that they have scheduled, and how the WSSSPE community can encourage this to happen.

SE Oct 16, 2015
A Community Contribution Framework for Sharing Materials Data with Materials Project

Patrick Huck, Anubhav Jain, Dan Gunter et al.

As scientific discovery becomes increasingly data-driven, software platforms are needed to efficiently organize and disseminate data from disparate sources. This is certainly the case in the field of materials science. For example, Materials Project has generated computational data on over 60,000 chemical compounds and has made that data available through a web portal and REST interface. However, such portals must seek to incorporate community submissions to expand the scope of scientific data sharing. In this paper, we describe MPContribs, a computing/software infrastructure to integrate and organize contributions of simulated or measured materials data from users. Our solution supports complex submissions and provides interfaces that allow contributors to share analyses and graphs. A RESTful API exposes mechanisms for book-keeping, retrieval and aggregation of submitted entries, as well as persistent URIs or DOIs that can be used to reference the data in publications. Our approach isolates contributed data from a host project's quality-controlled core data and yet enables analyses across the entire dataset, programmatically or through customized web apps. We expect the developed framework to enhance collaborative determination of material properties and to maximize the impact of each contributor's dataset. In the long term, MPContribs seeks to make Materials Project an institutional, and thus community-wide, memory for computational and experimental materials science.