Daniel S. Katz

SE
h-index 4
31 papers
505 citations
Novelty 8%
AI Score 32

31 Papers

CY Sep 30, 2022
FAIR for AI: An interdisciplinary and international community building perspective

E. A. Huerta, Ben Blaiszik, L. Catherine Brinson et al.

A foundational set of findable, accessible, interoperable, and reusable (FAIR) principles was proposed in 2016 as a prerequisite for proper data management and stewardship, with the goal of enabling the reusability of scholarly data. The principles were also meant to apply to other digital assets, at a high level, and over time, the FAIR guiding principles have been re-interpreted or extended to include the software, tools, algorithms, and workflows that produce data. FAIR principles are now being adapted in the context of AI models and datasets. Here, we present the perspectives, vision, and experiences of researchers from different countries, disciplines, and backgrounds who are leading the definition and adoption of FAIR principles in their communities of practice, and discuss outcomes that may result from pursuing and incentivizing FAIR AI research. The material for this report builds on the FAIR for AI Workshop held at Argonne National Laboratory on June 7, 2022.

HEP-EX Dec 9, 2022
FAIR AI Models in High Energy Physics

Javier Duarte, Haoyang Li, Avik Roy et al.

The findable, accessible, interoperable, and reusable (FAIR) data principles provide a framework for examining, evaluating, and improving how data is shared to facilitate scientific discovery. Generalizing these principles to research software and other digital products is an active area of research. Machine learning (ML) models -- algorithms that have been trained on data without being explicitly programmed -- and more generally, artificial intelligence (AI) models, are an important target for this because of the ever-increasing pace with which AI is transforming scientific domains, such as experimental high energy physics (HEP). In this paper, we propose a practical definition of FAIR principles for AI models in HEP and describe a template for the application of these principles. We demonstrate the template's use with an example AI model applied to HEP, in which a graph neural network is used to identify Higgs bosons decaying to two bottom quarks. We report on the robustness of this FAIR AI model, its portability across hardware architectures and software frameworks, and its interpretability.

7.0 SE Apr 19
Technology Research Software: An Often Overlooked Category of Research Software

Wilhelm Hasselbring, Daniel S. Katz, Rob van Nieuwpoort

Research software has been categorized for various goals. One fundamental dimension of such categorizations is the role that the software plays in the research process. Recently, a new role category has emerged: technology research software, which covers research software developed in technology research. Until now, this category of technology research software has often been overlooked and neglected within the research software engineering community. In this article, we explain technology research software and its primary subroles. Technology readiness levels are an established method of estimating the maturity of technologies, including software systems. For technology research software, these readiness levels define secondary subroles. To illustrate the concept of technology research software and to make it more tangible, we present examples of research software that, depending on its specific use within or outside of research, take on the role of technology research software as well as that of another research software category.

SE Feb 24, 2019 Code
Sustaining Research Software: an SC18 Panel

Daniel S. Katz, Patrick Aerts, Neil P. Chue Hong et al.

Many science advances have been possible thanks to the use of research software, which has become essential to advancing virtually every Science, Technology, Engineering and Mathematics (STEM) discipline and many non-STEM disciplines including social sciences and humanities. And while much of it is made available under open source licenses, work is needed to develop, support, and sustain it, as underlying systems and software as well as user needs evolve. In addition, the changing landscape of high-performance computing (HPC) platforms, where performance and scaling advances are ever more reliant on software and algorithm improvements as we hit hardware scaling barriers, is causing renewed tension between sustainability of software and its performance. We must do more to highlight the trade-off between performance and sustainability, and to emphasize the need for sustainability given the fact that complex software stacks don't survive without frequent maintenance, which is made more difficult as a generation of developers of established and heavily-used research software retires. Several HPC forums are doing this, and it has become an active area of funding as well. In response, the authors organized and ran a panel at the SC18 conference. The objectives of the panel were to highlight the importance of sustainability, to illuminate the tension between pure performance and sustainability, and to steer SC community discussion toward understanding and addressing this issue and this tension. The outcome of the discussions, as presented in this paper, can inform choices of advanced compute and data infrastructures to positively impact future research software and future research.

AI Dec 12, 2023
Leveraging Large Language Models to Build and Execute Computational Workflows

Alejandro Duque, Abdullah Syed, Kastan V. Day et al.

The recent development of large language models (LLMs) with multi-billion parameters, coupled with the creation of user-friendly application programming interfaces (APIs), has paved the way for automatically generating and executing code in response to straightforward human queries. This paper explores how these emerging capabilities can be harnessed to facilitate complex scientific workflows, eliminating the need for traditional coding methods. We present initial findings from our attempt to integrate Phyloflow with OpenAI's function-calling API, and outline a strategy for developing a comprehensive workflow management system based on these concepts.

AI Jun 20, 2024
Training Next Generation AI Users and Developers at NCSA

Daniel S. Katz, Volodymyr Kindratenko, Olena Kindratenko et al.

This article focuses on training work carried out in artificial intelligence (AI) at the National Center for Supercomputing Applications (NCSA) at the University of Illinois Urbana-Champaign via a research experience for undergraduates (REU) program named FoDOMMaT. It also describes why we are interested in AI, and concludes by discussing what we've learned from running this program and its predecessor over six years.

HEP-EX Aug 4, 2021
A FAIR and AI-ready Higgs boson decay dataset

Yifan Chen, E. A. Huerta, Javier Duarte et al.

To enable the reusability of massive scientific datasets by humans and machines, researchers aim to adhere to the principles of findability, accessibility, interoperability, and reusability (FAIR) for data and artificial intelligence (AI) models. This article provides a domain-agnostic, step-by-step assessment guide to evaluate whether or not a given dataset meets these principles. We demonstrate how to use this guide to evaluate the FAIRness of an open simulated dataset produced by the CMS Collaboration at the CERN Large Hadron Collider. This dataset consists of Higgs boson decays and quark and gluon background, and is available through the CERN Open Data Portal. We use additional available tools to assess the FAIRness of this dataset, and incorporate feedback from members of the FAIR community to validate our results. This article is accompanied by a Jupyter notebook to visualize and explore this dataset. This study marks the first in a planned series of articles that will guide scientists in the creation of FAIR AI models and datasets in high energy particle physics.

SE Mar 11, 2021
Research Software Sustainability and Citation

Stephan Druskat, Daniel S. Katz, Ilian T. Todorov

Software citation contributes to achieving software sustainability in two ways: It provides an impact metric to incentivize stakeholders to make software sustainable. It also provides references to software used in research, which can be reused and adapted to become sustainable. While software citation faces a host of technical and social challenges, community initiatives have defined the principles of software citation and are working on implementing solutions.

SE Mar 5, 2021
Addressing Research Software Sustainability via Institutes

Daniel S. Katz, Jeffrey C. Carver, Neil P. Chue Hong et al.

Research software is essential to modern research, but it requires ongoing human effort to sustain: to continually adapt to changes in dependencies, to fix bugs, and to add new features. Software sustainability institutes, amongst others, develop, maintain, and disseminate best practices for research software sustainability, and build community around them. These practices can both reduce the amount of effort that is needed and create an environment where the effort is appreciated and rewarded. The UK SSI is such an institute, and the US URSSI and the Australian AuSSI are planning to become institutes, and this extended abstract discusses them and the strengths and weaknesses of this approach.

SE Mar 2, 2021
Sustaining Research Software via Research Software Engineers and Professional Associations

Jeffrey C. Carver, Ian A. Cosden, Chris Hill et al.

Research software is a class of software developed to support research. Today a wealth of such software is created daily in universities, government, and commercial research enterprises worldwide. The sustainability of this software faces particular challenges due, at least in part, to the type of people who develop it. These Research Software Engineers (RSEs) face challenges in developing and sustaining software that differ from those faced by the developers of traditional software. As a result, professional associations have begun to provide support, advocacy, and resources for RSEs. These benefits are critical to sustaining RSEs, especially in environments where their contributions are often undervalued and not rewarded. This paper focuses on how professional associations, such as the United States Research Software Engineer Association (US-RSE), can provide this.

SE Jan 26, 2021
A Fresh Look at FAIR for Research Software

Daniel S. Katz, Morane Gruenpeter, Tom Honeyman et al.

This document captures the discussion and deliberation of the FAIR for Research Software (FAIR4RS) subgroup that took a fresh look at the applicability of the FAIR Guiding Principles for scientific data management and stewardship for research software. We discuss the vision of research software as ideally reproducible, open, usable, recognized, sustained and robust, and then review both the characteristic and practiced differences of research software and data. This vision and understanding of initial conditions serves as a backdrop for an attempt at translating and interpreting the guiding principles to more fully align with research software. We have found that many of the principles remained relatively intact as written, as long as considerable interpretation was provided. This was particularly the case for the "Findable" and "Accessible" foundational principles. We found that "Interoperability" and "Reusability" are particularly prone to a broad and sometimes opposing set of interpretations as written. We propose two new principles modeled on existing ones, and provide modified guiding text for these principles to help clarify our final interpretation. A series of gaps in translation were captured during this process, and these remain to be addressed. We finish with a consideration of where these translated principles fall short of the vision laid out in the opening.

GR-QC Dec 15, 2020
Accelerated, Scalable and Reproducible AI-driven Gravitational Wave Detection

E. A. Huerta, Asad Khan, Xiaobo Huang et al.

The development of reusable artificial intelligence (AI) models for wider use and rigorous validation by the community promises to unlock new opportunities in multi-messenger astrophysics. Here we develop a workflow that connects the Data and Learning Hub for Science, a repository for publishing AI models, with the Hardware Accelerated Learning (HAL) cluster, using funcX as a universal distributed computing service. Using this workflow, an ensemble of four openly available AI models can be run on HAL to process an entire month's worth (August 2017) of advanced Laser Interferometer Gravitational-Wave Observatory data in just seven minutes, identifying all four binary black hole mergers previously identified in this dataset and reporting no misclassifications. This approach combines advances in AI, distributed computing, and scientific data infrastructure to open new pathways to conduct reproducible, accelerated, data-driven discovery.

HEP-EX Oct 10, 2020
Software Sustainability & High Energy Physics

Daniel S. Katz, Sudhir Malik, Mark S. Neubauer et al.

New facilities of the 2020s, such as the High Luminosity Large Hadron Collider (HL-LHC), will be relevant through at least the 2030s. This means that their software efforts and those that are used to analyze their data need to consider sustainability to enable their adaptability to new challenges, longevity, and efficiency, over at least this period. This will help ensure that this software will be easier to develop and maintain, that it remains available in the future on new platforms, that it meets new needs, and that it is as reusable as possible. This report discusses a virtual half-day workshop on "Software Sustainability and High Energy Physics" that aimed 1) to bring together experts from HEP as well as those from outside to share their experiences and practices, and 2) to articulate a vision that helps the Institute for Research and Innovation in Software for High Energy Physics (IRIS-HEP) to create a work plan to implement elements of software sustainability. Software sustainability practices could lead to new collaborations, including elements of HEP software being directly used outside the field, and, as has happened more frequently in recent years, to HEP developers contributing to software developed outside the field rather than reinventing it. A focus on and skills related to sustainable software will give HEP software developers an important skill that is essential to careers in the realm of software, inside or outside HEP. The report closes with recommendations to improve software sustainability in HEP, aimed at the HEP community via IRIS-HEP and the HEP Software Foundation (HSF).

COMP-PH Mar 18, 2020
Convergence of Artificial Intelligence and High Performance Computing on NSF-supported Cyberinfrastructure

E. A. Huerta, Asad Khan, Edward Davis et al.

Significant investments to upgrade and construct large-scale scientific facilities demand commensurate investments in R&D to design algorithms and computing approaches to enable scientific and engineering breakthroughs in the big data era. Innovative Artificial Intelligence (AI) applications have powered transformational solutions for big data challenges in industry and technology that now drive a multi-billion dollar industry, and which play an ever increasing role shaping human social patterns. As AI continues to evolve into a computing paradigm endowed with statistical and mathematical rigor, it has become apparent that single-GPU solutions for training, validation, and testing are no longer sufficient for computational grand challenges brought about by scientific facilities that produce data at a rate and volume that outstrip the computing capabilities of available cyberinfrastructure platforms. This realization has been driving the confluence of AI and high performance computing (HPC) to reduce time-to-insight, and to enable a systematic study of domain-inspired AI architectures and optimization schemes to enable data-driven discovery. In this article we present a summary of recent developments in this field, and describe specific advances that authors in this article are spearheading to accelerate and streamline the use of HPC platforms to design and apply accelerated AI algorithms in academia and industry.

GR-QC Nov 26, 2019
Enabling real-time multi-messenger astrophysics discoveries with deep learning

E. A. Huerta, Gabrielle Allen, Igor Andreoni et al.

Multi-messenger astrophysics is a fast-growing, interdisciplinary field that combines data, which vary in volume and speed of data processing, from many different instruments that probe the Universe using different cosmic messengers: electromagnetic waves, cosmic rays, gravitational waves and neutrinos. In this Expert Recommendation, we review the key challenges of real-time observations of gravitational wave sources and their electromagnetic and astroparticle counterparts, and make a number of recommendations to maximize their potential for scientific discovery. These recommendations refer to the design of scalable and computationally efficient machine learning algorithms; the cyber-infrastructure to numerically simulate astrophysical sources, and to process and interpret multi-messenger astrophysics data; the management of gravitational wave detections to trigger real-time alerts for electromagnetic and astroparticle follow-ups; a vision to harness future developments of machine learning and cyber-infrastructure resources to cope with the big-data requirements; and the need to build a community of experts to realize the goals of multi-messenger astrophysics.

SE Oct 22, 2019
Theory-Software Translation: Research Challenges and Future Directions

Caroline Jay, Robert Haines, Daniel S. Katz et al.

The Theory-Software Translation Workshop, held in New Orleans in February 2019, explored in depth the process of both instantiating theory in software - for example, implementing a mathematical model in code as part of a simulation - and using the outputs of software - such as the behavior of a simulation - to advance knowledge. As computation within research is now ubiquitous, the workshop provided a timely opportunity to reflect on the particular challenges of research software engineering - the process of developing and maintaining software for scientific discovery. In addition to the general challenges common to all software development projects, research software must represent, manipulate, and provide data for complex theoretical constructs. Ensuring this process is robust is essential to maintaining the integrity of the science resulting from it, and the workshop highlighted a number of areas where the current approach to research software engineering would benefit from an evidence base that could be used to inform best practice. The workshop brought together expert research software engineers and academics to discuss the challenges of Theory-Software Translation over a two-day period. This report provides an overview of the workshop activities, and a synthesis of the discussion that was recorded. The body of the report presents a thematic analysis of the challenges of Theory-Software Translation as identified by workshop participants, summarises these into a set of research areas, and provides recommendations for the future direction of this work.

SE Mar 2, 2019
Research Software Development & Management in Universities: Case Studies from Manchester's RSDS Group, Illinois' NCSA, and Notre Dame's CRC

Daniel S. Katz, Kenton McHenry, Caleb Reinking et al.

Modern research in the sciences, engineering, humanities, and other fields depends on software, and specifically, research software. Much of this research software is developed in universities, by faculty, postdocs, students, and staff. In this paper, we focus on the role of university staff. We examine three different, independently-developed models under which these staff are organized and perform their work, and comparatively analyze these models and their consequences on the staff and on the software, considering how the different models support software engineering practices and processes. This information can be used by software engineering researchers to understand the practices of such organizations and by universities who want to set up similar organizations and to better produce and maintain research software.

IM Feb 1, 2019
Deep Learning for Multi-Messenger Astrophysics: A Gateway for Discovery in the Big Data Era

Gabrielle Allen, Igor Andreoni, Etienne Bachelet et al.

This report provides an overview of recent work that harnesses the Big Data Revolution and Large Scale Computing to address grand computational challenges in Multi-Messenger Astrophysics, with a particular emphasis on real-time discovery campaigns. Acknowledging the transdisciplinary nature of Multi-Messenger Astrophysics, this document has been prepared by members of the physics, astronomy, computer science, data science, software and cyberinfrastructure communities who attended the NSF-, DOE- and NVIDIA-funded "Deep Learning for Multi-Messenger Astrophysics: Real-time Discovery at Scale" workshop, hosted at the National Center for Supercomputing Applications, October 17-19, 2018. Highlights of this report include unanimous agreement that it is critical to accelerate the development and deployment of novel, signal-processing algorithms that use the synergy between artificial intelligence (AI) and high performance computing to maximize the potential for scientific discovery with Multi-Messenger Astrophysics. We discuss key aspects to realize this endeavor, namely (i) the design and exploitation of scalable and computationally efficient AI algorithms for Multi-Messenger Astrophysics; (ii) cyberinfrastructure requirements to numerically simulate astrophysical sources, and to process and interpret Multi-Messenger Astrophysics data; (iii) management of gravitational wave detections and triggers to enable electromagnetic and astro-particle follow-ups; (iv) a vision to harness future developments of machine and deep learning and cyberinfrastructure resources to cope with the scale of discovery in the Big Data Era; and (v) the need to build a community that brings domain experts together with data scientists on equal footing to maximize and accelerate discovery in the nascent field of Multi-Messenger Astrophysics.

SE Jul 19, 2018
The State of Sustainable Research Software: Results from the Workshop on Sustainable Software for Science: Practice and Experiences (WSSSPE5.1)

Daniel S. Katz, Stephan Druskat, Robert Haines et al.

This article summarizes motivations, organization, and activities of the Workshop on Sustainable Software for Science: Practice and Experiences (WSSSPE5.1) held in Manchester, UK in September 2017. The WSSSPE series promotes sustainable research software by positively impacting principles and best practices, careers, learning, and credit. This article discusses the Code of Conduct, idea papers, position papers, experience papers, demos, and lightning talks presented during the workshop. The main part of the article discusses the speed-blogging groups that formed during the meeting, along with the outputs of those sessions.

SE Jul 11, 2018
Building a Sustainable Structure for Research Software Engineering Activities

Jeremy Cohen, Daniel S. Katz, Michelle Barker et al.

The profile of research software engineering has been greatly enhanced by developments at institutions around the world to form groups and communities that can support effective, sustainable development of research software. We observe, however, that there is still a long way to go to build a clear understanding about what approaches provide the best support for research software developers in different contexts, and how such understanding can be used to suggest more formal structures, models or frameworks that can help to further support the growth of research software engineering. This paper sets out some preliminary thoughts and proposes an initial high-level model based on discussions between the authors around the concept of a set of pillars representing key activities and processes that form the core structure of a successful research software engineering offering.

SE Jul 4, 2018
Mapping the research software sustainability space

Stephan Druskat, Daniel S. Katz

A growing number of largely uncoordinated initiatives focus on research software sustainability. A comprehensive mapping of the research software sustainability space can help identify gaps in their efforts, track results, and avoid duplication of work. To this end, this paper suggests enhancing an existing schematic of activities in research software sustainability, and formalizing it in a directed graph model. Such a model can be further used to define a classification schema which, applied to research results in the field, can drive the identification of past activities and the planning of future efforts.

SE Jun 20, 2017
Understanding Software in Research: Initial Results from Examining Nature and a Call for Collaboration

Udit Nangia, Daniel S. Katz

This lightning talk paper discusses an initial data set that has been gathered to understand the use of software in research, and is intended to spark wider interest in gathering more data. The initial data analyzes three months of articles in the journal Nature for software mentions. The wider activity that we seek is a community effort to analyze a wider set of articles, including both a longer timespan of Nature articles as well as articles in other journals. Such a collection of data could be used to understand how the role of software has changed over time and how it varies across fields.

SE May 7, 2017
Report on the Fourth Workshop on Sustainable Software for Science: Practice and Experiences (WSSSPE4)

Daniel S. Katz, Kyle E. Niemeyer, Sandra Gesing et al.

This report records and discusses the Fourth Workshop on Sustainable Software for Science: Practice and Experiences (WSSSPE4). The report includes a description of the keynote presentation of the workshop, the mission and vision statements that were drafted at the workshop and finalized shortly after it, a set of idea papers, position papers, experience papers, demos, and lightning talks, and a panel discussion. The main part of the report covers the set of working groups that formed during the meeting, and for each, discusses the participants, the objective and goal, and how the objective can be reached, along with contact information for readers who may want to join the group. Finally, we present results from a survey of the workshop attendees.

SE Feb 6, 2016
Report on the Third Workshop on Sustainable Software for Science: Practice and Experiences (WSSSPE3)

Daniel S. Katz, Sou-Cheng T. Choi, Kyle E. Niemeyer et al.

This report records and discusses the Third Workshop on Sustainable Software for Science: Practice and Experiences (WSSSPE3). The report includes a description of the keynote presentation of the workshop, which served as an overview of sustainable scientific software. It also summarizes a set of lightning talks in which speakers highlighted to-the-point lessons and challenges pertaining to sustaining scientific software. The final and main contribution of the report is a summary of the discussions, future steps, and future organization for a set of self-organized working groups on topics including developing pathways to funding scientific software; constructing useful common metrics for crediting software stakeholders; identifying principles for sustainable software engineering design; reaching out to research software organizations around the world; and building communities for software sustainability. For each group, we include a point of contact and a landing page that can be used by those who want to join that group's future activities. The main challenge left by the workshop is to see if the groups will execute these activities that they have scheduled, and how the WSSSPE community can encourage this to happen.

CY Aug 13, 2015
Looking at Software Sustainability and Productivity Challenges from NSF

Daniel S. Katz, Rajiv Ramnath

This paper is a contribution to the Computational Science & Engineering Software Sustainability and Productivity Challenges (CSESSP Challenges) Workshop (https://www.nitrd.gov/csessp/), sponsored by the Networking and Information Technology Research and Development (NITRD) Software Design and Productivity (SDP) Coordinating Group, held October 15th-16th, 2015 in Washington DC, USA. It introduces the role of software at the National Science Foundation (NSF) and the NSF Software Infrastructure for Sustained Innovation (SI2) program, then describes challenges that the SI2 program has identified, including funding models, career paths, incentives, training, interdisciplinary work, portability, and dissemination, as well as lessons that have been learned.

SE Jul 7, 2015
Report on the Second Workshop on Sustainable Software for Science: Practice and Experiences (WSSSPE2)

Daniel S. Katz, Sou-Cheng T. Choi, Nancy Wilkins-Diehr et al.

This technical report records and discusses the Second Workshop on Sustainable Software for Science: Practice and Experiences (WSSSPE2). The report includes a description of the alternative, experimental submission and review process, two workshop keynote presentations, a series of lightning talks, a discussion on sustainability, and five discussions from the topic areas of exploring sustainability; software development experiences; credit & incentives; reproducibility & reuse & sharing; and code testing & code review. For each topic, the report includes a list of tangible actions that were proposed and that would lead to potential change. The workshop recognized that reliance on scientific software is pervasive in all areas of world-leading research today. The workshop participants then proceeded to explore different perspectives on the concept of sustainability. Key enablers and barriers of sustainable scientific software were identified from their experiences. In addition, recommendations with new requirements such as software credit files and software prize frameworks were outlined for improving practices in sustainable software engineering. There was also broad consensus that formal training in software development or engineering was rare among the practitioners. Significant strides need to be made in building a sense of community via training in software and technical practices, on increasing their size and scope, and on better integrating them directly into graduate education programs. Finally, journals can define and publish policies to improve reproducibility, whereas reviewers can insist that authors provide sufficient information and access to data and software to allow them to reproduce the results in the paper. Hence a list of criteria is compiled for journals to provide to reviewers so as to make it easier to review software submitted for publication as a "Software Paper."

SE Nov 13, 2014
Second Workshop on Sustainable Software for Science: Practice and Experiences (WSSSPE2): Submission, Peer-Review and Sorting Process, and Results

Daniel S. Katz, Gabrielle Allen, Neil Chue Hong et al.

This technical report discusses the submission and peer-review process used by the Second Workshop on Sustainable Software for Science: Practice and Experiences (WSSSPE2) and the results of that process. It is intended to record both the alternative submission and program organization model used by WSSSPE2 as well as the papers associated with the workshop that resulted from that process.

SE Apr 29, 2014
Summary of the First Workshop on Sustainable Software for Science: Practice and Experiences (WSSSPE1)

Daniel S. Katz, Sou-Cheng T. Choi, Hilmar Lapp et al.

Challenges related to development, deployment, and maintenance of reusable software for science are becoming a growing concern. Many scientists' research increasingly depends on the quality and availability of software upon which their works are built. To highlight some of these issues and share experiences, the First Workshop on Sustainable Software for Science: Practice and Experiences (WSSSPE1) was held in November 2013 in conjunction with the SC13 Conference. The workshop featured keynote presentations and a large number (54) of solicited extended abstracts that were grouped into three themes and presented via panels. A set of collaborative notes of the presentations and discussion was taken during the workshop. Unique perspectives were captured about issues such as comprehensive documentation, development and deployment practices, software licenses and career paths for developers. Attribution systems that account for evidence of software contribution and impact were also discussed. These include mechanisms such as Digital Object Identifiers, publication of "software papers", and the use of online systems, for example source code repositories like GitHub. This paper summarizes the issues and shared experiences that were discussed, including cross-cutting issues and use cases. It joins a nascent literature seeking to understand what drives software work in science, and how it is impacted by the reward systems of science. These incentives can determine the extent to which developers are motivated to build software for the long-term, for the use of others, and whether to work collaboratively or separately. It also explores community building, leadership, and dynamics in relation to successful scientific software.

SE Feb 18, 2014
Challenges in Selecting Software to be Reused

Daniel S. Katz

This is a position paper for Sharing, Re-Use and Circulation of Resources in Cooperative Scientific Work, a CSCW'14 workshop. It discusses the role of software in NSF's CIF21 vision and the SI2 program, which is intended to support that goal. SI2 primarily supports software projects that are proposed in response to solicitations, and some of the criteria used by the peer-reviewers and by NSF in evaluating these projects depend on predicting scientific impact. This paper discusses some ideas on how the prediction of scientific impact can be improved.

SE Nov 14, 2013
First Workshop on Sustainable Software for Science: Practice and Experiences (WSSSPE): Submission and Peer-Review Process, and Results

Daniel S. Katz, Gabrielle Allen, Neil Chue Hong et al.

This technical report discusses the submission and peer-review process used by the First Workshop on Sustainable Software for Science: Practice and Experiences (WSSSPE) and the results of that process. It is intended to record both this alternative model as well as the papers associated with the workshop that resulted from that process.

SE Sep 7, 2013
Reusability in Science: From Initial User Engagement to Dissemination of Results

Ketan Maheshwari, David Kelly, Scott J. Krieder et al.

Effective use of parallel and distributed computing in science depends upon multiple interdependent entities and activities that form an ecosystem. Active engagement between application users and technology catalysts is a crucial activity that forms an integral part of this ecosystem. Technology catalysts play a crucial role, benefiting communities beyond a single user group. Effective user engagement, together with the use and reuse of tools and techniques, has a broad impact on software sustainability. From our experience, we sketch a life-cycle for user-engagement activity in a scientific computational environment and posit that application-level reusability promotes software sustainability. We describe our experience in engaging two user groups from different scientific domains reusing common software and configuration on different computational infrastructures.