DL · Sep 13, 2021
Towards FAIR Principles for Open Hardware
Nadica Miljković, Ana Trisovic, Limor Peer
The lack of scientific openness has been identified as one of the key challenges of computational reproducibility. In addition to Open Data, Free and Open-source Software (FOSS) and Open Hardware (OH) can address this challenge by introducing open policies, standards, and recommendations. However, while both FOSS and OH are free to use, study, modify, and redistribute, there are significant differences in how these artifacts are shared and reused. FOSS is increasingly supported by software repositories, but support for OH is lacking, potentially due to the complexity of its digital formats and licensing. This paper proposes leveraging the FAIR principles to make OH findable, accessible, interoperable, and reusable. We define what FAIR means for OH, explain how it differs from FAIR for FOSS, and present examples of demands unique to OH. We also evaluate the dissemination platforms currently used for OH and provide recommendations.
LG · Nov 26, 2025
On the Origin of Algorithmic Progress in AI
Hans Gundlach, Alex Fogelson, Jayson Lynch et al.
Algorithms have been estimated to increase AI training FLOP efficiency by a factor of 22,000 between 2012 and 2023 [Ho et al., 2024]. Running small-scale ablation experiments on key innovations from this period, we can account for less than 10x of these gains. Surveying the broader literature, we estimate that additional innovations not included in our ablations account for less than another 10x, yielding a combined total under 100x. This gap leads us to conduct scaling experiments, which reveal that much of it can be explained by algorithms whose efficiency improvements depend on scale. In particular, our scaling experiments comparing LSTMs and Transformers find differences in the exponents of their compute-optimal scaling laws, while many other innovations show little scaling difference. These experiments demonstrate that, contrary to standard assumptions, an algorithm's efficiency gains are tied to compute scale. Using experimental extrapolation and literature estimates, we account for 6,930x efficiency gains over the same period, with the scale-dependent LSTM-to-Transformer transition accounting for the majority of them. Our results indicate that algorithmic progress for small models has been far slower than previously assumed, and that measures of algorithmic efficiency are strongly reference-dependent.
SE · Mar 23, 2021
A large-scale study on research code quality and execution
Ana Trisovic, Matthew K. Lau, Thomas Pasquier et al.
This article presents a study of the quality and execution of research code from publicly available replication datasets in the Harvard Dataverse repository. Research code is typically created by a group of scientists and published together with academic papers to facilitate research transparency and reproducibility. For this study, we define ten questions addressing aspects that affect research reproducibility and reuse. First, we retrieve and analyze more than 2000 replication datasets containing over 9000 unique R files published from 2010 to 2020. Second, we execute the code in a clean runtime environment to assess its ease of reuse. We identify common coding errors and resolve some of them with automatic code cleaning to aid execution. We find that 74% of R files crashed in the initial execution, while 56% crashed when code cleaning was applied, showing that many errors can be prevented with good coding practices. We also analyze the replication datasets from journals' collections and discuss the impact of journal policy strictness on the code re-execution rate. Finally, based on our results, we propose a set of recommendations for code dissemination aimed at researchers, journals, and repositories.
DL · May 6, 2020
Advancing computational reproducibility in the Dataverse data repository platform
Ana Trisovic, Philip Durbin, Tania Schlatter et al.
Recent reproducibility case studies have raised concerns by showing that much of the deposited research is not reproducible. One of their conclusions was that the way data repositories store research data and code cannot fully support reproducibility, because they lack the runtime environment needed to execute the code. New specialized reproducibility tools provide cloud-based computational environments for code encapsulation, thus enabling research portability and reproducibility. However, unlike data repositories, they often do not support research discoverability, standardized data citation, or long-term archival. This paper examines the shortcomings of both data repositories and reproducibility tools, and how they could be overcome to remedy the current lack of computational reproducibility in published and archived research outputs.