The Compute Divide in Machine Learning: A Threat to Academic Contribution and Scrutiny?
This paper addresses how unequal access to compute threatens academic contribution to, and scrutiny of, machine learning research; it offers an incremental analysis of an existing trend.
The paper investigates the compute divide between industrial and academic AI labs. It shows that the divide reduces academic involvement in compute-intensive topics such as foundation models and shifts academic research towards using open-source models developed by industry. It recommends nationally sponsored computing infrastructure and open science initiatives to boost academic compute access, and suggests structured access programs and third-party auditing to improve external evaluation of industry systems.
There are pronounced differences in the extent to which industrial and academic AI labs use computing resources. We provide a data-driven survey of the role of the compute divide in shaping machine learning research. We show that the compute divide has coincided with a reduced representation of academic-only research teams in compute-intensive research topics, especially foundation models. We argue that academia will likely play a smaller role in advancing the associated techniques, in providing critical evaluation and scrutiny, and in diffusing such models. Concurrent with this change in research focus, there is a noticeable shift in academic research towards embracing open-source, pre-trained models developed by industry. To address the challenges arising from this trend, especially reduced scrutiny of influential models, we recommend approaches aimed at thoughtfully expanding academic insights. Nationally sponsored computing infrastructure coupled with open science initiatives could judiciously boost academic compute access, prioritizing research on interpretability, safety, and security. Structured access programs and third-party auditing may also allow measured external evaluation of industry systems.