LG DL, Jul 6, 2022

The "Collections as ML Data" Checklist for Machine Learning & Cultural Heritage

UW
arXiv:2207.02960v1 | 7 citations | h-index: 8
Originality: Synthesis-oriented
AI Analysis

The paper addresses the need for specific, actionable guidelines that help practitioners in cultural heritage institutions handle sensitive data responsibly. Its contribution is incremental, since it builds on existing organizational-level work.

The paper tackles the lack of practitioner guidelines for applying machine learning to cultural heritage data by developing the 'Collections as ML Data' checklist, which includes guiding questions and practices to support responsible project development.

Within the cultural heritage sector, there has been a growing and concerted effort to consider a critical sociotechnical lens when applying machine learning techniques to digital collections. Though the cultural heritage community has collectively developed an emerging body of work detailing responsible operations for machine learning in libraries and other cultural heritage institutions at the organizational level, there remains a paucity of guidelines created specifically for practitioners embarking on machine learning projects. The manifold stakes and sensitivities involved in applying machine learning to cultural heritage underscore the importance of developing such guidelines. This paper contributes to this need by formulating a detailed checklist with guiding questions and practices that can be employed while developing a machine learning project that utilizes cultural heritage data. I call the resulting checklist the "Collections as ML Data" checklist, which, when completed, can be published with the deliverables of the project. By surveying existing projects, including my own project, Newspaper Navigator, I justify the "Collections as ML Data" checklist and demonstrate how the formulated guiding questions can be employed and operationalized.
