Practices for Engineering Trustworthy Machine Learning Applications
This work provides a practical framework for ML development teams to implement trustworthy ML, though it is incremental as it builds on existing guidelines and catalogues.
The paper addresses the gap between high-level guidelines for trustworthy machine learning and actionable development practices by identifying 14 new operational practices through a literature review and survey, revealing low adoption rates, especially for security-related practices.
Following the recent surge in adoption of machine learning (ML), the negative impact that improper use of ML can have on users and society is now also widely recognised. To address this issue, policy makers and other stakeholders, such as the European Commission or NIST, have proposed high-level guidelines aiming to promote trustworthy ML (i.e., lawful, ethical and robust). However, these guidelines do not specify actions to be taken by those involved in building ML systems. In this paper, we argue that guidelines related to the development of trustworthy ML can be translated to operational practices, and should become part of the ML development life cycle. Towards this goal, we ran a multi-vocal literature review, and mined operational practices from white and grey literature. Moreover, we launched a global survey to measure practice adoption and the effects of these practices. In total, we identified 14 new practices, and used them to complement an existing catalogue of ML engineering practices. Initial analysis of the survey results reveals that so far, practice adoption for trustworthy ML is relatively low. In particular, practices related to assuring security of ML components have very low adoption. Other practices enjoy slightly larger adoption, such as providing explanations to users. Our extended practice catalogue can be used by ML development teams to bridge the gap between high-level guidelines and actual development of trustworthy ML systems; it is open for review and contribution