LG AI CL CY

May 19, 2023

Algorithmic failure as a humanities methodology: machine learning's mispredictions identify rich cases for qualitative analysis

arXiv:2305.11663v1

21 citations

Originality: Synthesis-oriented

AI Analysis

This work applies an existing method to a broader humanities domain and shows that the method can work with simpler tools, but the contribution is incremental because it primarily replicates and extends prior research.

The paper tests a method using machine learning's failed predictions to identify ambiguous cases for qualitative analysis in humanities data, finding that unpredictable actions were more emotionally loaded and complex, supporting the method's utility with a simpler algorithm.

This commentary tests a methodology proposed by Munk et al. (2022) for using failed predictions in machine learning to identify ambiguous and rich cases for qualitative analysis. Using a dataset describing actions performed by fictional characters interacting with machine vision technologies in 500 artworks, movies, novels and videogames, I trained a simple machine learning model (using the kNN algorithm in R) to predict whether an action was active or passive using only information about the fictional characters. Predictable actions were generally unemotional and unambiguous activities where machine vision technologies were treated as simple tools. Unpredictable actions, that is, actions that the algorithm could not correctly predict, were more ambivalent and emotionally loaded, with more complex power relationships between characters and technologies. The results thus support Munk et al.'s theory that failed predictions can be productively used to identify rich cases for qualitative analysis. This test goes beyond simply replicating Munk et al.'s results by demonstrating that the method can be applied to a broader humanities domain, and that it does not require complex neural networks but can also work with a simpler machine learning algorithm. Further research is needed to develop an understanding of what kinds of data the method is useful for and which kinds of machine learning are most generative. To support this, the R code required to produce the results is included so the test can be replicated. The code can also be reused or adapted to test the method on other datasets.
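The workflow described in the abstract (train kNN on character attributes, then set aside the actions it mispredicts for close reading) can be illustrated with a minimal sketch. This is not the paper's R code, which is not reproduced on this page: it is a plain-Python stand-in, and the character attributes, the toy rows, the Hamming distance, the leave-one-out evaluation, and the choice of k = 3 are all assumptions made for illustration.

```python
from collections import Counter

def hamming(a, b):
    """Count the positions where two equal-length attribute tuples differ."""
    return sum(x != y for x, y in zip(a, b))

def knn_predict(train_X, train_y, query, k=3):
    """Return the majority label among the k training rows closest to query."""
    ranked = sorted(range(len(train_X)), key=lambda i: hamming(train_X[i], query))
    votes = Counter(train_y[i] for i in ranked[:k])
    return votes.most_common(1)[0][0]

def mispredicted(X, y, k=3):
    """Leave-one-out evaluation: indices of rows whose label kNN gets wrong."""
    failures = []
    for i in range(len(X)):
        rest_X = X[:i] + X[i + 1:]
        rest_y = y[:i] + y[i + 1:]
        if knn_predict(rest_X, rest_y, X[i], k) != y[i]:
            failures.append(i)
    return failures

# Hypothetical toy data: (species, age, role) per character, one action label each.
characters = [
    ("human", "adult", "hero"),
    ("human", "adult", "hero"),
    ("human", "adult", "hero"),
    ("robot", "adult", "villain"),
    ("robot", "adult", "villain"),
    ("robot", "adult", "villain"),
    ("human", "adult", "hero"),  # same profile as rows 0-2, different label
]
actions = ["active", "active", "active",
           "passive", "passive", "passive",
           "passive"]

# Rows the classifier fails on are the candidates for qualitative analysis.
failures = mispredicted(characters, actions, k=3)
print(failures)  # [6]
```

In this toy set only row 6 is flagged: its attributes match the "active" heroes, yet its action is labeled passive. In the spirit of the method, such a row would be pulled out for close reading, not discarded as noise.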
