Jack Bandy

h-index: 10

4 papers

324 citations

Novelty: 26%

AI Score: 27

Ranked #154,776 of 194,257 authors (top 80%); #502 in CY (top 52%)

4 Papers

7.9 | CL | May 11, 2021 | Code

Addressing "Documentation Debt" in Machine Learning Research: A Retrospective Datasheet for BookCorpus

Jack Bandy, Nicholas Vincent

Recent literature has underscored the importance of dataset documentation work for machine learning, and part of this work involves addressing "documentation debt" for datasets that have been used widely but documented sparsely. This paper aims to help address documentation debt for BookCorpus, a popular text dataset for training large language models. Notably, researchers have used BookCorpus to train OpenAI's GPT-N models and Google's BERT models, even though little to no documentation exists about the dataset's motivation, composition, collection process, etc. We offer a preliminary datasheet that provides key context and information about BookCorpus, highlighting several notable deficiencies. In particular, we find evidence that (1) BookCorpus likely violates copyright restrictions for many books, (2) BookCorpus contains thousands of duplicated books, and (3) BookCorpus exhibits significant skews in genre representation. We also find hints of other potential deficiencies that call for future research, including problematic content, potential skews in religious representation, and lopsided author contributions. While more work remains, this initial effort to provide a datasheet for BookCorpus adds to growing literature that urges more careful and systematic documentation for machine learning datasets.

18.8 | CY | Feb 3, 2021 | Code

Problematic Machine Behavior: A Systematic Literature Review of Algorithm Audits

Jack Bandy

While algorithm audits are growing rapidly in commonality and public importance, relatively little scholarly work has gone toward synthesizing prior work and strategizing future research in the area. This systematic literature review aims to do just that, following PRISMA guidelines in a review of over 500 English articles that yielded 62 algorithm audit studies. The studies are synthesized and organized primarily by behavior (discrimination, distortion, exploitation, and misjudgement), with codes also provided for domain (e.g. search, vision, advertising, etc.), organization (e.g. Google, Facebook, Amazon, etc.), and audit method (e.g. sock puppet, direct scrape, crowdsourcing, etc.). The review shows how previous audit studies have exposed public-facing algorithms exhibiting problematic behavior, such as search algorithms culpable of distortion and advertising algorithms culpable of discrimination. Based on the studies reviewed, it also suggests some behaviors (e.g. discrimination on the basis of intersectional identities), domains (e.g. advertising algorithms), methods (e.g. code auditing), and organizations (e.g. Twitter, TikTok, LinkedIn) that call for future audit attention. The paper concludes by offering the common ingredients of successful audits, and discussing algorithm auditing in the context of broader research working toward algorithmic justice.

11.1 | HC | Dec 14, 2020

#TulsaFlop: A Case Study of Algorithmically-Influenced Collective Action on TikTok

Jack Bandy, Nicholas Diakopoulos

When a re-election rally for the U.S. president drew smaller crowds than expected in Tulsa, Oklahoma, many people attributed the low turnout to collective action organized by TikTok users. Motivated by TikTok's surge in popularity and its growing sociopolitical implications, this work explores the role of TikTok's recommender algorithm in amplifying call-to-action videos that promoted collective action against the Tulsa rally. We analyze call-to-action videos from more than 600 TikTok users and compare the visibility (i.e. play count) of these videos with other videos published by the same users. Evidence suggests that Tulsa-related videos generally received more plays, and in some cases the amplification was dramatic. For example, one user's call-to-action video was played over 2 million times, but no other video by the user exceeded 100,000 plays, and the user had fewer than 20,000 followers. Statistical modeling suggests that the increased play count is explained by increased engagement rather than any systematic amplification of call-to-action videos. We conclude by discussing the implications of recommender algorithms amplifying sociopolitical messages, and motivate several promising areas for future work.
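The within-user comparison described in this abstract can be illustrated with a small sketch. The abstract does not give the paper's statistical model, so the function below is only an assumed, simplified stand-in: it divides the play count of one call-to-action video by the median play count of the same user's other videos. The function name `amplification_ratio` and the sample numbers are hypothetical.

```python
from statistics import median


def amplification_ratio(cta_plays, other_plays):
    """Ratio of a call-to-action video's plays to the user's typical plays.

    Assumption: "typical" is the median play count of the user's other
    videos; the paper itself may use a different baseline or model.
    """
    if not other_plays:
        raise ValueError("need at least one other video as a baseline")
    baseline = median(other_plays)
    if baseline == 0:
        raise ValueError("baseline play count is zero; ratio is undefined")
    return cta_plays / baseline


# Illustrative numbers only, not data from the paper.
ratio = amplification_ratio(2_000_000, [50_000, 80_000, 100_000])
print(f"call-to-action video received {ratio:.1f}x the user's median plays")
```

A ratio well above 1 flags a video that was far more visible than the user's usual output. On its own it cannot separate algorithmic amplification from higher engagement, which is why the abstract turns to statistical modeling.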

7.3 | CY | Aug 1, 2019 | Code

Auditing News Curation Systems: A Case Study Examining Algorithmic and Editorial Logic in Apple News

Jack Bandy, Nicholas Diakopoulos

This work presents an audit study of Apple News as a sociotechnical news curation system that exercises gatekeeping power in the media. We examine the mechanisms behind Apple News as well as the content presented in the app, outlining the social, political, and economic implications of both aspects. We focus on the Trending Stories section, which is algorithmically curated, and the Top Stories section, which is human-curated. Results from a crowdsourced audit showed minimal content personalization in the Trending Stories section, and a sock-puppet audit showed no location-based content adaptation. Finally, we perform an extended two-month data collection to compare the human-curated Top Stories section with the algorithmically curated Trending Stories section. Within these two sections, human curation outperformed algorithmic curation in several measures of source diversity, concentration, and evenness. Furthermore, algorithmic curation featured more "soft news" about celebrities and entertainment, while editorial curation featured more news about policy and international events. To our knowledge, this study provides the first data-backed characterization of Apple News in the United States.
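To make "source diversity, concentration, and evenness" concrete, here is a minimal sketch using three common measures: Shannon entropy, Pielou evenness, and the Herfindahl-Hirschman index. The abstract does not say which measures the paper used, so these choices, the function name `source_diversity`, and the sample source labels are all assumptions.

```python
import math
from collections import Counter


def source_diversity(sources):
    """Summarize how stories are spread across news sources.

    `sources` is a list with one source name per story (hypothetical input).
    The three measures are common choices, not necessarily the paper's.
    """
    if not sources:
        raise ValueError("need at least one story")
    counts = Counter(sources)
    total = sum(counts.values())
    shares = [c / total for c in counts.values()]

    # Shannon entropy: higher means stories are spread over more sources.
    entropy = -sum(p * math.log(p) for p in shares)
    # Pielou evenness: entropy divided by its maximum, ln(number of sources).
    evenness = entropy / math.log(len(shares)) if len(shares) > 1 else 0.0
    # Herfindahl-Hirschman index: sum of squared shares; 1.0 is one source.
    hhi = sum(p * p for p in shares)

    return {
        "unique_sources": len(shares),
        "shannon_entropy": entropy,
        "evenness": evenness,
        "hhi": hhi,
    }


# Illustrative labels only, not data from the paper.
print(source_diversity(["A", "A", "B", "B"]))
print(source_diversity(["A", "A", "A", "B"]))
```

Computing the same summary for each section's stories over the collection period would allow the kind of side-by-side comparison between human and algorithmic curation that the abstract reports.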