Nick Merrill

269 citations

6 Papers

CR · May 27
Symmetry Defeats Auditing

Nick Merrill, Zeke Medley

We demonstrate an attack on Introspection Adapters (Shenoy et al., 2026).

HC · Jan 23, 2023
The Entoptic Field Camera as Metaphor-Driven Research-through-Design with AI Technologies

Jesse Josua Benjamin, Heidi Biggs, Arne Berger et al.

Artificial intelligence (AI) technologies are widely deployed in smartphone photography, and prompt-based image synthesis models have rapidly become commonplace. In this paper, we describe a Research-through-Design (RtD) project that explores this shift in the means and modes of image production through the creation and use of the Entoptic Field Camera. Entoptic phenomena usually refer to perceptions of floaters or bright blue dots stemming from the physiological interplay of the eye and brain. We use the term entoptic as a metaphor to investigate how the material interplay of data and models in AI technologies shapes human experiences of reality. Through our case study, which combines first-person design and a field study, we offer implications for critical, reflective, more-than-human and ludic design that engages AI technologies; conceptualise an RtD research space that contributes to AI literacy discourses; and outline a research trajectory concerning the materiality and design affordances of AI technologies.

AI · May 21
Is Capability a Liability? More Capable Language Models Make Worse Forecasts When It Matters Most

Nick Merrill, Jaeho Lee, Ezra Karger

We document inverse scaling in LLMs on forecasting problems whose underlying time series exhibit superlinear growth and tail risk of regime change, a structure common in finance and epidemiology. On these tasks, more capable models produce worse distributional forecasts. The pattern appears on ForecastBench-Sim (FBSim), a contamination-free, simulated-world benchmark we release, and in forecasting synthetic SIR epidemics with a matched linear control; it replicates on real-world datasets covering COVID-19, measles, housing markets, and hyperinflation. A per-quantile decomposition shows that the failure concentrates at the upper tail, which more capable models shift upward to track aggressive extrapolations of growth, while the lower tail stays put. A within-family study of Llama-3.1 shows that model scale and post-training each contribute independently to this effect. Domain knowledge does not reliably rescue calibration. This inverse scaling does not appear on the single-threshold metrics common in LLM forecasting benchmarks: scoring at conventional cutoffs misses the upper-tail cost, whereas tail-inclusive scoring of the same outputs reverses the sign of the capability–accuracy relationship. We recommend that LLM forecasting evaluations use continuous (and unbounded) measures of accuracy alongside bounded binary threshold metrics.
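The contrast between bounded threshold metrics and continuous, unbounded scoring can be illustrated with a small sketch. It assumes pinball (quantile) loss as the unbounded, per-quantile measure and a Brier score on a single exceedance threshold as the bounded metric; the paper's actual scoring rules, quantile levels, and cutoffs are not stated here. The helper names and the two toy forecasts are hypothetical, chosen only to show how two forecasts with the same lower tail and the same threshold probability can differ sharply once the upper tail is scored.

```python
from __future__ import annotations

# Hypothetical illustration: pinball loss (unbounded) vs. a single-threshold
# Brier score (bounded in [0, 1]). Not the paper's exact scoring rules.


def pinball_loss(q_level: float, q_pred: float, y: float) -> float:
    """Quantile (pinball) loss for one predicted quantile; grows without bound."""
    diff = y - q_pred
    return max(q_level * diff, (q_level - 1.0) * diff)


def per_quantile_losses(quantiles: dict[float, float], y: float) -> dict[float, float]:
    """Decompose a distributional forecast's loss by quantile level."""
    return {level: pinball_loss(level, pred, y) for level, pred in quantiles.items()}


def threshold_brier(p_exceed: float, y: float, threshold: float) -> float:
    """Brier score for the binary event y > threshold; always lies in [0, 1]."""
    outcome = 1.0 if y > threshold else 0.0
    return (p_exceed - outcome) ** 2


if __name__ == "__main__":
    y = 120.0          # toy realized value: growth stalls
    threshold = 100.0  # toy conventional cutoff
    # Same lower tail (0.1 quantile); forecast B extrapolates growth aggressively.
    cautious = {0.1: 80.0, 0.5: 110.0, 0.9: 150.0}
    aggressive = {0.1: 80.0, 0.5: 140.0, 0.9: 600.0}

    for name, fc in [("cautious", cautious), ("aggressive", aggressive)]:
        losses = per_quantile_losses(fc, y)
        print(name,
              {k: round(v, 2) for k, v in losses.items()},
              "total:", round(sum(losses.values()), 2),
              "brier:", round(threshold_brier(0.8, y, threshold), 2))
    # cautious {0.1: 4.0, 0.5: 5.0, 0.9: 3.0} total: 12.0 brier: 0.04
    # aggressive {0.1: 4.0, 0.5: 10.0, 0.9: 48.0} total: 62.0 brier: 0.04
```

Both toy forecasts assign the same probability (0.8) to exceeding the cutoff, so the threshold score cannot tell them apart; the per-quantile losses show the entire gap sitting in the upper quantiles while the lowest quantile's loss is identical.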

CY · Feb 17, 2022
Seeing Like a Toolkit: How Toolkits Envision the Work of AI Ethics

Richmond Y. Wong, Michael A. Madaio, Nick Merrill

Numerous toolkits have been developed to support ethical AI development. However, toolkits, like all tools, encode assumptions in their design about what work should be done and how. In this paper, we conduct a qualitative analysis of 27 AI ethics toolkits to critically examine how the work of ethics is imagined and how it is supported by these toolkits. Specifically, we examine the discourses toolkits rely on when talking about ethical issues, who they imagine should do the work of ethics, and how they envision the work practices involved in addressing ethics. Among the toolkits, we identify a mismatch between the imagined work of ethics and the support the toolkits provide for doing that work. In particular, we identify a lack of guidance around how to navigate labor, organizational, and institutional power dynamics as they relate to performing ethical work. We use these omissions to chart future work for researchers and designers of AI ethics toolkits.

NI · Oct 9, 2021
From Fragmentation to Liberation

Nick Merrill

In this paper, I argue that "Internet fragmentation" as a phenomenon is only meaningful in the context of the US's hegemonic control over the Internet. I propose a broader and, I argue, more richly predictive frame: Internet conflict. I show how this frame provides fresh analytical purchase on some of the questions I list above, using it to contextualize several apparently distinct phenomena. I conclude by arguing that only one question gives this analytical frame, or any other, a higher purpose: what particular interventions in Internet governance can produce meaningfully liberatory outcomes? Any descriptive framework is useful only insofar as it can be mobilized to answer this normative question.

HC · Jan 11, 2021
Machine Learning Uncertainty as a Design Material: A Post-Phenomenological Inquiry

Jesse Josua Benjamin, Arne Berger, Nick Merrill et al.

Design research is important for understanding and interrogating how emerging technologies shape human experience. However, design research with Machine Learning (ML) is relatively underdeveloped. Crucially, designers have not yet grasped ML uncertainty as a design opportunity rather than an obstacle. The technical literature points to data and model uncertainties as two main properties of ML. Through post-phenomenology, we position uncertainty as a defining material attribute of the ML processes that mediate human experience. To understand ML uncertainty as a design material, we investigate four design research case studies involving ML. We derive three provocative concepts: thingly uncertainty (ML-driven artefacts have uncertain, variable relations to their environments), pattern leakage (ML uncertainty can lead to patterns shaping the world they are meant to represent), and futures creep (ML technologies texture human relations to time with uncertainty). Finally, we outline design research trajectories and sketch a post-phenomenological approach to human-ML relations.