HC · May 29
AI Behavioral Science
Matthew O. Jackson, Qiaozhu Mei, Stephanie W. Wang et al.
We outline a foundation for a new field of "AI Behavioral Science," covering three perspectives. First, as AI becomes ubiquitous and is increasingly proprietary and opaque, it becomes vital to develop techniques for assessing AI behavior. We outline how tools developed by social scientists to assess people's behaviors can be used to assess and infer AI's behaviors, biases, tendencies, and heuristics. Second, we discuss how AI can change the ways in which we learn about human behavior. Beyond its computational power, AI offers new techniques for simulating, inferring, and predicting human behaviors, which we outline and discuss. Third, as humans and AI interact in increasingly complex and intertwined systems, we need to understand the implications for the resulting economic and political outcomes. We outline increasingly pressing issues concerning the future of human-AI interactions and the potential changes and disruptions that can ensue.
AI · Oct 5, 2023
Artificial Intelligence Index Report 2023
Nestor Maslej, Loredana Fattorini, Erik Brynjolfsson et al. · salesforce, stanford
Welcome to the sixth edition of the AI Index Report. This year, the report introduces more original data than any previous edition, including a new chapter on AI public opinion, a more thorough technical performance chapter, original analysis about large language and multimodal models, detailed trends in global AI legislation records, a study of the environmental impact of AI systems, and more. The AI Index Report tracks, collates, distills, and visualizes data related to artificial intelligence. Our mission is to provide unbiased, rigorously vetted, broadly sourced data in order for policymakers, researchers, executives, journalists, and the general public to develop a more thorough and nuanced understanding of the complex field of AI. The report aims to be the world's most credible and authoritative source for data and insights about AI.
AI · May 2, 2022
The AI Index 2022 Annual Report
Daniel Zhang, Nestor Maslej, Erik Brynjolfsson et al. · salesforce, stanford
Welcome to the fifth edition of the AI Index Report! The latest edition includes data from a broad set of academic, private, and nonprofit organizations as well as more self-collected data and original analysis than any previous edition, including an expanded technical performance chapter, a new survey of robotics researchers around the world, data on global AI legislation records in 25 countries, and a new chapter with an in-depth analysis of technical AI ethics metrics. The AI Index Report tracks, collates, distills, and visualizes data related to artificial intelligence. Its mission is to provide unbiased, rigorously vetted, and globally sourced data for policymakers, researchers, executives, journalists, and the general public to develop a more thorough and nuanced understanding of the complex field of AI. The report aims to be the world's most credible and authoritative source for data and insights about AI.
CY · Oct 31, 2022
Artificial Intelligence and Life in 2030: The One Hundred Year Study on Artificial Intelligence
Peter Stone, Rodney Brooks, Erik Brynjolfsson et al.
In September 2016, Stanford's "One Hundred Year Study on Artificial Intelligence" project (AI100) issued the first report of its planned long-term periodic assessment of artificial intelligence (AI) and its impact on society. It was written by a panel of 17 study authors, each of whom is deeply rooted in AI research, chaired by Peter Stone of the University of Texas at Austin. The report, entitled "Artificial Intelligence and Life in 2030," examines eight domains of typical urban settings on which AI is likely to have impact over the coming years: transportation, home and service robots, healthcare, education, public safety and security, low-resource communities, employment and workplace, and entertainment. It aims to provide the general public with a scientifically and technologically accurate portrayal of the current state of AI and its potential and to help guide decisions in industry and governments, as well as to inform research and development in the field. The charge for this report was given to the panel by the AI100 Standing Committee, chaired by Barbara Grosz of Harvard University.
HC · Apr 3, 2025
LLM Social Simulations Are a Promising Research Method
Jacy Reese Anthis, Ryan Liu, Sean M. Richardson et al.
Accurate and verifiable large language model (LLM) simulations of human research subjects promise an accessible data source for understanding human behavior and training new AI systems. However, results to date have been limited, and few social scientists have adopted this method. In this position paper, we argue that the promise of LLM social simulations can be achieved by addressing five tractable challenges. We ground our argument in a review of empirical comparisons between LLMs and human research subjects, commentaries on the topic, and related work. We identify promising directions, including context-rich prompting and fine-tuning with social science datasets. We believe that LLM social simulations can already be used for pilot and exploratory studies, and more widespread use may soon be possible with rapidly advancing LLM capabilities. Researchers should prioritize developing conceptual models and iterative evaluations to make the best use of new AI systems.
AI · Apr 8, 2025
Artificial Intelligence Index Report 2025
Nestor Maslej, Loredana Fattorini, Raymond Perrault et al. · salesforce, stanford
Welcome to the eighth edition of the AI Index report. The 2025 Index is our most comprehensive to date and arrives at an important moment, as AI's influence across society, the economy, and global governance continues to intensify. New in this year's report are in-depth analyses of the evolving landscape of AI hardware, novel estimates of inference costs, and new analyses of AI publication and patenting trends. We also introduce fresh data on corporate adoption of responsible AI practices, along with expanded coverage of AI's growing role in science and medicine. Since its founding in 2017 as an offshoot of the One Hundred Year Study of Artificial Intelligence, the AI Index has been committed to equipping policymakers, journalists, executives, researchers, and the public with accurate, rigorously validated, and globally sourced data. Our mission has always been to help these stakeholders make better-informed decisions about the development and deployment of AI. In a world where AI is discussed everywhere - from boardrooms to kitchen tables - this mission has never been more essential. The AI Index continues to lead in tracking and interpreting the most critical trends shaping the field - from the shifting geopolitical landscape and the rapid evolution of underlying technologies, to AI's expanding role in business, policymaking, and public life. Longitudinal tracking remains at the heart of our mission. In a domain advancing at breakneck speed, the Index provides essential context - helping us understand where AI stands today, how it got here, and where it may be headed next. Recognized globally as one of the most authoritative resources on artificial intelligence, the AI Index has been cited in major media outlets such as The New York Times, Bloomberg, and The Guardian; referenced in hundreds of academic papers; and used by policymakers and government agencies around the world.
CL · Apr 24
How Do AI Agents Spend Your Money? Analyzing and Predicting Token Consumption in Agentic Coding Tasks
Longju Bai, Zhemin Huang, Xingyao Wang et al.
The wide adoption of AI agents in complex human workflows is driving rapid growth in LLM token consumption. When agents are deployed on tasks that require a significant amount of tokens, three questions naturally arise: (1) Where do AI agents spend the tokens? (2) Which models are more token-efficient? and (3) Can agents predict their token usage before task execution? In this paper, we present the first systematic study of token consumption patterns in agentic coding tasks. We analyze trajectories from eight frontier LLMs on SWE-bench Verified and evaluate models' ability to predict their own token costs before task execution. We find that: (1) agentic tasks are uniquely expensive, consuming 1000x more tokens than code reasoning and code chat, with input tokens rather than output tokens driving the overall cost; (2) token usage is highly variable and inherently stochastic: runs on the same task can differ by up to 30x in total tokens, and higher token usage does not translate into higher accuracy; instead, accuracy often peaks at intermediate cost and saturates at higher costs; (3) models vary substantially in token efficiency: on the same tasks, Kimi-K2 and Claude-Sonnet-4.5, on average, consume over 1.5 million more tokens than GPT-5; (4) task difficulty rated by human experts only weakly aligns with actual token costs, revealing a fundamental gap between human-perceived complexity and the computational effort agents actually expend; and (5) frontier models fail to accurately predict their own token usage (with weak-to-moderate correlations, up to 0.39) and systematically underestimate real token costs. Our study offers new insights into the economics of AI agents and can inspire future research in this direction.
AI · Oct 21, 2025
A Definition of AGI
Dan Hendrycks, Dawn Song, Christian Szegedy et al.
The lack of a concrete definition for Artificial General Intelligence (AGI) obscures the gap between today's specialized AI and human-level cognition. This paper introduces a quantifiable framework to address this, defining AGI as matching the cognitive versatility and proficiency of a well-educated adult. To operationalize this, we ground our methodology in Cattell-Horn-Carroll theory, the most empirically validated model of human cognition. The framework dissects general intelligence into ten core cognitive domains (including reasoning, memory, and perception) and adapts established human psychometric batteries to evaluate AI systems. Application of this framework reveals a highly "jagged" cognitive profile in contemporary models. While proficient in knowledge-intensive domains, current AI systems have critical deficits in foundational cognitive machinery, particularly long-term memory storage. The resulting AGI scores (e.g., GPT-4 at 27%, GPT-5 at 57%) concretely quantify both rapid progress and the substantial gap remaining before AGI.
GN · Jan 11, 2022
The Turing Trap: The Promise & Peril of Human-Like Artificial Intelligence
Erik Brynjolfsson
In 1950, Alan Turing proposed an imitation game as the ultimate test of whether a machine was intelligent: could a machine imitate a human so well that its answers to questions are indistinguishable from those of a human? Ever since, creating intelligence that matches human intelligence has implicitly or explicitly been the goal of thousands of researchers, engineers, and entrepreneurs. The benefits of human-like artificial intelligence (HLAI) include soaring productivity, increased leisure, and perhaps most profoundly, a better understanding of our own minds. But not all types of AI are human-like. In fact, many of the most powerful systems are very different from humans. So an excessive focus on developing and deploying HLAI can lead us into a trap. As machines become better substitutes for human labor, workers lose economic and political bargaining power and become increasingly dependent on those who control the technology. In contrast, when AI is focused on augmenting humans rather than mimicking them, humans retain the power to insist on a share of the value created. Furthermore, augmentation creates new capabilities and new products and services, ultimately generating far more value than merely human-like AI. While both types of AI can be enormously beneficial, there are currently excess incentives for automation rather than augmentation among technologists, business executives, and policymakers.
LG · Aug 16, 2021
On the Opportunities and Risks of Foundation Models
Rishi Bommasani, Drew A. Hudson, Ehsan Adeli et al.
AI is undergoing a paradigm shift with the rise of models (e.g., BERT, DALL-E, GPT-3) that are trained on broad data at scale and are adaptable to a wide range of downstream tasks. We call these models foundation models to underscore their critically central yet incomplete character. This report provides a thorough account of the opportunities and risks of foundation models, ranging from their capabilities (e.g., language, vision, robotics, reasoning, human interaction) and technical principles (e.g., model architectures, training procedures, data, systems, security, evaluation, theory) to their applications (e.g., law, healthcare, education) and societal impact (e.g., inequity, misuse, economic and environmental impact, legal and ethical considerations). Though foundation models are based on standard deep learning and transfer learning, their scale results in new emergent capabilities, and their effectiveness across so many tasks incentivizes homogenization. Homogenization provides powerful leverage but demands caution, as the defects of the foundation model are inherited by all the adapted models downstream. Despite the impending widespread deployment of foundation models, we currently lack a clear understanding of how they work, when they fail, and what they are even capable of due to their emergent properties. To tackle these questions, we believe much of the critical research on foundation models will require deep interdisciplinary collaboration commensurate with their fundamentally sociotechnical nature.
AI · Mar 9, 2021
The AI Index 2021 Annual Report
Daniel Zhang, Saurabh Mishra, Erik Brynjolfsson et al.
Welcome to the fourth edition of the AI Index Report. This year we significantly expanded the amount of data available in the report, worked with a broader set of external organizations to calibrate our data, and deepened our connections with the Stanford Institute for Human-Centered Artificial Intelligence (HAI). The AI Index Report tracks, collates, distills, and visualizes data related to artificial intelligence. Its mission is to provide unbiased, rigorously vetted, and globally sourced data for policymakers, researchers, executives, journalists, and the general public to develop intuitions about the complex field of AI. The report aims to be the most credible and authoritative source for data and insights about AI in the world.
CY · Jan 28, 2020
Learning Occupational Task-Shares Dynamics for the Future of Work
Subhro Das, Sebastian Steffen, Wyatt Clarke et al.
The recent wave of AI and automation has been argued to differ from previous General Purpose Technologies (GPTs), in that it may lead to rapid change in occupations' underlying task requirements and persistent technological unemployment. In this paper, we apply a novel methodology of dynamic task shares to a large dataset of online job postings to explore how exactly occupational task demands have changed over the past decade of AI innovation, especially across high-, mid-, and low-wage occupations. Notably, big data and AI have risen significantly among high-wage occupations since 2012 and 2016, respectively. We built an ARIMA model to predict future occupational task demands and showcase several relevant examples in Healthcare, Administration, and IT. Such predictions of task demands across occupations will play a pivotal role in retraining the workforce of the future.