SE Feb 16
GenAI for Systems: Recurring Challenges and Design Principles from Software to Silicon
Arya Tschand, Chenyu Wang, Zishen Wan et al. · harvard
Generative AI is reshaping how computing systems are designed, optimized, and built, yet research remains fragmented across software, architecture, and chip design communities. This paper takes a cross-stack perspective, examining how generative models are being applied from code generation and distributed runtimes through hardware design space exploration to RTL synthesis, physical layout, and verification. Rather than reviewing each layer in isolation, we analyze how the same structural difficulties and effective responses recur across the stack. Our central finding is one of convergence. Despite the diversity of domains and tools, the field keeps encountering five recurring challenges (the feedback loop crisis, the tacit knowledge problem, trust and validation, co-design across boundaries, and the shift from determinism to dynamism) and keeps arriving at five design principles that independently emerge as effective responses (embracing hybrid approaches, designing for continuous feedback, separating concerns by role, matching methods to problem structure, and building on decades of systems knowledge). We organize these into a challenge-principle map that serves as a diagnostic and design aid, showing which principles have proven effective for which challenges across layers. Through concrete cross-stack examples, we show how systems navigate this map as they mature, and argue that the field needs shared engineering methodology, including common vocabularies, cross-layer benchmarks, and systematic design practices, so that progress compounds across communities rather than being rediscovered in each one. Our analysis covers more than 275 papers spanning eleven application areas across three layers of the computing stack, and distills open research questions that become visible only from a cross-layer vantage point.
TH Apr 10, 2025
Private Private Information
Kevin He, Fedor Sandomirskiy, Omer Tamuz
Private signals model noisy information about an unknown state. Although these signals are called "private," they may still carry information about each other. Our paper introduces the concept of private private signals, which contain information about the state but not about other signals. To achieve privacy, signal quality may need to be sacrificed. We study the informativeness of private private signals and characterize those that are optimal in the sense that they cannot be made more informative without violating privacy. We discuss implications for privacy in recommendation systems, information design, causal inference, and mechanism design.
GN May 1, 2020
Network Structure and Naive Sequential Learning
Krishna Dasaratha, Kevin He
We study a sequential-learning model featuring a network of naive agents with Gaussian information structures. Agents apply a heuristic rule to aggregate predecessors' actions. They weigh these actions according to the strengths of their social connections to different predecessors. We show this rule arises endogenously when agents wrongly believe others act solely on private information and thus neglect redundancies among observations. We provide a simple linear formula expressing agents' actions in terms of network paths and use this formula to characterize the set of networks where naive agents eventually learn correctly. This characterization implies that, on all networks where later agents observe more than one neighbor, there exist disproportionately influential early agents who can cause herding on incorrect actions. Going beyond existing social-learning results, we compute the probability of such mislearning exactly. This allows us to compare likelihoods of incorrect herding, and hence expected welfare losses, across network structures. The probability of mislearning increases when link densities are higher and when networks are more integrated. In partially segregated networks, divergent early signals can lead to persistent disagreement between groups.
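The path-based linear formula mentioned in this abstract can be illustrated with a small sketch. It assumes, for illustration only, that each naive agent's action is their own signal plus a weighted sum of the predecessors' actions they observe; the abstract does not spell out the exact rule, and the function names `naive_actions` and `path_weight` are hypothetical. Under that assumption, the coefficient of an early signal in a later action equals the sum, over all network paths between the two agents, of the product of link weights along the path.

```python
from typing import List


def naive_actions(weights: List[List[float]], signals: List[float]) -> List[float]:
    """Assumed naive rule: action = own signal + weighted sum of earlier actions.

    weights[i][j] is the strength of agent i's link to predecessor j (j < i).
    """
    actions: List[float] = []
    for i, s in enumerate(signals):
        actions.append(s + sum(weights[i][j] * actions[j] for j in range(i)))
    return actions


def path_weight(weights: List[List[float]], i: int, j: int) -> float:
    """Sum over all paths i -> ... -> j of the product of link weights.

    This is the coefficient of agent j's signal in agent i's action.
    """
    if i == j:
        return 1.0
    return sum(weights[i][k] * path_weight(weights, k, j) for k in range(j, i))


if __name__ == "__main__":
    # Complete network of four agents, every link with weight 1.
    W = [[1.0 if j < i else 0.0 for j in range(4)] for i in range(4)]
    signals = [1.0, -0.5, -0.5, -0.5]
    print(naive_actions(W, signals))  # [1.0, 0.5, 1.0, 2.0]
    print(path_weight(W, 3, 0))       # 4.0
```

In this toy network the first agent's signal enters the fourth agent's action with weight 4, so the action stays positive even though the three later signals are all negative. That is one way to see how an early agent can become disproportionately influential.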
GN Aug 19, 2021
Mislearning from Censored Data: The Gambler's Fallacy and Other Correlational Mistakes in Optimal-Stopping Problems
Kevin He
I study endogenous learning dynamics for people who misperceive intertemporal correlations in random sequences. Biased agents face an optimal-stopping problem. They are uncertain about the underlying distribution and learn its parameters from predecessors. Agents stop when early draws are "good enough," so predecessors' experiences contain negative streaks but not positive streaks. When agents wrongly expect systematic reversals (the "gambler's fallacy"), they understate the likelihood of consecutive below-average draws, converge to over-pessimistic beliefs about the distribution's mean, and stop too early. Agents uncertain about the distribution's variance overestimate it to an extent that depends on predecessors' stopping thresholds. I also analyze how other misperceptions of intertemporal correlation interact with endogenous data censoring.
ML Aug 10, 2022
KL-divergence Based Deep Learning for Discrete Time Model
Li Liu, Xiangeng Fang, Di Wang et al.
Neural networks (deep learning) are a modern class of models in artificial intelligence and have been applied to survival analysis. Although previous work has shown several improvements, training a strong deep learning model requires a large amount of data, which may not be available in practice. To address this challenge, we develop a Kullback-Leibler-based (KL) deep learning procedure to integrate external survival prediction models with newly collected time-to-event data. Time-dependent KL discrimination information is utilized to measure the discrepancy between the external and internal data. This is the first work to consider using prior information to address the limited-data problem in deep learning for survival analysis. Simulation and real data results show that the proposed model achieves better performance and higher robustness compared with previous works.
HC Dec 12, 2025
AI as a Teaching Partner: Early Lessons from Classroom Codesign with Secondary Teachers
Alex Liu, Lief Esbenshade, Shawon Sarkar et al. · uw
This report presents a comprehensive account of the Colleague AI Classroom pilot, a collaborative design (co-design) study that brought generative AI technology directly into real classrooms. In this study, AI functioned as a third agent, an active participant that mediated feedback, supported inquiry, and extended teachers' instructional reach while preserving human judgment and teacher authority. Over seven weeks in spring 2025, 21 in-service teachers from four Washington State public school districts and one independent school integrated four AI-powered features of the Colleague AI Classroom into their instruction: Teaching Aide, Assessment and AI Grading, AI Tutor, and Student Growth Insights. More than 600 students in grades 6-12 used the platform in class at the direction of their teachers, who designed and facilitated the AI activities. During the Classroom pilot, teachers were co-design partners: they planned activities, implemented them with students, and provided weekly reflections on AI's role in classroom settings. The teachers' feedback guided iterative improvements for Colleague AI. The research team captured rich data through surveys, planning and reflection forms, group meetings, one-on-one interviews, and platform usage logs to understand where AI adds instructional value and where it requires refinement.
56.2 HC Apr 17
Teacher-Authored Prompts for Configuring Student-AI Dialogue: K-12 Classroom Implementation
Alex Liu, Min Sun, Lief Esbenshade et al.
GenAI has rapidly entered instructional and learning settings as a teaching assistant or AI tutor. However, less is known about how pedagogical intent connects to the learning generated within these systems, especially when student-facing AI dialogues are fine-tuned through teacher orchestration in live classrooms. This study examines a classroom deployment of a "Classroom Teaching Aide" (TASD) system, which enables teachers to author both a teacher-to-AI setup prompt (instructional scaffold) and a student-facing conversation starter to launch AI-mediated classroom discussions. We analyze a multi-subject pilot conducted in Spring 2025, involving 20 participating teachers (16 of whom implemented the system), across 39 classrooms and 77 TASD settings, yielding 1,479 student-AI conversations with 878 unique students. Using platform logs, LLM coding with human validation, and post-study teacher interviews (N=10), we characterize teacher authoring choices and link them to enacted student-AI interaction outcomes. In deployment, student-AI conversations were largely aligned with instructional intent: 71% were fully on-track, and fewer than 1% were substantially off-track. However, a persistent design-enactment gap emerged for cognitive demand: 38% of conversations under-reached the teacher-targeted DOK level, approaching 50% when targeting DOK 3. The study also shows that explicit finish lines in the prompt reduced the DOK gap by 0.22 levels (p < .001), and "no direct answers" guardrails reduced AI final-answer rates by 8.5 percentage points. These findings position teacher-authored prompt layers as critical orchestration levers that translate pedagogical intent into structured student-AI dialogue, underscoring both their promise for scalable classroom integration and the need for additional supports to reliably sustain higher-order reasoning during enactment.
LG Jan 15, 2025
Training-Aware Risk Control for Intensity Modulated Radiation Therapies Quality Assurance with Conformal Prediction
Kevin He, David Adam, Sarah Han-Oh et al.
Measurement quality assurance (QA) practices play a key role in the safe use of Intensity Modulated Radiation Therapies (IMRT) for cancer treatment. These practices have reduced measurement-based IMRT QA failure below 1%. However, these practices are time- and labor-intensive, which can lead to delays in patient care. In this study, we examine how conformal prediction methodologies can be used to robustly triage plans. We propose a new training-aware conformal risk control method by combining the benefits of conformal risk control and conformal training. We incorporate the decision-making thresholds based on the gamma passing rate, along with the risk functions used in clinical evaluation, into the design of the risk control framework. Our method achieves high sensitivity and specificity and significantly reduces the number of plans needing measurement without generating an excessively wide confidence interval. Our results demonstrate the validity and applicability of conformal prediction methods for improving efficiency and reducing the workload of the IMRT QA process.
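To make the triage idea concrete, here is a minimal sketch of plain conformal risk control on a held-out calibration set. It is not the training-aware method the abstract proposes, and the 95% action limit, the "missed failure" loss, and the names `missed_failure` and `calibrate_margin` are assumptions chosen for illustration. A plan is sent for measurement when its predicted gamma passing rate falls below the action limit plus a safety margin, and the margin is the smallest grid value whose finite-sample-adjusted empirical risk is at most the target level `alpha`.

```python
from typing import List, Optional, Sequence, Tuple

FAIL_BELOW = 95.0  # hypothetical action limit on the gamma passing rate (%)


def missed_failure(pred: float, true: float, margin: float) -> int:
    """Loss is 1 when a truly failing plan is NOT flagged for measurement."""
    flagged = pred < FAIL_BELOW + margin
    return int(true < FAIL_BELOW and not flagged)


def calibrate_margin(
    calibration: Sequence[Tuple[float, float]],  # (predicted, measured) rates
    alpha: float,
    grid: List[float],
) -> Optional[float]:
    """Smallest margin whose adjusted empirical risk is <= alpha.

    The loss is bounded by 1 and non-increasing in the margin, so the
    conformal risk control condition is (n * risk + 1) / (n + 1) <= alpha.
    Returns None when no margin on the grid meets the target.
    """
    n = len(calibration)
    for margin in sorted(grid):
        risk = sum(missed_failure(p, t, margin) for p, t in calibration) / n
        if (n * risk + 1.0) / (n + 1) <= alpha:
            return margin
    return None
```

A larger margin flags more plans for measurement and lowers the missed-failure risk, so the calibrated margin trades workload against safety. With very few calibration plans the target may be unreachable for any margin, which is why the function can return `None`.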
61.7 CY Apr 8
Generative AI in K-12 Classrooms: A Midyear Implementation Report
Lief Esbenshade, Alex Liu, Michael Xiao et al.
This mid-year report summarizes teacher use of Colleague AI across 12 Washington State school districts from September 1 to December 31, 2025. Produced jointly by Colleague AI and AmplifyLearn.AI at the University of Washington, this report aggregates platform data and district-provided administrative records to provide an early look at how teachers engaged with AI during the first half of the 2025-26 school year. The districts vary in size from small districts with a few thousand students to large districts with up to thirty thousand students. The districts are rural, suburban, and urban. Only a subset of districts were able to provide mid-year administrative data, and findings that link teachers' use of Colleague AI to student characteristics should be interpreted as preliminary signals.
TH Feb 20, 2025
Human Misperception of Generative-AI Alignment: A Laboratory Experiment
Kevin He, Ran Shorrer, Mengjia Xia
We conduct an incentivized laboratory experiment to study people's perception of generative artificial intelligence (GenAI) alignment in the context of economic decision-making. Using a panel of economic problems spanning the domains of risk, time preference, social preference, and strategic interactions, we ask human subjects to make choices for themselves and to predict the choices made by GenAI on behalf of a human user. We find that people overestimate the degree of alignment between GenAI's choices and human choices. In every problem, human subjects' average prediction about GenAI's choice is substantially closer to the average human-subject choice than it is to the GenAI choice. At the individual level, different subjects' predictions about GenAI's choice in a given problem are highly correlated with their own choices in the same problem. We explore the implications of people overestimating GenAI alignment in a simple theoretical model.
HC Jul 23, 2025
Decoding Instructional Dialogue: Human-AI Collaborative Analysis of Teacher Use of AI Tool at Scale
Alex Liu, Lief Esbenshade, Shawon Sarkar et al.
The integration of large language models (LLMs) into educational tools has the potential to substantially impact how teachers plan instruction, support diverse learners, and engage in professional reflection. Yet little is known about how educators actually use these tools in practice and how their interactions with AI can be meaningfully studied at scale. This paper presents a human-AI collaborative methodology for large-scale qualitative analysis of over 140,000 educator-AI messages drawn from a generative AI platform used by K-12 teachers. Through a four-phase coding pipeline, we combined inductive theme discovery, codebook development, structured annotation, and model benchmarking to examine patterns of educator engagement and evaluate the performance of LLMs in qualitative coding tasks. We developed a hierarchical codebook aligned with established teacher evaluation frameworks, capturing educators' instructional goals, contextual needs, and pedagogical strategies. Our findings demonstrate that LLMs, particularly Claude 3.5 Haiku, can reliably support theme identification, extend human recognition in complex scenarios, and outperform open-weight models in both accuracy and structural reliability. The analysis also reveals substantive patterns in how educators query AI to enhance instructional practices (79.7 percent of total conversations), create or adapt content (76.1 percent), support assessment and feedback loops (46.9 percent), attend to student needs for tailored instruction (43.3 percent), and assist with other professional responsibilities (34.2 percent), highlighting emerging AI-related competencies that have direct implications for teacher preparation and professional development. This study offers a scalable, transparent model for AI-augmented qualitative research and provides foundational insights into the evolving role of generative AI in educational practice.
AI Feb 12, 2025
High-Throughput SAT Sampling
Arash Ardakani, Minwoo Kang, Kevin He et al.
In this work, we present a novel technique for GPU-accelerated Boolean satisfiability (SAT) sampling. Unlike conventional sampling algorithms that directly operate on conjunctive normal form (CNF), our method transforms the logical constraints of SAT problems by factoring their CNF representations into simplified multi-level, multi-output Boolean functions. It then leverages gradient-based optimization to guide the search for a diverse set of valid solutions. Our method operates directly on the circuit structure of refactored SAT instances, reinterpreting the SAT problem as a supervised multi-output regression task. This differentiable technique enables independent bit-wise operations on each tensor element, allowing parallel execution of learning processes. As a result, we achieve GPU-accelerated sampling with significant runtime improvements ranging from 33.6× to 523.6× over state-of-the-art heuristic samplers. We demonstrate the superior performance of our sampling method through an extensive evaluation on 60 instances from a public domain benchmark suite utilized in previous studies.
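The general idea of gradient-guided SAT sampling can be sketched on a much simpler stand-in than the method described here. The sketch below relaxes the CNF directly (the approach the abstract contrasts itself with), runs on the CPU in pure Python, and does not perform the circuit refactoring or the regression formulation. Each variable becomes a sigmoid of a real parameter, the loss is the summed "probability" that each clause is unsatisfied, and many random restarts are rounded and checked so that only verified solutions are returned. The names `satisfies` and `sample_solutions` and all hyperparameters are illustrative assumptions.

```python
import math
import random
from typing import List, Sequence, Set, Tuple

Clause = List[int]  # DIMACS-style literals: 3 means x3, -3 means NOT x3


def satisfies(cnf: Sequence[Clause], bits: Sequence[bool]) -> bool:
    """True when every clause contains at least one satisfied literal."""
    return all(any((lit > 0) == bits[abs(lit) - 1] for lit in c) for c in cnf)


def sample_solutions(
    cnf: Sequence[Clause], n_vars: int, n_starts: int = 64,
    steps: int = 200, lr: float = 2.0, seed: int = 0,
) -> Set[Tuple[bool, ...]]:
    """Gradient descent on a soft CNF loss from random starts.

    Returns the distinct rounded assignments that pass an exact check.
    """
    rng = random.Random(seed)
    found: Set[Tuple[bool, ...]] = set()
    for _ in range(n_starts):
        theta = [rng.uniform(-1.0, 1.0) for _ in range(n_vars)]
        for _ in range(steps):
            x = [1.0 / (1.0 + math.exp(-t)) for t in theta]
            grad = [0.0] * n_vars
            for clause in cnf:
                # miss[k] = soft probability that literal k is false
                miss = [(1.0 - x[l - 1]) if l > 0 else x[-l - 1] for l in clause]
                for k, lit in enumerate(clause):
                    v = abs(lit) - 1
                    others = math.prod(miss[:k] + miss[k + 1:])
                    d_miss = -1.0 if lit > 0 else 1.0  # d miss[k] / d x[v]
                    grad[v] += d_miss * others * x[v] * (1.0 - x[v])
            # Clamp parameters to keep exp() in a safe range.
            theta = [max(-30.0, min(30.0, t - lr * g)) for t, g in zip(theta, grad)]
        bits = tuple(t > 0 for t in theta)
        if satisfies(cnf, bits):
            found.add(bits)
    return found
```

Because every restart is independent, the inner loop is the kind of computation that could be batched across many candidate assignments on a GPU. The exact check at the end means that a restart which fails to converge costs a sample but never produces an invalid solution.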
ML May 17, 2018
Covariance-Insured Screening
Kevin He, Jian Kang, Hyokyoung Grace Hong et al.
Modern bio-technologies have produced a vast amount of high-throughput data with the number of predictors far greater than the sample size. In order to identify more novel biomarkers and understand biological mechanisms, it is vital to detect signals weakly associated with outcomes among ultrahigh-dimensional predictors. However, existing screening methods, which typically ignore correlation information, are likely to miss these weak signals. By incorporating the inter-feature dependence, we propose a covariance-insured screening methodology to identify predictors that are jointly informative but only marginally weakly associated with outcomes. The validity of the method is examined via extensive simulations and real data studies for selecting potential genetic factors related to the onset of cancer.
ML Nov 4, 2016
Classification with Ultrahigh-Dimensional Features
Yanming Li, Hyokyoung Hong, Jian Kang et al.
Although much progress has been made in classification with high-dimensional features \citep{Fan_Fan:2008, JGuo:2010, CaiSun:2014, PRXu:2014}, classification with ultrahigh-dimensional features, wherein the number of features far exceeds the sample size, defies most existing work. This paper introduces a novel and computationally feasible multivariate screening and classification method for ultrahigh-dimensional data. Leveraging inter-feature correlations, the proposed method enables detection of marginally weak and sparse signals and recovery of the true informative feature set, and achieves asymptotically optimal misclassification rates. We also show that the proposed procedure provides more powerful discovery boundaries compared to those in \citet{CaiSun:2014} and \citet{JJin:2009}. The performance of the proposed procedure is evaluated using simulation studies and demonstrated via classification of patients with different post-transplantation renal functional types.