Andrew Anderson

HC
h-index2
19papers
533citations
Novelty44%
AI Score41

19 Papers

HCAug 23, 2024
How to Measure Human-AI Prediction Accuracy in Explainable AI Systems

Sujay Koujalgi, Andrew Anderson, Iyadunni Adenuga et al.

Assessing an AI system's behavior-particularly in Explainable AI Systems-is sometimes done empirically, by measuring people's abilities to predict the agent's next move-but how to perform such measurements? In empirical studies with humans, an obvious approach is to frame the task as binary (i.e., prediction is either right or wrong), but this does not scale. As output spaces increase, so do floor effects, because the ratio of right answers to wrong answers quickly becomes very small. The crux of the problem is that the binary framing is failing to capture the nuances of the different degrees of "wrongness." To address this, we begin by proposing three mathematical bases upon which to measure "partial wrongness." We then uses these bases to perform two analyses on sequential decision-making domains: the first is an in-lab study with 86 participants on a size-36 action space; the second is a re-analysis of a prior study on a size-4 action space. Other researchers adopting our operationalization of the prediction task and analysis methodology will improve the rigor of user studies conducted with that task, which is particularly important when the domain features a large output space.

CVMay 4, 2022
Domino Saliency Metrics: Improving Existing Channel Saliency Metrics with Structural Information

Kaveena Persand, Andrew Anderson, David Gregg

Channel pruning is used to reduce the number of weights in a Convolutional Neural Network (CNN). Channel pruning removes slices of the weight tensor so that the convolution layer remains dense. The removal of these weight slices from a single layer causes mismatching number of feature maps between layers of the network. A simple solution is to force the number of feature map between layers to match through the removal of weight slices from subsequent layers. This additional constraint becomes more apparent in DNNs with branches where multiple channels need to be pruned together to keep the network dense. Popular pruning saliency metrics do not factor in the structural dependencies that arise in DNNs with branches. We propose Domino metrics (built on existing channel saliency metrics) to reflect these structural constraints. We test Domino saliency metrics against the baseline channel saliency metrics on multiple networks with branches. Domino saliency metrics improved pruning rates in most tested networks and up to 25% in AlexNet on CIFAR-10.

HCDec 12, 2025
From Verification Burden to Trusted Collaboration: Design Goals for LLM-Assisted Literature Reviews

Brenda Nogueira, Werner Geyer, Andrew Anderson et al.

Large Language Models (LLMs) are increasingly embedded in academic writing practices. Although numerous studies have explored how researchers employ these tools for scientific writing, their concrete implementation, limitations, and design challenges within the literature review process remain underexplored. In this paper, we report a user study with researchers across multiple disciplines to characterize current practices, benefits, and \textit{pain points} in using LLMs to investigate related work. We identified three recurring gaps: (i) lack of trust in outputs, (ii) persistent verification burden, and (iii) requiring multiple tools. This motivates our proposal of six design goals and a high-level framework that operationalizes them through improved related papers visualization, verification at every step, and human-feedback alignment with generation-guided explanations. Overall, by grounding our work in the practical, day-to-day needs of researchers, we designed a framework that addresses these limitations and models real-world LLM-assisted writing, advancing trust through verifiable actions and fostering practical collaboration between researchers and AI systems.

HCOct 21, 2025
"Over-the-Hood" AI Inclusivity Bugs and How 3 AI Product Teams Found and Fixed Them

Andrew Anderson, Fatima A. Moussaoui, Jimena Noa Guevara et al.

While much research has shown the presence of AI's "under-the-hood" biases (e.g., algorithmic, training data, etc.), what about "over-the-hood" inclusivity biases: barriers in user-facing AI products that disproportionately exclude users with certain problem-solving approaches? Recent research has begun to report the existence of such biases -- but what do they look like, how prevalent are they, and how can developers find and fix them? To find out, we conducted a field study with 3 AI product teams, to investigate what kinds of AI inclusivity bugs exist uniquely in user-facing AI products, and whether/how AI product teams might harness an existing (non-AI-oriented) inclusive design method to find and fix them. The teams' work resulted in identifying 6 types of AI inclusivity bugs arising 83 times, fixes covering 47 of these bug instances, and a new variation of the GenderMag inclusive design method, GenderMag-for-AI, that is especially effective at detecting certain kinds of AI inclusivity bugs.

CVJan 25, 2022
Winograd Convolution for Deep Neural Networks: Efficient Point Selection

Syed Asad Alam, Andrew Anderson, Barbara Barabasz et al.

Convolutional neural networks (CNNs) have dramatically improved the accuracy of tasks such as object recognition, image segmentation and interactive speech systems. CNNs require large amounts of computing resources because ofcomputationally intensive convolution layers. Fast convolution algorithms such as Winograd convolution can greatly reduce the computational cost of these layers at a cost of poor numeric properties, such that greater savings in computation exponentially increase floating point errors. A defining feature of each Winograd convolution algorithm is a set of real-value points where polynomials are sampled. The choice of points impacts the numeric accuracy of the algorithm, but the optimal set of points for small convolutions remains unknown. Existing work considers only small integers and simple fractions as candidate points. In this work, we propose a novel approach to point selection using points of the form {-1/c , -c, c, 1/c } using the full range of real-valued numbers for c. We show that groups of this form cause cancellations in the Winograd transform matrices that reduce numeric error. We find empirically that the error for different values of c forms a rough curve across the range of real-value numbers helping to localize the values of c that reduce error and that lower errors can be achieved with non-obvious real-valued evaluation points instead of integers or simple fractions. We study a range of sizes for small convolutions and achieve reduction in error ranging from 2% to around 59% for both 1D and 2D convolution. Furthermore, we identify patterns in cases when we select a subset of our proposed points which will always lead to a lower error. Finally we implement a complete Winograd convolution layer and use it to run deep convolution neural networks on real datasets and show that our proposed points reduce error, ranging from 22% to 63%.

HCAug 2, 2021
Measuring User Experience Inclusivity in Human-AI Interaction via Five User Problem-Solving Styles

Andrew Anderson, Jimena Noa Guevara, Fatima Moussaoui et al.

Motivations: Recent research has emerged on generally how to improve AI product user experiences, but relatively little is known about an AI product's inclusivity. For example, what kinds of users does it support well, and who does it leave out? And what changes in the product would make it more inclusive? Objectives: Our overall objective is to help fill this gap, investigating what kinds of diverse users an AI product leaves out, and how to act upon that knowledge. To bring actionability to our findings, we focus on users' diversity of problem-solving attributes. Thus, our specific objectives were: (1) to reveal whether participants with diverse problem-solving styles were left behind in a set of AI products; and (2) to relate participants' problem-solving diversity to their demographic diversity, specifically, gender and age. Methods: We performed 18 experiments, discarding two that failed manipulation checks. Each experiment was a 2x2 factorial experiment with online participants. Each experiment compared two AI products: one deliberately violating an HAI guideline and the other applying the guideline. For our first objective, we analyzed how much each AI product gained/lost inclusivity compared to its counterpart, where inclusivity was supportiveness to participants with particular problem-solving styles. For our second objective, we analyzed how participants' problem-solving styles aligned with their demographics, namely their genders and ages. Results & Implications: Participants' diverse problem-solving styles revealed six types of inclusivity results: (1) the AI products that followed an HAI guideline were almost always more inclusive across diversity of problem-solving styles than the products that did not follow that guideline-but the "who" that got most of the inclusivity varied widely by guideline and by problem-solving style...

LGMay 21, 2020
TASO: Time and Space Optimization for Memory-Constrained DNN Inference

Yuan Wen, Andrew Anderson, Valentin Radu et al.

Convolutional neural networks (CNNs) are used in many embedded applications, from industrial robotics and automation systems to biometric identification on mobile devices. State-of-the-art classification is typically achieved by large networks, which are prohibitively expensive to run on mobile and embedded devices with tightly constrained memory and energy budgets. We propose an approach for ahead-of-time domain specific optimization of CNN models, based on an integer linear programming (ILP) for selecting primitive operations to implement convolutional layers. We optimize the trade-off between execution time and memory consumption by: 1) attempting to minimize execution time across the whole network by selecting data layouts and primitive operations to implement each layer; and 2) allocating an appropriate workspace that reflects the upper bound of memory footprint per layer. These two optimization strategies can be used to run any CNN on any platform with a C compiler. Our evaluation with a range of popular ImageNet neural architectures (GoogleNet, AlexNet, VGG, ResNet and SqueezeNet) on the ARM Cortex-A15 yields speedups of 8x compared to a greedy algorithm based primitive selection, reduces memory requirement by 2.2x while sacrificing only 15% of inference time compared to a solver that considers inference time only. In addition, our optimization approach exposes a range of optimal points for different configurations across the Pareto frontier of memory and latency trade-off, which can be used under arbitrary system constraints.

CVApr 3, 2020
Composition of Saliency Metrics for Channel Pruning with a Myopic Oracle

Kaveena Persand, Andrew Anderson, David Gregg

The computation and memory needed for Convolutional Neural Network (CNN) inference can be reduced by pruning weights from the trained network. Pruning is guided by a pruning saliency, which heuristically approximates the change in the loss function associated with the removal of specific weights. Many pruning signals have been proposed, but the performance of each heuristic depends on the particular trained network. This leaves the data scientist with a difficult choice. When using any one saliency metric for the entire pruning process, we run the risk of the metric assumptions being invalidated, leading to poor decisions being made by the metric. Ideally we could combine the best aspects of different saliency metrics. However, despite an extensive literature review, we are unable to find any prior work on composing different saliency metrics. The chief difficulty lies in combining the numerical output of different saliency metrics, which are not directly comparable. We propose a method to compose several primitive pruning saliencies, to exploit the cases where each saliency measure does well. Our experiments show that the composition of saliencies avoids many poor pruning choices identified by individual saliencies. In most cases our method finds better selections than even the best individual pruning saliency.

LGJan 9, 2020
Performance-Oriented Neural Architecture Search

Andrew Anderson, Jing Su, Rozenn Dahyot et al.

Hardware-Software Co-Design is a highly successful strategy for improving performance of domain-specific computing systems. We argue for the application of the same methodology to deep learning; specifically, we propose to extend neural architecture search with information about the hardware to ensure that the model designs produced are highly efficient in addition to the typical criteria around accuracy. Using the task of keyword spotting in audio on edge computing devices, we demonstrate that our approach results in neural architecture that is not only highly accurate, but also efficiently mapped to the computing platform which will perform the inference. Using our modified neural architecture search, we demonstrate $0.88\%$ increase in TOP-1 accuracy with $1.85\times$ reduction in latency for keyword spotting in audio on an embedded SoC, and $1.59\times$ on a high-end GPU.

LGJun 11, 2019
Taxonomy of Saliency Metrics for Channel Pruning

Kaveena Persand, Andrew Anderson, David Gregg

Pruning unimportant parameters can allow deep neural networks (DNNs) to reduce their heavy computation and memory requirements. A saliency metric estimates which parameters can be safely pruned with little impact on the classification performance of the DNN. Many saliency metrics have been proposed, each within the context of a wider pruning algorithm. The result is that it is difficult to separate the effectiveness of the saliency metric from the wider pruning algorithm that surrounds it. Similar-looking saliency metrics can yield very different results because of apparently minor design choices. We propose a taxonomy of saliency metrics based on four mostly-orthogonal principal components. We show that a broad range of metrics from the pruning literature can be grouped according to these components. Our taxonomy not only serves as a guide to prior work, but allows us to construct new saliency metrics by exploring novel combinations of our taxonomic components. We perform an in-depth experimental investigation of more than 300 saliency metrics. Our results provide decisive answers to open research questions, and demonstrate the importance of reduction and scaling when pruning groups of weights. We find that some of our constructed metrics can outperform the best existing state-of-the-art metrics for convolutional neural network channel pruning.

HCMar 22, 2019
Explaining Reinforcement Learning to Mere Mortals: An Empirical Study

Andrew Anderson, Jonathan Dodge, Amrita Sadarangani et al.

We present a user study to investigate the impact of explanations on non-experts' understanding of reinforcement learning (RL) agents. We investigate both a common RL visualization, saliency maps (the focus of attention), and a more recent explanation type, reward-decomposition bars (predictions of future types of rewards). We designed a 124 participant, four-treatment experiment to compare participants' mental models of an RL agent in a simple Real-Time Strategy (RTS) game. Our results show that the combination of both saliency and reward bars were needed to achieve a statistically significant improvement in mental model score over the control. In addition, our qualitative analysis of the data reveals a number of effects for further study.

LGJan 15, 2019
Bonseyes AI Pipeline -- bringing AI to you. End-to-end integration of data, algorithms and deployment tools

Miguel de Prado, Jing Su, Rabia Saeed et al.

Next generation of embedded Information and Communication Technology (ICT) systems are collaborative systems able to perform autonomous tasks. The remarkable expansion of the embedded ICT market, together with the rise and breakthroughs of Artificial Intelligence (AI), have put the focus on the Edge as it stands as one of the keys for the next technological revolution: the seamless integration of AI in our daily life. However, training and deployment of custom AI solutions on embedded devices require a fine-grained integration of data, algorithms, and tools to achieve high accuracy. Such integration requires a high level of expertise that becomes a real bottleneck for small and medium enterprises wanting to deploy AI solutions on the Edge which, ultimately, slows down the adoption of AI on daily-life applications. In this work, we present a modular AI pipeline as an integrating framework to bring data, algorithms, and deployment tools together. By removing the integration barriers and lowering the required expertise, we can interconnect the different stages of tools and provide a modular end-to-end development of AI products for embedded devices. Our AI pipeline consists of four modular main steps: i) data ingestion, ii) model training, iii) deployment optimization and, iv) the IoT hub integration. To show the effectiveness of our pipeline, we provide examples of different AI applications during each of the steps. Besides, we integrate our deployment framework, LPDNN, into the AI pipeline and present its lightweight architecture and deployment capabilities for embedded devices. Finally, we demonstrate the results of the AI pipeline by showing the deployment of several AI applications such as keyword spotting, image classification and object detection on a set of well-known embedded platforms, where LPDNN consistently outperforms all other popular deployment frameworks.

PFSep 27, 2018
Scalar Arithmetic Multiple Data: Customizable Precision for Deep Neural Networks

Andrew Anderson, David Gregg

Quantization of weights and activations in Deep Neural Networks (DNNs) is a powerful technique for network compression, and has enjoyed significant attention and success. However, much of the inference-time benefit of quantization is accessible only through the use of customized hardware accelerators or by providing an FPGA implementation of quantized arithmetic. Building on prior work, we show how to construct arbitrary bit-precise signed and unsigned integer operations using a software technique which logically \emph{embeds} a vector architecture with custom bit-width lanes in universally available fixed-width scalar arithmetic. We evaluate our approach on a high-end Intel Haswell processor, and an embedded ARM processor. Our approach yields very fast implementations of bit-precise custom DNN operations, which often match or exceed the performance of operations quantized to the sizes supported in native arithmetic. At the strongest level of quantization, our approach yields a maximum speedup of $\thicksim6\times$ on the Intel platform, and $\thicksim10\times$ on the ARM platform versus quantization to native 8-bit integers.

NAMar 29, 2018
Error Analysis and Improving the Accuracy of Winograd Convolution for Deep Neural Networks

Barbara Barabasz, Andrew Anderson, Kirk M. Soodhalter et al.

Popular deep neural networks (DNNs) spend the majority of their execution time computing convolutions. The Winograd family of algorithms can greatly reduce the number of arithmetic operations required and is present in many DNN software frameworks. However, the performance gain is at the expense of a reduction in floating point (FP) numerical accuracy. In this paper, we analyse the worst case FP error and prove the estimation of norm and conditioning of the algorithm. We show that the bound grows exponentially with the size of the convolution, but the error bound of the \textit{modified} algorithm is smaller than the original one. We propose several methods for reducing FP error. We propose a canonical evaluation ordering based on Huffman coding that reduces summation error. We study the selection of sampling "points" experimentally and find empirically good points for the most important sizes. We identify the main factors associated with good points. In addition, we explore other methods to reduce FP error, including mixed-precision convolution, and pairwise summation across DNN channels. Using our methods we can significantly reduce FP error for a given block size, which allows larger block sizes and reduced computation.

HCNov 21, 2017
Toward Foraging for Understanding of StarCraft Agents: An Empirical Study

Sean Penney, Jonathan Dodge, Claudia Hilderbrand et al.

Assessing and understanding intelligent agents is a difficult task for users that lack an AI background. A relatively new area, called "Explainable AI," is emerging to help address this problem, but little is known about how users would forage through information an explanation system might offer. To inform the development of Explainable AI systems, we conducted a formative study, using the lens of Information Foraging Theory, into how experienced users foraged in the domain of StarCraft to assess an agent. Our results showed that participants faced difficult foraging problems. These foraging problems caused participants to entirely miss events that were important to them, reluctantly choose to ignore actions they did not want to ignore, and bear high cognitive, navigation, and information costs to access the information they needed.

HCNov 19, 2017
How the Experts Do It: Assessing and Explaining Agent Behaviors in Real-Time Strategy Games

Jonathan Dodge, Sean Penney, Claudia Hilderbrand et al.

How should an AI-based explanation system explain an agent's complex behavior to ordinary end users who have no background in AI? Answering this question is an active research area, for if an AI-based explanation system could effectively explain intelligent agents' behavior, it could enable the end users to understand, assess, and appropriately trust (or distrust) the agents attempting to help them. To provide insights into this question, we turned to human expert explainers in the real-time strategy domain, "shoutcaster", to understand (1) how they foraged in an evolving strategy game in real time, (2) how they assessed the players' behaviors, and (3) how they constructed pertinent and timely explanations out of their insights and delivered them to their audience. The results provided insights into shoutcasters' foraging strategies for gleaning information necessary to assess and explain the players; a characterization of the types of implicit questions shoutcasters answered; and implications for creating explanations by using the patterns

PFOct 3, 2017
Optimal DNN Primitive Selection with Partitioned Boolean Quadratic Programming

Andrew Anderson, David Gregg

Deep Neural Networks (DNNs) require very large amounts of computation both for training and for inference when deployed in the field. Many different algorithms have been proposed to implement the most computationally expensive layers of DNNs. Further, each of these algorithms has a large number of variants, which offer different trade-offs of parallelism, data locality, memory footprint, and execution time. In addition, specific algorithms operate much more efficiently on specialized data layouts and formats. We state the problem of optimal primitive selection in the presence of data format transformations, and show that it is NP-hard by demonstrating an embedding in the Partitioned Boolean Quadratic Assignment problem (PBQP). We propose an analytic solution via a PBQP solver, and evaluate our approach experimentally by optimizing several popular DNNs using a library of more than 70 DNN primitives, on an embedded platform and a general purpose platform. We show experimentally that significant gains are possible versus the state of the art vendor libraries by using a principled analytic solution to the problem of layout selection in the presence of data format transformations.

CVSep 8, 2017
Low-memory GEMM-based convolution algorithms for deep neural networks

Andrew Anderson, Aravind Vasudevan, Cormac Keane et al.

Deep neural networks (DNNs) require very large amounts of computation both for training and for inference when deployed in the field. A common approach to implementing DNNs is to recast the most computationally expensive operations as general matrix multiplication (GEMM). However, as we demonstrate in this paper, there are a great many different ways to express DNN convolution operations using GEMM. Although different approaches all perform the same number of operations, the size of temporary data structures differs significantly. Convolution of an input matrix with dimensions $C \times H \times W$, requires $O(K^2CHW)$ additional space using the classical im2col approach. More recently memory-efficient approaches requiring just $O(KCHW)$ auxiliary space have been proposed. We present two novel GEMM-based algorithms that require just $O(MHW)$ and $O(KW)$ additional space respectively, where $M$ is the number of channels in the result of the convolution. These algorithms dramatically reduce the space overhead of DNN convolution, making it much more suitable for memory-limited embedded systems. Experimental evaluation shows that our low-memory algorithms are just as fast as the best patch-building approaches despite requiring just a fraction of the amount of additional memory. Our low-memory algorithms have excellent data locality which gives them a further edge over patch-building algorithms when multiple cores are used. As a result, our low memory algorithms often outperform the best patch-building algorithms using multiple threads.

CVApr 6, 2017
Parallel Multi Channel Convolution using General Matrix Multiplication

Aravind Vasudevan, Andrew Anderson, David Gregg

Convolutional neural networks (CNNs) have emerged as one of the most successful machine learning technologies for image and video processing. The most computationally intensive parts of CNNs are the convolutional layers, which convolve multi-channel images with multiple kernels. A common approach to implementing convolutional layers is to expand the image into a column matrix (im2col) and perform Multiple Channel Multiple Kernel (MCMK) convolution using an existing parallel General Matrix Multiplication (GEMM) library. This im2col conversion greatly increases the memory footprint of the input matrix and reduces data locality. In this paper we propose a new approach to MCMK convolution that is based on General Matrix Multiplication (GEMM), but not on im2col. Our algorithm eliminates the need for data replication on the input thereby enabling us to apply the convolution kernels on the input images directly. We have implemented several variants of our algorithm on a CPU processor and an embedded ARM processor. On the CPU, our algorithm is faster than im2col in most cases.