Colton R. Crum

CV
h-index35
7papers
23citations
Novelty38%
AI Score31

7 Papers

CVJun 8, 2023
Teaching AI to Teach: Leveraging Limited Human Salience Data Into Unlimited Saliency-Based Training

Colton R. Crum, Aidan Boyd, Kevin Bowyer et al.

Machine learning models have shown increased accuracy in classification tasks when the training process incorporates human perceptual information. However, a challenge in training human-guided models is the cost associated with collecting image annotations for human salience. Collecting annotation data for all images in a large training set can be prohibitively expensive. In this work, we utilize "teacher" models (trained on a small amount of human-annotated data) to annotate additional data by means of teacher models' saliency maps. Then, "student" models are trained using the larger amount of annotated training data. This approach makes it possible to supplement a limited number of human-supplied annotations with an arbitrarily large number of model-generated image annotations. We compare the accuracy achieved by our teacher-student training paradigm with (1) training using all available human salience annotations, and (2) using all available training data without human salience annotations. We use synthetic face detection and fake iris detection as example challenging problems, and report results across four model architectures (DenseNet, ResNet, Xception, and Inception), and two saliency estimation methods (CAM and RISE). Results show that our teacher-student training paradigm results in models that significantly exceed the performance of both baselines, demonstrating that our approach can usefully leverage a small amount of human annotations to generate salience maps for an arbitrary amount of additional training data.

CVOct 30, 2023
MENTOR: Human Perception-Guided Pretraining for Increased Generalization

Colton R. Crum, Adam Czajka

Leveraging human perception into training of convolutional neural networks (CNN) has boosted generalization capabilities of such models in open-set recognition tasks. One of the active research questions is where (in the model architecture or training pipeline) and how to efficiently incorporate always limited human perceptual data into training strategies of models. In this paper, we introduce MENTOR (huMan pErceptioN-guided preTraining fOr increased geneRalization), which addresses this question through two unique rounds of training CNNs tasked with open-set anomaly detection. First, we train an autoencoder to learn human saliency maps given an input image, without any class labels. The autoencoder is thus tasked with discovering domain-specific salient features which mimic human perception. Second, we remove the decoder part, add a classification layer on top of the encoder, and train this new model conventionally, now using class labels. We show that MENTOR successfully raises the generalization performance across three different CNN backbones in a variety of anomaly detection tasks (demonstrated for detection of unknown iris presentation attacks, synthetically-generated faces, and anomalies in chest X-ray images) compared to traditional pretraining methods (e.g., sourcing the weights from ImageNet), and as well as state-of-the-art methods that incorporate human perception guidance into training. In addition, we demonstrate that MENTOR can be flexibly applied to existing human perception-guided methods and subsequently increasing their generalization with no architectural modifications.

AIFeb 13, 2024
Taking Training Seriously: Human Guidance and Management-Based Regulation of Artificial Intelligence

Cary Coglianese, Colton R. Crum

Fervent calls for more robust governance of the harms associated with artificial intelligence (AI) are leading to the adoption around the world of what regulatory scholars have called a management-based approach to regulation. Recent initiatives in the United States and Europe, as well as the adoption of major self-regulatory standards by the International Organization for Standardization, share in common a core management-based paradigm. These management-based initiatives seek to motivate an increase in human oversight of how AI tools are trained and developed. Refinements and systematization of human-guided training techniques will thus be needed to fit within this emerging era of management-based regulatory paradigm. If taken seriously, human-guided training can alleviate some of the technical and ethical pressures on AI, boosting AI performance with human intuition as well as better addressing the needs for fairness and effective explainability. In this paper, we discuss the connection between the emerging management-based regulatory frameworks governing AI and the need for human oversight during training. We broadly cover some of the technical components involved in human-guided training and then argue that the kinds of high-stakes use cases for AI that appear of most concern to regulators should lean more on human-guided training than on data-only training. We hope to foster a discussion between legal scholars and computer scientists involving how to govern a domain of technology that is vast, heterogenous, and dynamic in its applications and risks.

CVMay 1, 2024
Grains of Saliency: Optimizing Saliency-based Training of Biometric Attack Detection Models

Colton R. Crum, Samuel Webster, Adam Czajka

Incorporating human-perceptual intelligence into model training has shown to increase the generalization capability of models in several difficult biometric tasks, such as presentation attack detection (PAD) and detection of synthetic samples. After the initial collection phase, human visual saliency (e.g., eye-tracking data, or handwritten annotations) can be integrated into model training through attention mechanisms, augmented training samples, or through human perception-related components of loss functions. Despite their successes, a vital, but seemingly neglected, aspect of any saliency-based training is the level of salience granularity (e.g., bounding boxes, single saliency maps, or saliency aggregated from multiple subjects) necessary to find a balance between reaping the full benefits of human saliency and the cost of its collection. In this paper, we explore several different levels of salience granularity and demonstrate that increased generalization capabilities of PAD and synthetic face detection can be achieved by using simple yet effective saliency post-processing techniques across several different CNNs.

CYJan 26, 2025
Regulating Multifunctionality

Cary Coglianese, Colton R. Crum

Foundation models and generative artificial intelligence (AI) exacerbate a core regulatory challenge associated with AI: its heterogeneity. By their very nature, foundation models and generative AI can perform multiple functions for their users, thus presenting a vast array of different risks. This multifunctionality means that prescriptive, one-size-fits-all regulation will not be a viable option. Even performance standards and ex post liability - regulatory approaches that usually afford flexibility - are unlikely to be strong candidates for responding to multifunctional AI's risks, given challenges in monitoring and enforcement. Regulators will do well instead to promote proactive risk management on the part of developers and users by using management-based regulation, an approach that has proven effective in other contexts of heterogeneity. Regulators will also need to maintain ongoing vigilance and agility. More than in other contexts, regulators of multifunctional AI will need sufficient resources, top human talent and leadership, and organizational cultures committed to regulatory excellence.

CVNov 16, 2025
SAGE: Saliency-Guided Contrastive Embeddings

Colton R. Crum, Adam Czajka

Integrating human perceptual priors into the training of neural networks has been shown to raise model generalization, serve as an effective regularizer, and align models with human expertise for applications in high-risk domains. Existing approaches to integrate saliency into model training often rely on internal model mechanisms, which recent research suggests may be unreliable. Our insight is that many challenges associated with saliency-guided training stem from the placement of the guidance approaches solely within the image space. Instead, we move away from the image space, use the model's latent space embeddings to steer human guidance during training, and we propose SAGE (Saliency-Guided Contrastive Embeddings): a loss function that integrates human saliency into network training using contrastive embeddings. We apply salient-preserving and saliency-degrading signal augmentations to the input and capture the changes in embeddings and model logits. We guide the model towards salient features and away from non-salient features using a contrastive triplet loss. Additionally, we perform a sanity check on the logit distributions to ensure that the model outputs match the saliency-based augmentations. We demonstrate a boost in classification performance across both open- and closed-set scenarios against SOTA saliency-based methods, showing SAGE's effectiveness across various backbones, and include experiments to suggest its wide generalization across tasks.

CVApr 23, 2025
Almost Right: Making First-Layer Kernels Nearly Orthogonal Improves Model Generalization

Colton R. Crum, Adam Czajka

Despite several algorithmic advances in the training of convolutional neural networks (CNNs) over the years, their generalization capabilities are still subpar across several pertinent domains, particularly within open-set tasks often found in biometric and medical contexts. On the contrary, humans have an uncanny ability to generalize to unknown visual stimuli. The efficient coding hypothesis posits that early visual structures (retina, Lateral Geniculate Nucleus, and primary visual cortex) transform inputs to reduce redundancy and maximize information efficiency. This mechanism of redundancy minimization in early vision was the inspiration for CNN regularization techniques that force convolutional kernels to be orthogonal. However, the existing works rely upon matrix projections, architectural modifications, or specific weight initializations, which frequently overtly constrain the network's learning process and excessively increase the computational load during loss function calculation. In this paper, we introduce a flexible and lightweight approach that regularizes a subset of first-layer convolutional filters by making them pairwise-orthogonal, which reduces the redundancy of the extracted features but at the same time prevents putting excessive constraints on the network. We evaluate the proposed method on three open-set visual tasks (anomaly detection in chest X-ray images, synthetic face detection, and iris presentation attack detection) and observe an increase in the generalization capabilities of models trained with the proposed regularizer compared to state-of-the-art kernel orthogonalization approaches. We offer source codes along with the paper.