LGMay 31, 2022
k-Means Maximum Entropy ExplorationAlexander Nedergaard, Matthew Cook
Exploration in high-dimensional, continuous spaces with sparse rewards is an open problem in reinforcement learning. Artificial curiosity algorithms address this by creating rewards that lead to exploration. Given a reinforcement learning algorithm capable of maximizing rewards, the problem reduces to finding an optimization objective consistent with exploration. Maximum entropy exploration uses the entropy of the state visitation distribution as such an objective. However, efficiently estimating the entropy of the state visitation distribution is challenging in high-dimensional, continuous spaces. We introduce an artificial curiosity algorithm based on lower bounding an approximation to the entropy of the state visitation distribution. The bound relies on a result we prove for non-parametric density estimation in arbitrary dimensions using k-means. We show that our approach is both computationally efficient and competitive on benchmarks for exploration in high-dimensional, continuous spaces, especially on tasks where reinforcement learning algorithms are unable to find rewards.
AIJan 15
Diagnosing Generalization Failures in Fine-Tuned LLMs: A Cross-Architectural Study on Phishing DetectionFrank Bobe, Gregory D. Vetaw, Chase Pavlick et al.
The practice of fine-tuning Large Language Models (LLMs) has achieved state-of-the-art performance on specialized tasks, yet diagnosing why these models become brittle and fail to generalize remains a critical open problem. To address this, we introduce and apply a multi-layered diagnostic framework to a cross-architectural study. We fine-tune Llama 3.1 8B, Gemma 2 9B, and Mistral models on a high-stakes phishing detection task and use SHAP analysis and mechanistic interpretability to uncover the root causes of their generalization failures. Our investigation reveals three critical findings: (1) Generalization is driven by a powerful synergy between architecture and data diversity. The Gemma 2 9B model achieves state-of-the-art performance (>91\% F1), but only when trained on a stylistically diverse ``generalist'' dataset. (2) Generalization is highly architecture-dependent. We diagnose a specific failure mode in Llama 3.1 8B, which performs well on a narrow domain but cannot integrate diverse data, leading to a significant performance drop. (3) Some architectures are inherently more generalizable. The Mistral model proves to be a consistent and resilient performer across multiple training paradigms. By pinpointing the flawed heuristics responsible for these failures, our work provides a concrete methodology for diagnosing and understanding generalization failures, underscoring that reliable AI requires deep validation of the interplay between architecture, data, and training strategy.
LGNov 12, 2023
Omitted Labels Induce Nontransitive Paradoxes in CausalityBijan Mazaheri, Siddharth Jain, Matthew Cook et al.
We explore "omitted label contexts," in which training data is limited to a subset of the possible labels. This setting is standard among specialized human experts or specific, focused studies. By studying Simpson's paradox, we observe that ``correct'' adjustments sometimes require non-exchangeable treatment and control groups. A generalization of Simpson's paradox leads us to study networks of conclusions drawn from different contexts, within which a paradox of nontransitivity arises. We prove that the space of possible nontransitive structures in these networks exactly corresponds to structures that form from aggregating ranked-choice votes.
CVSep 17, 2020Code
Microtubule Tracking in Electron Microscopy VolumesNils Eckstein, Julia Buhmann, Matthew Cook et al.
We present a method for microtubule tracking in electron microscopy volumes. Our method first identifies a sparse set of voxels that likely belong to microtubules. Similar to prior work, we then enumerate potential edges between these voxels, which we represent in a candidate graph. Tracks of microtubules are found by selecting nodes and edges in the candidate graph by solving a constrained optimization problem incorporating biological priors on microtubule structure. For this, we present a novel integer linear programming formulation, which results in speed-ups of three orders of magnitude and an increase of 53% in accuracy compared to prior art (evaluated on three 1.2 x 4 x 4$μ$m volumes of Drosophila neural tissue). We also propose a scheme to solve the optimization problem in a block-wise fashion, which allows distributed tracking and is necessary to process very large electron microscopy volumes. Finally, we release a benchmark dataset for microtubule tracking, here used for training, testing and validation, consisting of eight 30 x 1000 x 1000 voxel blocks (1.2 x 4 x 4$μ$m) of densely annotated microtubules in the CREMI data set (https://github.com/nilsec/micron).
QUANT-PHMar 21, 2024
Learning with SASQuaTCh: a Novel Variational Quantum Transformer Architecture with Kernel-Based Self-AttentionEthan N. Evans, Matthew Cook, Zachary P. Bradshaw et al.
The recent exploding growth in size of state-of-the-art machine learning models highlights a well-known issue where exponential parameter growth, which has grown to trillions as in the case of the Generative Pre-trained Transformer (GPT), leads to training time and memory requirements which limit their advancement in the near term. The predominant models use the so-called transformer network and have a large field of applicability, including predicting text and images, classification, and even predicting solutions to the dynamics of physical systems. Here we present a variational quantum circuit architecture named Self-Attention Sequential Quantum Transformer Channel (SASQuaTCh), which builds networks of qubits that perform analogous operations of the transformer network, namely the keystone self-attention operation, and leads to an exponential improvement in parameter complexity and run-time complexity over its classical counterpart. Our approach leverages recent insights from kernel-based operator learning in the context of predicting spatiotemporal systems to represent deep layers of a vision transformer network using simple gate operations and a set of multi-dimensional quantum Fourier transforms. To validate our approach, we consider image classification tasks in simulation and with hardware, where with only 9 qubits and a handful of parameters we are able to simultaneously embed and classify a grayscale image of handwritten digits with high accuracy.
LGJul 2, 2020
Outlier Detection through Null Space Analysis of Neural NetworksMatthew Cook, Alina Zare, Paul Gader
Many machine learning classification systems lack competency awareness. Specifically, many systems lack the ability to identify when outliers (e.g., samples that are distinct from and not represented in the training data distribution) are being presented to the system. The ability to detect outliers is of practical significance since it can help the system behave in an reasonable way when encountering unexpected data. In prior work, outlier detection is commonly carried out in a processing pipeline that is distinct from the classification model. Thus, for a complete system that incorporates outlier detection and classification, two models must be trained, increasing the overall complexity of the approach. In this paper we use the concept of the null space to integrate an outlier detection method directly into a neural network used for classification. Our method, called Null Space Analysis (NuSA) of neural networks, works by computing and controlling the magnitude of the null space projection as data is passed through a network. Using these projections, we can then calculate a score that can differentiate between normal and abnormal data. Results are shown that indicate networks trained with NuSA retain their classification performance while also being able to detect outliers at rates similar to commonly used outlier detection algorithms.
CVFeb 3, 2020
Efficient 2D neuron boundary segmentation with local topological constraintsThanuja D. Ambegoda, Matthew Cook
We present a method for segmenting neuron membranes in 2D electron microscopy imagery. This segmentation task has been a bottleneck to reconstruction efforts of the brain's synaptic circuits. One common problem is the misclassification of blurry membrane fragments as cell interior, which leads to merging of two adjacent neuron sections into one via the blurry membrane region. Human annotators can easily avoid such errors by implicitly performing gap completion, taking into account the continuity of membranes. Drawing inspiration from these human strategies, we formulate the segmentation task as an edge labeling problem on a graph with local topological constraints. We derive an integer linear program (ILP) that enforces membrane continuity, i.e. the absence of gaps. The cost function of the ILP is the pixel-wise deviation of the segmentation from a priori membrane probabilities derived from the data. Based on membrane probability maps obtained using random forest classifiers and convolutional neural networks, our method improves the neuron boundary segmentation accuracy compared to a variety of standard segmentation approaches. Our method successfully performs gap completion and leads to fewer topological errors. The method could potentially also be incorporated into other image segmentation pipelines with known topological constraints.
IVFeb 1, 2020
Estimation of Z-Thickness and XY-Anisotropy of Electron Microscopy Images using Gaussian ProcessesThanuja D. Ambegoda, Julien N. P. Martel, Jozef Adamcik et al.
Serial section electron microscopy (ssEM) is a widely used technique for obtaining volumetric information of biological tissues at nanometer scale. However, accurate 3D reconstructions of identified cellular structures and volumetric quantifications require precise estimates of section thickness and anisotropy (or stretching) along the XY imaging plane. In fact, many image processing algorithms simply assume isotropy within the imaging plane. To ameliorate this problem, we present a method for estimating thickness and stretching of electron microscopy sections using non-parametric Bayesian regression of image statistics. We verify our thickness and stretching estimates using direct measurements obtained by atomic force microscopy (AFM) and show that our method has a lower estimation error compared to a recent indirect thickness estimation method as well as a relative Z coordinate estimation method. Furthermore, we have made the first dataset of ssSEM images with directly measured section thickness values publicly available for the evaluation of indirect thickness estimation methods.
CVOct 14, 2019
FireNet: Real-time Segmentation of Fire Perimeter from Aerial VideoJigar Doshi, Dominic Garcia, Cliff Massey et al.
In this paper, we share our approach to real-time segmentation of fire perimeter from aerial full-motion infrared video. We start by describing the problem from a humanitarian aid and disaster response perspective. Specifically, we explain the importance of the problem, how it is currently resolved, and how our machine learning approach improves it. To test our models we annotate a large-scale dataset of 400,000 frames with guidance from domain experts. Finally, we share our approach currently deployed in production with inference speed of 20 frames per second and an accuracy of 92 (F1 Score).
IVApr 1, 2019
Comparison of Possibilistic Fuzzy Local Information C-Means and Possibilistic K-Nearest Neighbors for Synthetic Aperture Sonar Image SegmentationJoshua Peeples, Matthew Cook, Daniel Suen et al.
Synthetic aperture sonar (SAS) imagery can generate high resolution images of the seafloor. Thus, segmentation algorithms can be used to partition the images into different seafloor environments. In this paper, we compare two possibilistic segmentation approaches. Possibilistic approaches allow for the ability to detect novel or outlier environments as well as well known classes. The Possibilistic Fuzzy Local Information C-Means (PFLICM) algorithm has been previously applied to segment SAS imagery. Additionally, the Possibilistic K-Nearest Neighbors (PKNN) algorithm has been used in other domains such as landmine detection and hyperspectral imagery. In this paper, we compare the segmentation performance of a semi-supervised approach using PFLICM and a supervised method using Possibilistic K-NN. We include final segmentation results on multiple SAS images and a quantitative assessment of each algorithm.
CVJun 21, 2018
Synaptic partner prediction from point annotations in insect brainsJulia Buhmann, Renate Krause, Rodrigo Ceballos Lentini et al.
High-throughput electron microscopy allows recording of lar- ge stacks of neural tissue with sufficient resolution to extract the wiring diagram of the underlying neural network. Current efforts to automate this process focus mainly on the segmentation of neurons. However, in order to recover a wiring diagram, synaptic partners need to be identi- fied as well. This is especially challenging in insect brains like Drosophila melanogaster, where one presynaptic site is associated with multiple post- synaptic elements. Here we propose a 3D U-Net architecture to directly identify pairs of voxels that are pre- and postsynaptic to each other. To that end, we formulate the problem of synaptic partner identification as a classification problem on long-range edges between voxels to encode both the presence of a synaptic pair and its direction. This formulation allows us to directly learn from synaptic point annotations instead of more ex- pensive voxel-based synaptic cleft or vesicle annotations. We evaluate our method on the MICCAI 2016 CREMI challenge and improve over the current state of the art, producing 3% fewer errors than the next best method.
LGApr 30, 2017
Deep Learning in the Automotive Industry: Applications and ToolsAndre Luckow, Matthew Cook, Nathan Ashcraft et al.
Deep Learning refers to a set of machine learning techniques that utilize neural networks with many hidden layers for tasks, such as image classification, speech recognition, language understanding. Deep learning has been proven to be very effective in these domains and is pervasively used by many Internet services. In this paper, we describe different automotive uses cases for deep learning in particular in the domain of computer vision. We surveys the current state-of-the-art in libraries, tools and infrastructures (e.\,g.\ GPUs and clouds) for implementing, training and deploying deep neural networks. We particularly focus on convolutional neural networks and computer vision use cases, such as the visual inspection process in manufacturing plants and the analysis of social media data. To train neural networks, curated and labeled datasets are essential. In particular, both the availability and scope of such datasets is typically very limited. A main contribution of this paper is the creation of an automotive dataset, that allows us to learn and automatically recognize different vehicle properties. We describe an end-to-end deep learning application utilizing a mobile app for data collection and process support, and an Amazon-based cloud backend for storage and training. For training we evaluate the use of cloud and on-premises infrastructures (including multiple GPUs) in conjunction with different neural network architectures and frameworks. We assess both the training times as well as the accuracy of the classifier. Finally, we demonstrate the effectiveness of the trained classifier in a real world setting during manufacturing process.
NEMar 18, 2017
A wake-sleep algorithm for recurrent, spiking neural networksJohannes Thiele, Peter Diehl, Matthew Cook
We investigate a recently proposed model for cortical computation which performs relational inference. It consists of several interconnected, structurally equivalent populations of leaky integrate-and-fire (LIF) neurons, which are trained in a self-organized fashion with spike-timing dependent plasticity (STDP). Despite its robust learning dynamics, the model is susceptible to a problem typical for recurrent networks which use a correlation based (Hebbian) learning rule: if trained with high learning rates, the recurrent connections can cause strong feedback loops in the network dynamics, which lead to the emergence of attractor states. This causes a strong reduction in the number of representable patterns and a decay in the inference ability of the network. As a solution, we introduce a conceptually very simple "wake-sleep" algorithm: during the wake phase, training is executed normally, while during the sleep phase, the network "dreams" samples from its generative model, which are induced by random input. This process allows us to activate the attractor states in the network, which can then be unlearned effectively by an anti-Hebbian mechanism. The algorithm allows us to increase learning rates up to a factor of ten while avoiding clustering, which allows the network to learn several times faster. Also for low learning rates, where clustering is not an issue, it improves convergence speed and reduces the final inference error.
NEAug 29, 2016
Learning and Inferring Relations in Cortical NetworksPeter U. Diehl, Matthew Cook
A pressing scientific challenge is to understand how brains work. Of particular interest is the neocortex,the part of the brain that is especially large in humans, capable of handling a wide variety of tasks including visual, auditory, language, motor, and abstract processing. These functionalities are processed in different self-organized regions of the neocortical sheet, and yet the anatomical structure carrying out the processing is relatively uniform across the sheet. We are at a loss to explain, simulate, or understand such a multi-functional homogeneous sheet-like computational structure - we do not have computational models which work in this way. Here we present an important step towards developing such models: we show how uniform modules of excitatory and inhibitory neurons can be connected bidirectionally in a network that, when exposed to input in the form of population codes, learns the input encodings as well as the relationships between the inputs. STDP learning rules lead the modules to self-organize into a relational network, which is able to infer missing inputs,restore noisy signals, decide between conflicting inputs, and combine cues to improve estimates. These networks show that it is possible for a homogeneous network of spiking units to self-organize so as to provide meaningful processing of its inputs. If such networks can be scaled up, they could provide an initial computational model relevant to the large scale anatomy of the neocortex.
CVMar 19, 2016
Adaptive coherence estimator (ACE) for explosive hazard detection using wideband electromagnetic induction (WEMI)Brendan Alvey, Alina Zare, Matthew Cook et al.
The adaptive coherence estimator (ACE) estimates the squared cosine of the angle between a known target vector and a sample vector in a whitened coordinate space. The space is whitened according to an estimation of the background statistics, which directly effects the performance of the statistic as a target detector. In this paper, the ACE detection statistic is used to detect buried explosive hazards with data from a Wideband Electromagnetic Induction (WEMI) sensor. Target signatures are based on a dictionary defined using a Discrete Spectrum of Relaxation Frequencies (DSRF) model. Results are summarized as a receiver operator curve (ROC) and compared to other leading methods.
CVMar 19, 2016
Buried object detection using handheld WEMI with task-driven extended functions of multiple instancesMatthew Cook, Alina Zare, Dominic Ho
Many effective supervised discriminative dictionary learning methods have been developed in the literature. However, when training these algorithms, precise ground-truth of the training data is required to provide very accurate point-wise labels. Yet, in many applications, accurate labels are not always feasible. This is especially true in the case of buried object detection in which the size of the objects are not consistent. In this paper, a new multiple instance dictionary learning algorithm for detecting buried objects using a handheld WEMI sensor is detailed. The new algorithm, Task Driven Extended Functions of Multiple Instances, can overcome data that does not have very precise point-wise labels and still learn a highly discriminative dictionary. Results are presented and discussed on measured WEMI data.
CVMar 8, 2015
TED: A Tolerant Edit Distance for Segmentation EvaluationJan Funke, Francesc Moreno-Noguer, Albert Cardona et al.
In this paper, we present a novel error measure to compare a segmentation against ground truth. This measure, which we call Tolerant Edit Distance (TED), is motivated by two observations: (1) Some errors, like small boundary shifts, are tolerable in practice. Which errors are tolerable is application dependent and should be a parameter of the measure. (2) Non-tolerable errors have to be corrected manually. The time needed to do so should be reflected by the error measure. Using integer linear programming, the TED finds the minimal weighted sum of split and merge errors exceeding a given tolerance criterion, and thus provides a time-to-fix estimate. In contrast to commonly used measures like Rand index or variation of information, the TED (1) does not count small, but tolerable, differences, (2) provides intuitive numbers, (3) gives a time-to-fix estimate, and (4) can localize and classify the type of errors. By supporting both isotropic and anisotropic volumes and having a flexible tolerance criterion, the TED can be adapted to different requirements. On example segmentations for 3D neuron segmentation, we demonstrate that the TED is capable of counting topological errors, while ignoring small boundary shifts.