Nathan O. Hodas

h-index: 20

Papers: 27

Citations: 1,998

Novelty: 42%

AI Score: 26

Ranked #161,731 of 194,257 authors (top 83%) · #35,410 in LG (top 88%)

27 Papers

1.2 · SY · Sep 25, 2017

A Koopman Operator Approach for Computing and Balancing Gramians for Discrete Time Nonlinear Systems

Enoch Yeung, Zhiyuan Liu, Nathan O. Hodas

In this paper, we consider the problem of quantifying controllability and observability of a nonlinear discrete time dynamical system. We introduce the Koopman operator as a canonical representation of the system and apply a lifting technique to compute gramians in the space of full-state observables. We illustrate the properties of these gramians and identify several relationships with canonical results on local controllability and observability. Once defined, we show that these gramians can be balanced through a change of coordinates on the observables space, which in turn allows for direct application of balanced truncation. Throughout the paper, we highlight the aspects of our approach with an example nonlinear system.

5.5 · LG · Dec 2, 2021

Reward-Free Attacks in Multi-Agent Reinforcement Learning

Ted Fujimoto, Timothy Doster, Adam Attarian et al.

We investigate how effective an attacker can be when it only learns from its victim's actions, without access to the victim's reward. In this work, we are motivated by the scenario where the attacker wants to behave strategically when the victim's motivations are unknown. We argue that one heuristic approach an attacker can use is to maximize the entropy of the victim's policy. The policy is generally not obfuscated, which implies it may be extracted simply by passively observing the victim. We provide such a strategy in the form of a reward-free exploration algorithm that maximizes the attacker's entropy during the exploration phase, and then maximizes the victim's empirical entropy during the planning phase. In our experiments, the victim agents are subverted through policy entropy maximization, implying an attacker might not need access to the victim's reward to succeed. Hence, reward-free attacks, which are based only on observing behavior, show the feasibility of an attacker to act strategically without knowledge of the victim's motives even if the victim's reward information is protected.
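As a rough illustration of the quantity this abstract centers on, the sketch below estimates the empirical entropy of a victim's policy from passively observed actions. The function name `empirical_entropy` and the action logs are hypothetical; the paper's reward-free exploration and planning phases are not reproduced here.

```python
import math
from collections import Counter

def empirical_entropy(actions):
    """Shannon entropy (in nats) of the empirical distribution of observed actions."""
    counts = Counter(actions)
    total = len(actions)
    return -sum((c / total) * math.log(c / total) for c in counts.values())

# Hypothetical logs of a victim's actions observed in a single state.
predictable = ["left"] * 8                      # always the same action
erratic = ["left", "right", "up", "down"] * 2   # uniform over four actions

low = empirical_entropy(predictable)   # zero entropy: fully predictable
high = empirical_entropy(erratic)      # log(4): maximal for four actions
```

Under this reading, an attacker that pushes the victim toward states where the estimate is high is driving the victim's behavior toward unpredictability, without ever seeing a reward signal.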

1.6 · LG · Nov 22, 2021

Adaptive Transfer Learning: a simple but effective transfer learning

Jung H Lee, Henry J Kvinge, Scott Howland et al.

Transfer learning (TL) leverages previously obtained knowledge to learn new tasks efficiently and has been used to train deep learning (DL) models with a limited amount of data. When TL is applied to DL, pretrained (teacher) models are fine-tuned to build domain-specific (student) models. This fine-tuning relies on the fact that a DL model can be decomposed into classifiers and feature extractors, and a line of studies showed that the same feature extractors can be used to train classifiers on multiple tasks. Furthermore, recent studies proposed multiple algorithms that can fine-tune teacher models' feature extractors to train student models more efficiently. We note that regardless of the fine-tuning of feature extractors, the classifiers of student models are trained with the final outputs of feature extractors (i.e., the outputs of penultimate layers). However, a recent study suggested that feature maps in ResNets across layers could be functionally equivalent, raising the possibility that feature maps inside the feature extractors can also be used to train student models' classifiers. Inspired by this study, we tested whether feature maps in the hidden layers of the teacher models can be used to improve the student models' accuracy (i.e., TL's efficiency). Specifically, we developed 'adaptive transfer learning (ATL)', which can choose an optimal set of feature maps for TL, and tested it in the few-shot learning setting. Our empirical evaluations suggest that ATL can help DL models learn more efficiently, especially when available examples are limited.

1.6 · LG · Jun 2, 2021

One Representation to Rule Them All: Identifying Out-of-Support Examples in Few-shot Learning with Generic Representations

Henry Kvinge, Scott Howland, Nico Courts et al.

The field of few-shot learning has made remarkable strides in developing powerful models that can operate in the small data regime. Nearly all of these methods assume every unlabeled instance encountered will belong to a handful of known classes for which one has examples. This can be problematic for real-world use cases where one routinely finds 'none-of-the-above' examples. In this paper we describe this challenge of identifying what we term 'out-of-support' (OOS) examples. We describe how this problem is subtly different from out-of-distribution detection and describe a new method of identifying OOS examples within the Prototypical Networks framework using a fixed point which we call the generic representation. We show that our method outperforms other existing approaches in the literature as well as other approaches that we propose in this paper. Finally, we investigate how the use of such a generic point affects the geometry of a model's feature space.
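One plausible reading of the idea can be sketched as follows, under loud assumptions: class prototypes are the means of support embeddings (as in Prototypical Networks), distances are Euclidean, and the generic representation acts as an extra "none-of-the-above" point. The helper names and the hand-written 2-D embeddings are hypothetical, and the paper's actual decision rule may differ.

```python
import math

def mean(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def classify(query, support, generic):
    """Return the nearest class label, or None when the generic point is
    at least as close as every class prototype (treated as out-of-support)."""
    prototypes = {label: mean(examples) for label, examples in support.items()}
    best_label, best_dist = None, math.dist(query, generic)
    for label, proto in prototypes.items():
        d = math.dist(query, proto)
        if d < best_dist:
            best_label, best_dist = label, d
    return best_label

# Hypothetical embeddings; a real system would obtain these from a trained encoder.
support = {
    "cat": [[0.0, 1.0], [0.2, 0.8]],
    "dog": [[1.0, 0.0], [0.8, 0.2]],
}
generic = [3.0, 3.0]  # assumed fixed point standing in for "none of the above"

in_support = classify([0.15, 0.85], support, generic)   # lands near the "cat" prototype
out_of_support = classify([2.8, 3.1], support, generic)  # lands near the generic point
```

The design choice illustrated here is that out-of-support detection needs no separate threshold: the generic point competes with the class prototypes in the same distance comparison.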

1.4 · CV · Apr 8, 2021

Prototypical Region Proposal Networks for Few-Shot Localization and Classification

Elliott Skomski, Aaron Tuor, Andrew Avila et al.

Recently proposed few-shot image classification methods have generally focused on use cases where the objects to be classified are the central subject of images. Despite success on benchmark vision datasets aligned with this use case, these methods typically fail on use cases involving densely-annotated, busy images: images common in the wild where objects of relevance are not the central subject, instead appearing potentially occluded, small, or among other incidental objects belonging to other classes of potential interest. To localize relevant objects, we employ a prototype-based few-shot segmentation model which compares the encoded features of unlabeled query images with support class centroids to produce region proposals indicating the presence and location of support set classes in a query image. These region proposals are then used as additional conditioning input to few-shot image classifiers. We develop a framework to unify the two stages (segmentation and classification) into an end-to-end classification model -- PRoPnet -- and empirically demonstrate that our methods improve accuracy on image datasets with natural scenes containing multiple object classes.

1.2 · LG · Sep 23, 2020

Fuzzy Simplicial Networks: A Topology-Inspired Model to Improve Task Generalization in Few-shot Learning

Henry Kvinge, Zachary New, Nico Courts et al.

Deep learning has shown great success in settings with massive amounts of data but has struggled when data is limited. Few-shot learning algorithms, which seek to address this limitation, are designed to generalize well to new tasks with limited data. Typically, models are evaluated on unseen classes and datasets that are defined by the same fundamental task as they are trained for (e.g. category membership). One can also ask how well a model can generalize to fundamentally different tasks within a fixed dataset (for example: moving from category membership to tasks that involve detecting object orientation or quantity). To formalize this kind of shift we define a notion of "independence of tasks" and identify three new sets of labels for established computer vision datasets that test a model's ability to generalize to tasks which draw on orthogonal attributes in the data. We use these datasets to investigate the failure modes of metric-based few-shot models. Based on our findings, we introduce a new few-shot model called Fuzzy Simplicial Networks (FSN) which leverages a construction from topology to more flexibly represent each class from limited data. In particular, FSN models can not only form multiple representations for a given class but can also begin to capture the low-dimensional structure which characterizes class manifolds in the encoded space of deep networks. We show that FSN outperforms state-of-the-art models on the challenging tasks we introduce in this paper while remaining competitive on standard few-shot benchmarks.

6.0 · LG · Nov 15, 2019

Explanatory Masks for Neural Network Interpretability

Lawrence Phillips, Garrett Goh, Nathan Hodas

Neural network interpretability is a vital component for applications across a wide variety of domains. In such cases it is often useful to analyze a network which has already been trained for its specific purpose. In this work, we develop a method to produce explanation masks for pre-trained networks. The mask localizes the most important aspects of each input for prediction of the original network. Masks are created by a secondary network whose goal is to create as small an explanation as possible while still preserving the predictive accuracy of the original network. We demonstrate the applicability of our method for image classification with CNNs, sentiment analysis with RNNs, and chemical property prediction with mixed CNN/RNN architectures.

8.5 · CV · Sep 14, 2019

Metric-Based Few-Shot Learning for Video Action Recognition

Chris Careaga, Brian Hutchinson, Nathan Hodas et al.

In the few-shot scenario, a learner must effectively generalize to unseen classes given a small support set of labeled examples. While a relatively large amount of research has gone into few-shot learning for image classification, little work has been done on few-shot video classification. In this work, we address the task of few-shot video action recognition with a set of two-stream models. We evaluate the performance of a set of convolutional and recurrent neural network video encoder architectures used in conjunction with three popular metric-based few-shot algorithms. We train and evaluate using a few-shot split of the Kinetics 600 dataset. Our experiments confirm the importance of the two-stream setup, and find prototypical networks and pooled long short-term memory network embeddings to give the best performance as few-shot method and video encoder, respectively. For a 5-shot 5-way task, this setup obtains 84.2% accuracy on the test set and 59.4% on a special "challenge" test set, composed of highly confusable classes.

2.9 · LG · Oct 9, 2018

The Outer Product Structure of Neural Network Derivatives

Craig Bakker, Michael J. Henry, Nathan O. Hodas

In this paper, we show that feedforward and recurrent neural networks exhibit an outer product derivative structure but that convolutional neural networks do not. This structure makes it possible to use higher-order information without needing approximations or infeasibly large amounts of memory, and it may also provide insights into the geometry of neural network optima. The ability to easily access these derivatives also suggests a new, geometric approach to regularization. We then discuss how this structure could be used to improve training methods, increase network robustness and generalizability, and inform network compression methods.

3.5 · LG · May 13, 2018

Doing the impossible: Why neural networks can be trained at all

Nathan O. Hodas, Panos Stinis

As deep neural networks grow in size, from thousands to millions to billions of weights, the performance of those networks becomes limited by our ability to accurately train them. A common naive question arises: if we have a system with billions of degrees of freedom, don't we also need billions of samples to train it? Of course, the success of deep learning indicates that reliable models can be learned with reasonable amounts of data. Similar questions arise in protein folding, spin glasses and biological neural networks. With effectively infinite potential folding/spin/wiring configurations, how does the system find the precise arrangement that leads to useful and robust results? Simple sampling of the possible configurations until an optimal one is reached is not a viable option even if one waited for the age of the universe. On the contrary, there appears to be a mechanism in the above phenomena that forces them to achieve configurations that live on a low-dimensional manifold, avoiding the curse of dimensionality. In the current work we use the concept of mutual information between successive layers of a deep neural network to elucidate this mechanism and suggest possible ways of exploiting it to accelerate training. We show that adding structure to the neural network that enforces higher mutual information between layers speeds training and leads to more accurate results. High mutual information between layers implies that the effective number of free parameters is exponentially smaller than the raw number of tunable weights.

7.3 · HC · Feb 14, 2018

Sharkzor: Interactive Deep Learning for Image Triage, Sort and Summary

Meg Pirrung, Nathan Hilliard, Artëm Yankov et al.

Sharkzor is a web application for machine-learning assisted image sort and summary. Deep learning algorithms are leveraged to infer, augment, and automate the user's mental model. Initially, images uploaded by the user are spread out on a canvas. The user then interacts with the images to impute their mental model into the application's algorithmic underpinnings. Methods of interaction within Sharkzor's user interface and user experience support three primary user tasks: triage, organize, and automate. The user triages the large pile of overlapping images by moving images of interest into proximity. The user then organizes said images into meaningful groups. After interacting with the images and groups, deep learning helps to automate the user's interactions. The loop of interaction, automation, and response by the user allows the system to quickly make sense of large amounts of data.

24.9 · LG · Feb 12, 2018

Few-Shot Learning with Metric-Agnostic Conditional Embeddings

Nathan Hilliard, Lawrence Phillips, Scott Howland et al.

Learning high quality class representations from few examples is a key problem in metric-learning approaches to few-shot learning. To accomplish this, we introduce a novel architecture where class representations are conditioned for each few-shot trial based on a target image. We also deviate from traditional metric-learning approaches by training a network to perform comparisons between classes rather than relying on a static metric comparison. This allows the network to decide what aspects of each class are important for the comparison at hand. We find that this flexible architecture works well in practice, achieving state-of-the-art performance on the Caltech-UCSD birds fine-grained classification task.

16.5 · ML · Dec 7, 2017

Using Rule-Based Labels for Weak Supervised Learning: A ChemNet for Transferable Chemical Property Prediction

Garrett B. Goh, Charles Siegel, Abhinav Vishnu et al.

With access to large datasets, deep neural networks (DNN) have achieved human-level accuracy in image and speech recognition tasks. However, in chemistry, data is inherently small and fragmented. In this work, we develop an approach of using rule-based knowledge for training ChemNet, a transferable and generalizable deep neural network for chemical property prediction that learns in a weak-supervised manner from large unlabeled chemical databases. When coupled with transfer learning approaches to predict other smaller datasets for chemical properties that it was not originally trained on, we show that ChemNet's accuracy outperforms contemporary DNN models that were trained using conventional supervised learning. Furthermore, we demonstrate that the ChemNet pre-training approach is equally effective on both CNN (Chemception) and RNN (SMILES2vec) models, indicating that this approach is network architecture agnostic and is effective across multiple data modalities. Our results indicate that a pre-trained ChemNet that incorporates chemistry domain knowledge enables the development of generalizable neural networks for more accurate prediction of novel chemical properties.

18.7 · ML · Dec 6, 2017

SMILES2Vec: An Interpretable General-Purpose Deep Neural Network for Predicting Chemical Properties

Garrett B. Goh, Nathan O. Hodas, Charles Siegel et al.

Chemical databases store information in text representations, and the SMILES format is a universal standard used in many cheminformatics software packages. Encoded in each SMILES string is structural information that can be used to predict complex chemical properties. In this work, we develop SMILES2vec, a deep RNN that automatically learns features from SMILES to predict chemical properties, without the need for additional explicit feature engineering. Using Bayesian optimization methods to tune the network architecture, we show that an optimized SMILES2vec model can serve as a general-purpose neural network for predicting distinct chemical properties including toxicity, activity, solubility and solvation energy, while also outperforming contemporary MLP neural networks that use engineered features. Furthermore, we demonstrate proof-of-concept of interpretability by developing an explanation mask that localizes on the most important characters used in making a prediction. When tested on the solubility dataset, it identified specific parts of a chemical that are consistent with established first-principles knowledge with an accuracy of 88%. Our work demonstrates that neural networks can learn technically accurate chemical concepts and provide state-of-the-art accuracy, making interpretable deep neural networks a useful tool of relevance to the chemical industry.

7.6 · ML · Oct 5, 2017

How Much Chemistry Does a Deep Neural Network Need to Know to Make Accurate Predictions?

Garrett B. Goh, Charles Siegel, Abhinav Vishnu et al.

The meteoric rise of deep learning models in computer vision research, having achieved human-level accuracy in image recognition tasks, is firm evidence of the impact of representation learning of deep neural networks. In the chemistry domain, recent advances have also led to the development of similar CNN models, such as Chemception, which is trained to predict chemical properties using images of molecular drawings. In this work, we investigate the effects of systematically removing and adding localized domain-specific information to the image channels of the training data. By augmenting images with only 3 additional basic pieces of information, and without introducing any architectural changes, we demonstrate that an augmented Chemception (AugChemception) outperforms the original model in the prediction of toxicity, activity, and solvation free energy. Then, by altering the information content in the images, and examining the resulting model's performance, we also identify two distinct learning patterns in predicting toxicity/activity as compared to solvation free energy. These patterns suggest that Chemception is learning about its tasks in a manner that is consistent with established knowledge. Thus, our work demonstrates that advanced chemical knowledge is not a prerequisite for deep learning models to accurately predict complex chemical properties.

26.9 · LG · Aug 22, 2017

Learning Deep Neural Network Representations for Koopman Operators of Nonlinear Dynamical Systems

Enoch Yeung, Soumya Kundu, Nathan Hodas

The Koopman operator has recently garnered much attention for its value in dynamical systems analysis and data-driven model discovery. However, its application has been hindered by the computational complexity of extended dynamic mode decomposition; this requires a combinatorially large basis set to adequately describe many nonlinear systems of interest, e.g. cyber-physical infrastructure systems, biological networks, social systems, and fluid dynamics. Often the dictionaries generated for these problems are manually curated, requiring domain-specific knowledge and painstaking tuning. In this paper we introduce a deep learning framework for learning Koopman operators of nonlinear dynamical systems. We show that this novel method automatically selects efficient deep dictionaries, outperforming state-of-the-art methods. We benchmark this method on partially observed nonlinear systems, including the glycolytic oscillator and show it is able to predict quantitatively 100 steps into the future, using only a single timepoint, and qualitative oscillatory behavior 400 steps into the future.
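To make the notion of a dictionary of observables concrete, here is a minimal textbook-style sketch, not the paper's method: a two-state nonlinear system whose dynamics become exactly linear once the state is lifted with the hand-chosen observables (x1, x2, x1^2). The constants `A`, `B`, `C`, the system itself, and all function names are assumptions chosen for illustration; the paper instead learns the dictionary with a deep network.

```python
# Hypothetical system: x1' = A*x1, x2' = B*x2 + C*x1**2 (nonlinear in x1).
A, B, C = 0.9, 0.5, 0.3

def step(x1, x2):
    """One step of the nonlinear dynamics."""
    return A * x1, B * x2 + C * x1 ** 2

def lift(x1, x2):
    """Hand-chosen dictionary of observables spanning a Koopman-invariant subspace."""
    return [x1, x2, x1 ** 2]

# Finite-dimensional Koopman matrix acting on the lifted state.
# Row 3 follows from (x1')**2 = A**2 * x1**2.
K = [
    [A, 0.0, 0.0],
    [0.0, B, C],
    [0.0, 0.0, A * A],
]

def apply(matrix, vector):
    """Matrix-vector product with plain lists."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]

def simulate_nonlinear(x1, x2, n):
    for _ in range(n):
        x1, x2 = step(x1, x2)
    return x1, x2

def simulate_lifted(x1, x2, n):
    y = lift(x1, x2)
    for _ in range(n):
        y = apply(K, y)
    return y[0], y[1]  # the first two observables are the original state

nonlinear = simulate_nonlinear(1.0, -0.5, 100)
lifted = simulate_lifted(1.0, -0.5, 100)
```

For this system three observables suffice; the abstract's point is that for many systems of interest a suitable dictionary is large and hard to curate by hand, which motivates learning it.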

3.2 · LG · Aug 22, 2017

Dynamic Input Structure and Network Assembly for Few-Shot Learning

Nathan Hilliard, Nathan O. Hodas, Courtney D. Corley

The ability to learn from a small number of examples has been a difficult problem in machine learning since its inception. While methods have succeeded with large amounts of training data, research has been underway in how to accomplish similar performance with fewer examples, known as one-shot or more generally few-shot learning. This technique has been shown to have promising performance, but in practice requires fixed-size inputs making it impractical for production systems where class sizes can vary. This impedes training and the final utility of few-shot learning systems. This paper describes an approach to constructing and training a network that can handle arbitrary example sizes dynamically as the system is used.

17.8 · ML · Jun 20, 2017

Chemception: A Deep Neural Network with Minimal Chemistry Knowledge Matches the Performance of Expert-developed QSAR/QSPR Models

Garrett B. Goh, Charles Siegel, Abhinav Vishnu et al.

In the last few years, we have seen the transformative impact of deep learning in many applications, particularly in speech recognition and computer vision. Inspired by Google's Inception-ResNet deep convolutional neural network (CNN) for image classification, we have developed "Chemception", a deep CNN for the prediction of chemical properties, using just the images of 2D drawings of molecules. We develop Chemception without providing any additional explicit chemistry knowledge, such as basic concepts like periodicity, or advanced features like molecular descriptors and fingerprints. We then show how Chemception can serve as a general-purpose neural network architecture for predicting toxicity, activity, and solvation properties when trained on a modest database of 600 to 40,000 compounds. When compared to multi-layer perceptron (MLP) deep neural networks trained with ECFP fingerprints, Chemception slightly outperforms in activity and solvation prediction and slightly underperforms in toxicity prediction. Having matched the performance of expert-developed QSAR/QSPR deep learning models, our work demonstrates the plausibility of using deep neural networks to assist in computational chemistry research, where the feature engineering process is performed primarily by a deep learning algorithm.

3.2 · HC · Jun 6, 2017

Understanding Cognitive Depletion in Novice NMR Analysts

Lyndsey Franklin, Kyungsik Han, Zhuanyi Huang et al.

We present the results of a user study with novice NMR analysts (N=19) involving a gamified simulation of the NMR analysis process. Participants solved randomly generated spectrum puzzles for up to three hours. We used eye tracking, event logging, and observations to record symptoms of cognitive depletion while participants worked. Analysis of results indicates that we can detect both signs of learning and signs of cognitive depletion in participants over the course of the three hours. Participants' break strategies did not predict or reflect game scores, but certain symptoms appear predictive of breaks.

2.9 · CL · Jun 6, 2017

Assessing the Linguistic Productivity of Unsupervised Deep Neural Networks

Lawrence Phillips, Nathan Hodas

Increasingly, cognitive scientists have demonstrated interest in applying tools from deep learning. One use for deep learning is in language acquisition where it is useful to know if a linguistic phenomenon can be learned through domain-general means. To assess whether unsupervised deep learning is appropriate, we first pose a smaller question: Can unsupervised neural networks apply linguistic rules productively, using them in novel situations? We draw from the literature on determiner/noun productivity by training an unsupervised autoencoder network and measuring its ability to combine nouns with determiners. Our simple autoencoder creates combinations it has not previously encountered and produces a degree of overlap matching adults. While this preliminary work does not provide conclusive evidence for productivity, it warrants further investigation with more complex models. Further, this work helps lay the foundations for future collaboration between the deep learning and cognitive science communities.

3.2 · HC · Jun 5, 2017

Cognitive Depletion in the Wild: a Case Study of NMR Spectroscopy Analysis

Lyndsey Franklin, Nathan Hodas

NMR spectroscopy analysis is a detail-oriented analytic feat that typically requires specific domain expertise and hours of concentration. This work presents an ethnographic-style study of this analysis process in the context of evaluating the symptoms of cognitive depletion. The repeated, non-trivial decisions required by and the time-consuming nature of NMR spectroscopy analysis make it an ideal, real-world scenario to study the symptoms of cognitive depletion, its effect on workflow and performance, and potential strategies for mitigating its deleterious effects.

5.7 · HC · Jun 5, 2017

Will Break for Productivity: Generalized Symptoms of Cognitive Depletion

Lyndsey Franklin, Kristina Lerman, Nathan Hodas

In this work, we address the symptoms of cognitive depletion as they relate to generalized knowledge workers. We unify previous findings within a single analytical model of cognitive depletion. Our purpose is to develop a model that will help us predict when a person has reached a sufficient state of cognitive depletion such that taking a break or some other restorative action will benefit both his or her own wellbeing and the quality of his or her performance. We provide a definition of each symptom in our model as well as the effect it would have on a knowledge worker's ability to work productively. We discuss methods to detect each symptom that do not require self assessment. Understanding symptoms of cognitive depletion provides the ability to support human knowledge workers by reducing the stress involved with cognitive and work overload while maintaining or improving the quality of their performance.

22.2 · ML · Jan 17, 2017

Deep Learning for Computational Chemistry

Garrett B. Goh, Nathan O. Hodas, Abhinav Vishnu

The rise and fall of artificial neural networks is well documented in the scientific literature of both computer science and computational chemistry. Yet almost two decades later, we are now seeing a resurgence of interest in deep learning, a machine learning algorithm based on multilayer neural networks. Within the last few years, we have seen the transformative impact of deep learning in many domains, particularly in speech recognition and computer vision, to the extent that the majority of expert practitioners in those fields are now regularly eschewing prior established models in favor of deep learning models. In this review, we provide an introductory overview of the theory of deep neural networks and their unique properties that distinguish them from traditional machine learning algorithms used in cheminformatics. By providing an overview of the variety of emerging applications of deep neural networks, we highlight their ubiquity and broad applicability to a wide range of challenges in the field, including QSAR, virtual screening, protein structure prediction, quantum chemistry, materials design and property prediction. In reviewing the performance of deep neural networks, we observed a consistent outperformance against non-neural-network state-of-the-art models across disparate research topics, and deep neural network based models often exceeded the "glass ceiling" expectations of their respective tasks. Coupled with the maturity of GPU-accelerated computing for training deep neural networks and the exponential growth of chemical data on which to train these networks, we anticipate that deep learning algorithms will be a valuable tool for computational chemistry.

5.2 · OC · Dec 17, 2016

Mutual information for fitting deep nonlinear models

Jacob S. Hunter, Nathan O. Hodas

Deep nonlinear models pose a challenge for fitting parameters due to lack of knowledge of the hidden layer and the potentially non-affine relation of the initial and observed layers. In the present work we investigate the use of information theoretic measures such as mutual information and Kullback-Leibler (KL) divergence as objective functions for fitting such models without knowledge of the hidden layer. We investigate one model as a proof of concept and one application of cognitive performance. We further investigate the use of optimizers with these methods. Mutual information is largely successful as an objective, depending on the parameters. KL divergence is found to be similarly successful, given some knowledge of the statistics of the hidden layer.
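For readers unfamiliar with the objective named here, the following minimal sketch computes mutual information for a small discrete joint distribution. The function name and the two example tables are hypothetical; the paper's models and estimators are not reproduced.

```python
import math

def mutual_information(joint):
    """Mutual information (in nats) of a discrete joint distribution joint[x][y]."""
    px = [sum(row) for row in joint]          # marginal over rows
    py = [sum(col) for col in zip(*joint)]    # marginal over columns
    mi = 0.0
    for i, row in enumerate(joint):
        for j, p in enumerate(row):
            if p > 0:  # zero-probability cells contribute nothing
                mi += p * math.log(p / (px[i] * py[j]))
    return mi

# Hypothetical joint distributions over two binary variables.
independent = [[0.25, 0.25], [0.25, 0.25]]  # knowing x says nothing about y
coupled = [[0.5, 0.0], [0.0, 0.5]]          # x determines y exactly

mi_independent = mutual_information(independent)  # zero
mi_coupled = mutual_information(coupled)          # log(2)
```

Used as a fitting objective, a quantity like this rewards parameters that make the model's output informative about the observed layer without requiring the hidden layer to be known.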

2.7 · LG · Nov 6, 2016

Beyond Fine Tuning: A Modular Approach to Learning on Small Data

Ark Anderson, Kyle Shaffer, Artem Yankov et al.

In this paper we present a technique to train neural network models on small amounts of data. Current methods for training neural networks on small amounts of rich data typically rely on strategies such as fine-tuning a pre-trained neural network or the use of domain-specific hand-engineered features. Here we take the approach of treating network layers, or entire networks, as modules and combine pre-trained modules with untrained modules, to learn the shift in distributions between data sets. The central impact of using a modular approach comes from adding new representations to a network, as opposed to replacing representations via fine-tuning. Using this technique, we are able to surpass results using standard fine-tuning transfer learning approaches, and we are also able to significantly increase performance over such approaches when using smaller amounts of data.

2.3 · SI · Sep 1, 2016

How a user's personality influences content engagement in social media

Nathan O. Hodas, Ryan Butner, Court Corley

Social media presents an opportunity for people to share content that they find to be significant, funny, or notable. No single piece of content will appeal to all users, but are there systematic variations between users that can help us better understand information propagation? We conducted an experiment exploring social media usage during disaster scenarios, combining electroencephalogram (EEG), personality surveys, and prompts to share social media. We show how personality not only drives willingness to engage with social media but also helps to determine what type of content users find compelling. As expected, extroverts are more likely to share content. In contrast, one of our central results is that individuals with depressive personalities are the most likely cohort to share informative content, like news or alerts. Because personality and mood will generally be highly correlated between friends via homophily, our results may be an important factor in understanding social contagion.

10.0 · HC · Apr 6, 2016

Adding Semantic Information into Data Models by Learning Domain Expertise from User Interaction

Nathan Oken Hodas, Alex Endert

Interactive visual analytic systems enable users to discover insights from complex data. Users can express and test hypotheses via user interaction, leveraging their domain expertise and prior knowledge to guide and steer the analytic models in the system. For example, semantic interaction techniques enable systems to learn from the user's interactions and steer the underlying analytic models based on the user's analytical reasoning. However, an open challenge is how not only to steer models based on the dimensions or features of the data, but also to add dimensions or attributes to the data based on the domain expertise of the user. In this paper, we present a technique for inferring and appending dimensions onto the dataset based on the prior expertise of the user expressed via user interactions. Our technique enables users to directly manipulate a spatial organization of data, from which the dimensions of the data are weighted and new dimensions are created to represent the prior knowledge the user brings to the system. We describe this technique and demonstrate its utility via a use case.