Tim Oates

LG
h-index31
59papers
4,932citations
Novelty46%
AI Score59

59 Papers

72.3LGMay 28
Honest Lying: Understanding Memory Confabulation in Reflexive Agents

Prakhar Dixit, Sadia Kamal, Tim Oates

Reflexion-style agents rely on self-generated reflections as memory, implicitly assuming that agents can accurately diagnose their own failures.We show that this assumption can fail systematically: across ALFWorld and HumanEval, agents store confident but incorrect interpretations of the task and continue acting on them across trials,even though the environment resets to the correct task each time. We call this failure mode memory confabulation and introduce the Reflection Repetition Rate (RRR), a log-based metric that detects repeated reliance on incorrect reflective content.Using RRR, we identify 16 frozen environments in ALFWorld, where 0 of 121 reflections mention the correct target object, and 4 analogous cases in HumanEval. Our mitigation replaces open-ended self-diagnosis with programmatic extraction of trajectory-level failure signals, increasing correct object mention from 0% to 86%, reducing RRR from 0.64 to 0.10, and solving 3 of 16 frozen ALFWorld environments, suggesting that reflective memory can reinforce false beliefs rather than correct them.

LGJun 13, 2022
Deploying Convolutional Networks on Untrusted Platforms Using 2D Holographic Reduced Representations

Mohammad Mahmudul Alam, Edward Raff, Tim Oates et al.

Due to the computational cost of running inference for a neural network, the need to deploy the inferential steps on a third party's compute environment or hardware is common. If the third party is not fully trusted, it is desirable to obfuscate the nature of the inputs and outputs, so that the third party can not easily determine what specific task is being performed. Provably secure protocols for leveraging an untrusted party exist but are too computational demanding to run in practice. We instead explore a different strategy of fast, heuristic security that we call Connectionist Symbolic Pseudo Secrets. By leveraging Holographic Reduced Representations (HRR), we create a neural network with a pseudo-encryption style defense that empirically shows robustness to attack, even under threat models that unrealistically favor the adversary.

LGNov 9, 2023
LLM Augmented Hierarchical Agents

Bharat Prakash, Tim Oates, Tinoosh Mohsenin

Solving long-horizon, temporally-extended tasks using Reinforcement Learning (RL) is challenging, compounded by the common practice of learning without prior knowledge (or tabula rasa learning). Humans can generate and execute plans with temporally-extended actions and quickly learn to perform new tasks because we almost never solve problems from scratch. We want autonomous agents to have this same ability. Recently, LLMs have been shown to encode a tremendous amount of knowledge about the world and to perform impressive in-context learning and reasoning. However, using LLMs to solve real world problems is hard because they are not grounded in the current task. In this paper we exploit the planning capabilities of LLMs while using RL to provide learning from the environment, resulting in a hierarchical agent that uses LLMs to solve long-horizon tasks. Instead of completely relying on LLMs, they guide a high-level policy, making learning significantly more sample efficient. This approach is evaluated in simulation environments such as MiniGrid, SkillHack, and Crafter, and on a real robot arm in block manipulation tasks. We show that agents trained using our approach outperform other baselines methods and, once trained, don't need access to LLMs during deployment.

LGJun 28, 2023
cuSLINK: Single-linkage Agglomerative Clustering on the GPU

Corey J. Nolet, Divye Gala, Alex Fender et al.

In this paper, we propose cuSLINK, a novel and state-of-the-art reformulation of the SLINK algorithm on the GPU which requires only $O(Nk)$ space and uses a parameter $k$ to trade off space and time. We also propose a set of novel and reusable building blocks that compose cuSLINK. These building blocks include highly optimized computational patterns for $k$-NN graph construction, spanning trees, and dendrogram cluster extraction. We show how we used our primitives to implement cuSLINK end-to-end on the GPU, further enabling a wide range of real-world data mining and machine learning applications that were once intractable. In addition to being a primary computational bottleneck in the popular HDBSCAN algorithm, the impact of our end-to-end cuSLINK algorithm spans a large range of important applications, including cluster analysis in social and computer networks, natural language processing, and computer vision. Users can obtain cuSLINK at https://docs.rapids.ai/api/cuml/latest/api/#agglomerative-clustering

LGOct 16, 2022
Towards an Interpretable Hierarchical Agent Framework using Semantic Goals

Bharat Prakash, Nicholas Waytowich, Tim Oates et al.

Learning to solve long horizon temporally extended tasks with reinforcement learning has been a challenge for several years now. We believe that it is important to leverage both the hierarchical structure of complex tasks and to use expert supervision whenever possible to solve such tasks. This work introduces an interpretable hierarchical agent framework by combining planning and semantic goal directed reinforcement learning. We assume access to certain spatial and haptic predicates and construct a simple and powerful semantic goal space. These semantic goal representations are more interpretable, making expert supervision and intervention easier. They also eliminate the need to write complex, dense reward functions thereby reducing human engineering effort. We evaluate our framework on a robotic block manipulation task and show that it performs better than other methods, including both sparse and dense reward functions. We also suggest some next steps and discuss how this framework makes interaction and collaboration with humans easier.

CRFeb 6
Trojans in Artificial Intelligence (TrojAI) Final Report

Kristopher W. Reese, Taylor Kulp-McDowall, Michael Majurski et al.

The Intelligence Advanced Research Projects Activity (IARPA) launched the TrojAI program to confront an emerging vulnerability in modern artificial intelligence: the threat of AI Trojans. These AI trojans are malicious, hidden backdoors intentionally embedded within an AI model that can cause a system to fail in unexpected ways, or allow a malicious actor to hijack the AI model at will. This multi-year initiative helped to map out the complex nature of the threat, pioneered foundational detection methods, and identified unsolved challenges that require ongoing attention by the burgeoning AI security field. This report synthesizes the program's key findings, including methodologies for detection through weight analysis and trigger inversion, as well as approaches for mitigating Trojan risks in deployed models. Comprehensive test and evaluation results highlight detector performance, sensitivity, and the prevalence of "natural" Trojans. The report concludes with lessons learned and recommendations for advancing AI security research.

CVDec 15, 2022
Backdoor Attack Detection in Computer Vision by Applying Matrix Factorization on the Weights of Deep Networks

Khondoker Murad Hossain, Tim Oates

The increasing importance of both deep neural networks (DNNs) and cloud services for training them means that bad actors have more incentive and opportunity to insert backdoors to alter the behavior of trained models. In this paper, we introduce a novel method for backdoor detection that extracts features from pre-trained DNN's weights using independent vector analysis (IVA) followed by a machine learning classifier. In comparison to other detection techniques, this has a number of benefits, such as not requiring any training data, being applicable across domains, operating with a wide range of network architectures, not assuming the nature of the triggers used to change network behavior, and being highly scalable. We discuss the detection pipeline, and then demonstrate the results on two computer vision datasets regarding image classification and object detection. Our method outperforms the competing algorithms in terms of efficiency and is more accurate, helping to ensure the safe application of deep learning and AI.

LGNov 23, 2022
Lempel-Ziv Networks

Rebecca Saul, Mohammad Mahmudul Alam, John Hurwitz et al.

Sequence processing has long been a central area of machine learning research. Recurrent neural nets have been successful in processing sequences for a number of tasks; however, they are known to be both ineffective and computationally expensive when applied to very long sequences. Compression-based methods have demonstrated more robustness when processing such sequences -- in particular, an approach pairing the Lempel-Ziv Jaccard Distance (LZJD) with the k-Nearest Neighbor algorithm has shown promise on long sequence problems (up to $T=200,000,000$ steps) involving malware classification. Unfortunately, use of LZJD is limited to discrete domains. To extend the benefits of LZJD to a continuous domain, we investigate the effectiveness of a deep-learning analog of the algorithm, the Lempel-Ziv Network. While we achieve successful proof of concept, we are unable to improve meaningfully on the performance of a standard LSTM across a variety of datasets and sequence processing tasks. In addition to presenting this negative result, our work highlights the problem of sub-par baseline tuning in newer research areas.

CVFeb 13, 2023
RFC-Net: Learning High Resolution Global Features for Medical Image Segmentation on a Computational Budget

Sourajit Saha, Shaswati Saha, Md Osman Gani et al.

Learning High-Resolution representations is essential for semantic segmentation. Convolutional neural network (CNN)architectures with downstream and upstream propagation flow are popular for segmentation in medical diagnosis. However, due to performing spatial downsampling and upsampling in multiple stages, information loss is inexorable. On the contrary, connecting layers densely on high spatial resolution is computationally expensive. In this work, we devise a Loose Dense Connection Strategy to connect neurons in subsequent layers with reduced parameters. On top of that, using a m-way Tree structure for feature propagation we propose Receptive Field Chain Network (RFC-Net) that learns high resolution global features on a compressed computational space. Our experiments demonstrates that RFC-Net achieves state-of-the-art performance on Kvasir and CVC-ClinicDB benchmarks for Polyp segmentation.

CLDec 21, 2025
MDToC: Metacognitive Dynamic Tree of Concepts for Boosting Mathematical Problem-Solving of Large Language Models

Tung Duong Ta, Tim Oates

Despite advances in mathematical reasoning capabilities, Large Language Models (LLMs) still struggle with calculation verification when using established prompting techniques. We present MDToC (Metacognitive Dynamic Tree of Concepts), a three-phase approach that constructs a concept tree, develops accuracy-verified calculations for each concept, and employs majority voting to evaluate competing solutions. Evaluations across CHAMP, MATH, and Game-of-24 benchmarks demonstrate our MDToC's effectiveness, with GPT-4-Turbo achieving 58.1\% on CHAMP, 86.6\% on MATH, and 85\% on Game-of-24 - outperforming GoT by 5\%, 5.4\%, and 4\% on all these tasks, respectively, without hand-engineered hints. MDToC consistently surpasses existing prompting methods across all backbone models, yielding improvements of up to 7.6\% over ToT and 6.2\% over GoT, establishing metacognitive calculation verification as a promising direction for enhanced mathematical reasoning.

AIOct 30, 2024Code
A Walsh Hadamard Derived Linear Vector Symbolic Architecture

Mohammad Mahmudul Alam, Alexander Oberle, Edward Raff et al.

Vector Symbolic Architectures (VSAs) are one approach to developing Neuro-symbolic AI, where two vectors in $\mathbb{R}^d$ are `bound' together to produce a new vector in the same space. VSAs support the commutativity and associativity of this binding operation, along with an inverse operation, allowing one to construct symbolic-style manipulations over real-valued vectors. Most VSAs were developed before deep learning and automatic differentiation became popular and instead focused on efficacy in hand-designed systems. In this work, we introduce the Hadamard-derived linear Binding (HLB), which is designed to have favorable computational efficiency, and efficacy in classic VSA tasks, and perform well in differentiable systems. Code is available at https://github.com/FutureComputing4AI/Hadamard-derived-Linear-Binding

LGMay 31, 2023Code
Recasting Self-Attention with Holographic Reduced Representations

Mohammad Mahmudul Alam, Edward Raff, Stella Biderman et al.

In recent years, self-attention has become the dominant paradigm for sequence modeling in a variety of domains. However, in domains with very long sequence lengths the $\mathcal{O}(T^2)$ memory and $\mathcal{O}(T^2 H)$ compute costs can make using transformers infeasible. Motivated by problems in malware detection, where sequence lengths of $T \geq 100,000$ are a roadblock to deep learning, we re-cast self-attention using the neuro-symbolic approach of Holographic Reduced Representations (HRR). In doing so we perform the same high-level strategy of the standard self-attention: a set of queries matching against a set of keys, and returning a weighted response of the values for each key. Implemented as a ``Hrrformer'' we obtain several benefits including $\mathcal{O}(T H \log H)$ time complexity, $\mathcal{O}(T H)$ space complexity, and convergence in $10\times$ fewer epochs. Nevertheless, the Hrrformer achieves near state-of-the-art accuracy on LRA benchmarks and we are able to learn with just a single layer. Combined, these benefits make our Hrrformer the first viable Transformer for such long malware classification sequences and up to $280\times$ faster to train on the Long Range Arena benchmark. Code is available at \url{https://github.com/NeuromorphicComputationResearchProgram/Hrrformer}

AISep 5, 2021Code
Learning with Holographic Reduced Representations

Ashwinkumar Ganesan, Hang Gao, Sunil Gandhi et al.

Holographic Reduced Representations (HRR) are a method for performing symbolic AI on top of real-valued vectors by associating each vector with an abstract concept, and providing mathematical operations to manipulate vectors as if they were classic symbolic objects. This method has seen little use outside of older symbolic AI work and cognitive science. Our goal is to revisit this approach to understand if it is viable for enabling a hybrid neural-symbolic approach to learning as a differentiable component of a deep learning architecture. HRRs today are not effective in a differentiable solution due to numerical instability, a problem we solve by introducing a projection step that forces the vectors to exist in a well behaved point in space. In doing so we improve the concept retrieval efficacy of HRRs by over $100\times$. Using multi-label classification we demonstrate how to leverage the symbolic HRR properties to develop an output layer and loss function that is able to learn effectively, and allows us to investigate some of the pros and cons of an HRR neuro-symbolic learning approach. Our code can be found at https://github.com/NeuromorphicComputationResearchProgram/Learning-with-Holographic-Reduced-Representations

LGApr 13, 2021Code
GPU Semiring Primitives for Sparse Neighborhood Methods

Corey J. Nolet, Divye Gala, Edward Raff et al.

High-performance primitives for mathematical operations on sparse vectors must deal with the challenges of skewed degree distributions and limits on memory consumption that are typically not issues in dense operations. We demonstrate that a sparse semiring primitive can be flexible enough to support a wide range of critical distance measures while maintaining performance and memory efficiency on the GPU. We further show that this primitive is a foundational component for enabling many neighborhood-based information retrieval and machine learning algorithms to accept sparse input. To our knowledge, this is the first work aiming to unify the computation of several critical distance measures on the GPU under a single flexible design paradigm and we hope that it provides a good baseline for future research in this area. Our implementation is fully open source and publicly available as part of the RAFT library of GPU-accelerated machine learning primitives (https://github.com/rapidsai/raft).

LGAug 1, 2020Code
Bringing UMAP Closer to the Speed of Light with GPU Acceleration

Corey J. Nolet, Victor Lafargue, Edward Raff et al.

The Uniform Manifold Approximation and Projection (UMAP) algorithm has become widely popular for its ease of use, quality of results, and support for exploratory, unsupervised, supervised, and semi-supervised learning. While many algorithms can be ported to a GPU in a simple and direct fashion, such efforts have resulted in inefficient and inaccurate versions of UMAP. We show a number of techniques that can be used to make a faster and more faithful GPU version of UMAP, and obtain speedups of up to 100x in practice. Many of these design choices/lessons are general purpose and may inform the conversion of other graph and manifold learning algorithms to use GPUs. Our implementation has been made publicly available as part of the open source RAPIDS cuML library (https://github.com/rapidsai/cuml).

39.7LGApr 2
Care-Conditioned Neuromodulation for Autonomy-Preserving Supportive Dialogue Agents

Shalima Binta Manir, Tim Oates

Large language models deployed in supportive or advisory roles must balance helpfulness with preservation of user autonomy, yet standard alignment methods primarily optimize for helpfulness and harmlessness without explicitly modeling relational risks such as dependency reinforcement, overprotection, or coercive guidance. We introduce Care-Conditioned Neuromodulation (CCN), a state-dependent control framework in which a learned scalar signal derived from structured user state and dialogue context conditions response generation and candidate selection. We formalize this setting as an autonomy-preserving alignment problem and define a utility function that rewards autonomy support and helpfulness while penalizing dependency and coercion. We also construct a benchmark of relational failure modes in multi-turn dialogue, including reassurance dependence, manipulative care, overprotection, and boundary inconsistency. On this benchmark, care-conditioned candidate generation combined with utility-based reranking improves autonomy-preserving utility by +0.25 over supervised fine-tuning and +0.07 over preference optimization baselines while maintaining comparable supportiveness. Pilot human evaluation and zero-shot transfer to real emotional-support conversations show directional agreement with automated metrics. These results suggest that state-dependent control combined with utility-based selection is a practical approach to multi-objective alignment in autonomy-sensitive dialogue.

13.8CVMay 4
Learning to Segment using Summary Statistics and Weak Supervision

Omkar Kulkarni, Edward Raff, Tim Oates

Medical experts often manually segment images to obtain diagnostic statistics and discard the resulting annotations. We aim to train segmentation models to alleviate this burden, but constrained to the retained summary statistics (e.g., the area of the annotated region). Empirical results suggest that statistics alone are insufficient for this task, but adding weak information in the form of a few pixels within the area of interest significantly improves performance. We use a novel loss function that combines terms for image reconstruction quality, matching to summary statistics, and overlap between the predicted foreground and the weak supervisory signal. Experiments on standard image, ultrasound (breast cancer), and Computed Tomography (CT) scan (kidney tumors) data demonstrate the utility and potential of the approach.

LGJan 6, 2024
TEN-GUARD: Tensor Decomposition for Backdoor Attack Detection in Deep Neural Networks

Khondoker Murad Hossain, Tim Oates · pku

As deep neural networks and the datasets used to train them get larger, the default approach to integrating them into research and commercial projects is to download a pre-trained model and fine tune it. But these models can have uncertain provenance, opening up the possibility that they embed hidden malicious behavior such as trojans or backdoors, where small changes to an input (triggers) can cause the model to produce incorrect outputs (e.g., to misclassify). This paper introduces a novel approach to backdoor detection that uses two tensor decomposition methods applied to network activations. This has a number of advantages relative to existing detection methods, including the ability to analyze multiple models at the same time, working across a wide variety of network architectures, making no assumptions about the nature of triggers used to alter network behavior, and being computationally efficient. We provide a detailed description of the detection pipeline along with results on models trained on the MNIST digit dataset, CIFAR-10 dataset, and two difficult datasets from NIST's TrojAI competition. These results show that our method detects backdoored networks more accurately and efficiently than current state-of-the-art methods.

47.6AIApr 27
LLM-Guided Agentic Floor Plan Parsing for Accessible Indoor Navigation of Blind and Low-Vision People

Aydin Ayanzadeh, Tim Oates

Indoor navigation remains a critical accessibility challenge for the blind and low-vision (BLV) individuals, as existing solutions rely on costly per-building infrastructure. We present an agentic framework that converts a single floor plan image into a structured, retrievable knowledge base to generate safe, accessible navigation instructions with lightweight infrastructure. The system has two phases: a multi-agent module that parses the floor plan into a spatial knowledge graph through a self-correcting pipeline with iterative retry loops and corrective feedback; and a Path Planner that generates accessible navigation instructions, with a Safety Evaluator agent assessing potential hazards along each route. We evaluate the system on the real-world UMBC Math and Psychology building (floors MP-1 and MP-3) and on the CVC-FP benchmark. On MP-1, we achieve success rates of 92.31%, 76.92%, and 61.54% for short, medium, and long routes, outperforming the strongest single-call baseline (Claude 3.7 Sonnet) at 84.62%, 69.23%, and 53.85%. On MP-3, we reach 76.92%, 61.54%, and 38.46%, compared to the best baseline at 61.54%, 46.15%, and 23.08%. These results show consistent gains over single-call LLM baselines and demonstrate that our workflow is a scalable solution for accessible indoor navigation for BLV individuals.

CRMar 23, 2024
Holographic Global Convolutional Networks for Long-Range Prediction Tasks in Malware Detection

Mohammad Mahmudul Alam, Edward Raff, Stella Biderman et al.

Malware detection is an interesting and valuable domain to work in because it has significant real-world impact and unique machine-learning challenges. We investigate existing long-range techniques and benchmarks and find that they're not very suitable in this problem area. In this paper, we introduce Holographic Global Convolutional Networks (HGConv) that utilize the properties of Holographic Reduced Representations (HRR) to encode and decode features from sequence elements. Unlike other global convolutional methods, our method does not require any intricate kernel computation or crafted kernel design. HGConv kernels are defined as simple parameters learned through backpropagation. The proposed method has achieved new SOTA results on Microsoft Malware Classification Challenge, Drebin, and EMBER malware benchmarks. With log-linear complexity in sequence length, the empirical results demonstrate substantially faster run-time by HGConv compared to other methods achieving far more efficient scaling even with sequence length $\geq 100,000$.

LGOct 17, 2024
SBI-RAG: Enhancing Math Word Problem Solving for Students through Schema-Based Instruction and Retrieval-Augmented Generation

Prakhar Dixit, Tim Oates

Many students struggle with math word problems (MWPs), often finding it difficult to identify key information and select the appropriate mathematical operations. Schema-based instruction (SBI) is an evidence-based strategy that helps students categorize problems based on their structure, improving problem-solving accuracy. Building on this, we propose a Schema-Based Instruction Retrieval-Augmented Generation (SBI-RAG) framework that incorporates a large language model (LLM). Our approach emphasizes step-by-step reasoning by leveraging schemas to guide solution generation. We evaluate its performance on the GSM8K dataset, comparing it with GPT-4 and GPT-3.5 Turbo, and introduce a "reasoning score" metric to assess solution quality. Our findings suggest that SBI-RAG enhances reasoning clarity and facilitates a more structured problem-solving process potentially providing educational benefits for students.

CRMar 13, 2024
Advancing Security in AI Systems: A Novel Approach to Detecting Backdoors in Deep Neural Networks

Khondoker Murad Hossain, Tim Oates

In the rapidly evolving landscape of communication and network security, the increasing reliance on deep neural networks (DNNs) and cloud services for data processing presents a significant vulnerability: the potential for backdoors that can be exploited by malicious actors. Our approach leverages advanced tensor decomposition algorithms Independent Vector Analysis (IVA), Multiset Canonical Correlation Analysis (MCCA), and Parallel Factor Analysis (PARAFAC2) to meticulously analyze the weights of pre-trained DNNs and distinguish between backdoored and clean models effectively. The key strengths of our method lie in its domain independence, adaptability to various network architectures, and ability to operate without access to the training data of the scrutinized models. This not only ensures versatility across different application scenarios but also addresses the challenge of identifying backdoors without prior knowledge of the specific triggers employed to alter network behavior. We have applied our detection pipeline to three distinct computer vision datasets, encompassing both image classification and object detection tasks. The results demonstrate a marked improvement in both accuracy and efficiency over existing backdoor detection methods. This advancement enhances the security of deep learning and AI in networked systems, providing essential cybersecurity against evolving threats in emerging technologies.

CVJan 12, 2025
MedGrad E-CLIP: Enhancing Trust and Transparency in AI-Driven Skin Lesion Diagnosis

Sadia Kamal, Tim Oates

As deep learning models gain attraction in medical data, ensuring transparent and trustworthy decision-making is essential. In skin cancer diagnosis, while advancements in lesion detection and classification have improved accuracy, the black-box nature of these methods poses challenges in understanding their decision processes, leading to trust issues among physicians. This study leverages the CLIP (Contrastive Language-Image Pretraining) model, trained on different skin lesion datasets, to capture meaningful relationships between visual features and diagnostic criteria terms. To further enhance transparency, we propose a method called MedGrad E-CLIP, which builds on gradient-based E-CLIP by incorporating a weighted entropy mechanism designed for complex medical imaging like skin lesions. This approach highlights critical image regions linked to specific diagnostic descriptions. The developed integrated pipeline not only classifies skin lesions by matching corresponding descriptions but also adds an essential layer of explainability developed especially for medical data. By visually explaining how different features in an image relates to diagnostic criteria, this approach demonstrates the potential of advanced vision-language models in medical image analysis, ultimately improving transparency, robustness, and trust in AI-driven diagnostic systems.

AIDec 13, 2025
Floorplan2Guide: LLM-Guided Floorplan Parsing for BLV Indoor Navigation

Aydin Ayanzadeh, Tim Oates

Indoor navigation remains a critical challenge for people with visual impairments. The current solutions mainly rely on infrastructure-based systems, which limit their ability to navigate safely in dynamic environments. We propose a novel navigation approach that utilizes a foundation model to transform floor plans into navigable knowledge graphs and generate human-readable navigation instructions. Floorplan2Guide integrates a large language model (LLM) to extract spatial information from architectural layouts, reducing the manual preprocessing required by earlier floorplan parsing methods. Experimental results indicate that few-shot learning improves navigation accuracy in comparison to zero-shot learning on simulated and real-world evaluations. Claude 3.7 Sonnet achieves the highest accuracy among the evaluated models, with 92.31%, 76.92%, and 61.54% on the short, medium, and long routes, respectively, under 5-shot prompting of the MP-1 floor plan. The success rate of graph-based spatial structure is 15.4% higher than that of direct visual reasoning among all models, which confirms that graphical representation and in-context learning enhance navigation performance and make our solution more precise for indoor navigation of Blind and Low Vision (BLV) users.

LGJan 26
ASEHybrid: When Geometry Matters Beyond Homophily in Graph Neural Networks

Shalima Binta Manir, Tim Oates

Standard message-passing graph neural networks (GNNs) often struggle on graphs with low homophily, yet homophily alone does not explain this behavior, as graphs with similar homophily levels can exhibit markedly different performance and some heterophilous graphs remain easy for vanilla GCNs. Recent work suggests that label informativeness (LI), the mutual information between labels of adjacent nodes, provides a more faithful characterization of when graph structure is useful. In this work, we develop a unified theoretical framework that connects curvature-guided rewiring and positional geometry through the lens of label informativeness, and instantiate it in a practical geometry-aware architecture, ASEHybrid. Our analysis provides a necessary-and-sufficient characterization of when geometry-aware GNNs can improve over feature-only baselines: such gains are possible if and only if graph structure carries label-relevant information beyond node features. Theoretically, we relate adjusted homophily and label informativeness to the spectral behavior of label signals under Laplacian smoothing, show that degree-based Forman curvature does not increase expressivity beyond the one-dimensional Weisfeiler--Lehman test but instead reshapes information flow, and establish convergence and Lipschitz stability guarantees for a curvature-guided rewiring process. Empirically, we instantiate ASEHybrid using Forman curvature and Laplacian positional encodings and conduct controlled ablations on Chameleon, Squirrel, Texas, Tolokers, and Minesweeper, observing gains precisely on label-informative heterophilous benchmarks where graph structure provides label-relevant information beyond node features, and no meaningful improvement in high-baseline regimes.

AISep 17, 2025
Hierarchical Learning for Maze Navigation: Emergence of Mental Representations via Second-Order Learning

Shalima Binta Manir, Tim Oates

Mental representation, characterized by structured internal models mirroring external environments, is fundamental to advanced cognition but remains challenging to investigate empirically. Existing theory hypothesizes that second-order learning -- learning mechanisms that adapt first-order learning (i.e., learning about the task/domain) -- promotes the emergence of such environment-cognition isomorphism. In this paper, we empirically validate this hypothesis by proposing a hierarchical architecture comprising a Graph Convolutional Network (GCN) as a first-order learner and an MLP controller as a second-order learner. The GCN directly maps node-level features to predictions of optimal navigation paths, while the MLP dynamically adapts the GCN's parameters when confronting structurally novel maze environments. We demonstrate that second-order learning is particularly effective when the cognitive system develops an internal mental map structurally isomorphic to the environment. Quantitative and qualitative results highlight significant performance improvements and robust generalization on unseen maze tasks, providing empirical support for the pivotal role of structured mental representations in maximizing the effectiveness of second-order learning.

AISep 10, 2025
One Model, Two Minds: A Context-Gated Graph Learner that Recreates Human Biases

Shalima Binta Manir, Tim Oates

We introduce a novel Theory of Mind (ToM) framework inspired by dual-process theories from cognitive science, integrating a fast, habitual graph-based reasoning system (System 1), implemented via graph convolutional networks (GCNs), and a slower, context-sensitive meta-adaptive learning system (System 2), driven by meta-learning techniques. Our model dynamically balances intuitive and deliberative reasoning through a learned context gate mechanism. We validate our architecture on canonical false-belief tasks and systematically explore its capacity to replicate hallmark cognitive biases associated with dual-process theory, including anchoring, cognitive-load fatigue, framing effects, and priming effects. Experimental results demonstrate that our dual-process approach closely mirrors human adaptive behavior, achieves robust generalization to unseen contexts, and elucidates cognitive mechanisms underlying reasoning biases. This work bridges artificial intelligence and cognitive theory, paving the way for AI systems exhibiting nuanced, human-like social cognition and adaptive decision-making capabilities.

LGAug 18, 2025
Predicting the Performance of Graph Convolutional Networks with Spectral Properties of the Graph Laplacian

Shalima Binta Manir, Tim Oates

A common observation in the Graph Convolutional Network (GCN) literature is that stacking GCN layers may or may not result in better performance on tasks like node classification and edge prediction. We have found empirically that a graph's algebraic connectivity, which is known as the Fiedler value, is a good predictor of GCN performance. Intuitively, graphs with similar Fiedler values have analogous structural properties, suggesting that the same filters and hyperparameters may yield similar results when used with GCNs, and that transfer learning may be more effective between graphs with similar algebraic connectivity. We explore this theoretically and empirically with experiments on synthetic and real graph data, including the Cora, CiteSeer and Polblogs datasets. We explore multiple ways of aggregating the Fiedler value for connected components in the graphs to arrive at a value for the entire graph, and show that it can be used to predict GCN performance. We also present theoretical arguments as to why the Fiedler value is a good predictor.

CVAug 7, 2025
Skin-SOAP: A Weakly Supervised Framework for Generating Structured SOAP Notes

Sadia Kamal, Tim Oates, Joy Wan

Skin carcinoma is the most prevalent form of cancer globally, accounting for over $8 billion in annual healthcare expenditures. Early diagnosis, accurate and timely treatment are critical to improving patient survival rates. In clinical settings, physicians document patient visits using detailed SOAP (Subjective, Objective, Assessment, and Plan) notes. However, manually generating these notes is labor-intensive and contributes to clinician burnout. In this work, we propose skin-SOAP, a weakly supervised multimodal framework to generate clinically structured SOAP notes from limited inputs, including lesion images and sparse clinical text. Our approach reduces reliance on manual annotations, enabling scalable, clinically grounded documentation while alleviating clinician burden and reducing the need for large annotated data. Our method achieves performance comparable to GPT-4o, Claude, and DeepSeek Janus Pro across key clinical relevance metrics. To evaluate this clinical relevance, we introduce two novel metrics MedConceptEval and Clinical Coherence Score (CCS) which assess semantic alignment with expert medical concepts and input features, respectively.

CVJun 12, 2025
Towards Scalable SOAP Note Generation: A Weakly Supervised Multimodal Framework

Sadia Kamal, Tim Oates, Joy Wan

Skin carcinoma is the most prevalent form of cancer globally, accounting for over $8 billion in annual healthcare expenditures. In clinical settings, physicians document patient visits using detailed SOAP (Subjective, Objective, Assessment, and Plan) notes. However, manually generating these notes is labor-intensive and contributes to clinician burnout. In this work, we propose a weakly supervised multimodal framework to generate clinically structured SOAP notes from limited inputs, including lesion images and sparse clinical text. Our approach reduces reliance on manual annotations, enabling scalable, clinically grounded documentation while alleviating clinician burden and reducing the need for large annotated data. Our method achieves performance comparable to GPT-4o, Claude, and DeepSeek Janus Pro across key clinical relevance metrics. To evaluate clinical quality, we introduce two novel metrics MedConceptEval and Clinical Coherence Score (CCS) which assess semantic alignment with expert medical concepts and input features, respectively.

CVFeb 25, 2025
DeBUGCN -- Detecting Backdoors in CNNs Using Graph Convolutional Networks

Akash Vartak, Khondoker Murad Hossain, Tim Oates

Deep neural networks (DNNs) are becoming commonplace in critical applications, making their susceptibility to backdoor (trojan) attacks a significant problem. In this paper, we introduce a novel backdoor attack detection pipeline, detecting attacked models using graph convolution networks (DeBUGCN). To the best of our knowledge, ours is the first use of GCNs for trojan detection. We use the static weights of a DNN to create a graph structure of its layers. A GCN is then used as a binary classifier on these graphs, yielding a trojan or clean determination for the DNN. To demonstrate the efficacy of our pipeline, we train hundreds of clean and trojaned CNN models on the MNIST handwritten digits and CIFAR-10 image datasets, and show the DNN classification results using DeBUGCN. For a true In-the-Wild use case, our pipeline is evaluated on the TrojAI dataset which consists of various CNN architectures, thus showing the robustness and model-agnostic behaviour of DeBUGCN. Furthermore, on comparing our results on several datasets with state-of-the-art trojan detection algorithms, DeBUGCN is faster and more accurate.

CVDec 23, 2023
Towards Generalization in Subitizing with Neuro-Symbolic Loss using Holographic Reduced Representations

Mohammad Mahmudul Alam, Edward Raff, Tim Oates

While deep learning has enjoyed significant success in computer vision tasks over the past decade, many shortcomings still exist from a Cognitive Science (CogSci) perspective. In particular, the ability to subitize, i.e., quickly and accurately identify the small (less than 6) count of items, is not well learned by current Convolutional Neural Networks (CNNs) or Vision Transformers (ViTs) when using a standard cross-entropy (CE) loss. In this paper, we demonstrate that adapting tools used in CogSci research can improve the subitizing generalization of CNNs and ViTs by developing an alternative loss function using Holographic Reduced Representations (HRRs). We investigate how this neuro-symbolic approach to learning affects the subitizing capability of CNNs and ViTs, and so we focus on specially crafted problems that isolate generalization to specific aspects of subitizing. Via saliency maps and out-of-distribution performance, we are able to empirically observe that the proposed HRR loss improves subitizing generalization though it does not completely solve the problem. In addition, we find that ViTs perform considerably worse compared to CNNs in most respects on subitizing, except on one axis where an HRR-based loss provides improvement.

AINov 7, 2021
Automatic Goal Generation using Dynamical Distance Learning

Bharat Prakash, Nicholas Waytowich, Tinoosh Mohsenin et al.

Reinforcement Learning (RL) agents can learn to solve complex sequential decision making tasks by interacting with the environment. However, sample efficiency remains a major challenge. In the field of multi-goal RL, where agents are required to reach multiple goals to solve complex tasks, improving sample efficiency can be especially challenging. On the other hand, humans or other biological agents learn such tasks in a much more strategic way, following a curriculum where tasks are sampled with increasing difficulty level in order to make gradual and efficient learning progress. In this work, we propose a method for automatic goal generation using a dynamical distance function (DDF) in a self-supervised fashion. DDF is a function which predicts the dynamical distance between any two states within a markov decision process (MDP). With this, we generate a curriculum of goals at the appropriate difficulty level to facilitate efficient learning throughout the training process. We evaluate this approach on several goal-conditioned robotic manipulation and navigation tasks, and show improvements in sample efficiency over a baseline method which only uses random goal sampling.

AIOct 9, 2021
Interactive Hierarchical Guidance using Language

Bharat Prakash, Nicholas Waytowich, Tim Oates et al.

Reinforcement learning has been successful in many tasks ranging from robotic control, games, energy management etc. In complex real world environments with sparse rewards and long task horizons, sample efficiency is still a major challenge. Most complex tasks can be easily decomposed into high-level planning and low level control. Therefore, it is important to enable agents to leverage the hierarchical structure and decompose bigger tasks into multiple smaller sub-tasks. We introduce an approach where we use language to specify sub-tasks and a high-level planner issues language commands to a low level controller. The low-level controller executes the sub-tasks based on the language commands. Our experiments show that this method is able to solve complex long horizon planning tasks with limited human supervision. Using language has added benefit of interpretability and ability for expert humans to take over the high-level planning task and provide language commands if necessary.

LGSep 30, 2021
Determining Standard Occupational Classification Codes from Job Descriptions in Immigration Petitions

Sourav Mukherjee, David Widmark, Vince DiMascio et al.

Accurate specification of standard occupational classification (SOC) code is critical to the success of many U.S. work visa applications. Determination of correct SOC code relies on careful study of job requirements and comparison to definitions given by the U.S. Bureau of Labor Statistics, which is often a tedious activity. In this paper, we apply methods from natural language processing (NLP) to computationally determine SOC code based on job description. We implement and empirically evaluate a broad variety of predictive models with respect to quality of prediction and training time, and identify models best suited for this task.

CLJun 30, 2021
Learning a Reversible Embedding Mapping using Bi-Directional Manifold Alignment

Ashwinkumar Ganesan, Francis Ferraro, Tim Oates

We propose a Bi-Directional Manifold Alignment (BDMA) that learns a non-linear mapping between two manifolds by explicitly training it to be bijective. We demonstrate BDMA by training a model for a pair of languages rather than individual, directed source and target combinations, reducing the number of models by 50%. We show that models trained with BDMA in the "forward" (source to target) direction can successfully map words in the "reverse" (target to source) direction, yielding equivalent (or better) performance to standard unidirectional translation models where the source and target language is flipped. We also show how BDMA reduces the overall size of the model.

LGSep 29, 2020
Immigration Document Classification and Automated Response Generation

Sourav Mukherjee, Tim Oates, Vince DiMascio et al.

In this paper, we consider the problem of organizing supporting documents vital to U.S. work visa petitions, as well as responding to Requests For Evidence (RFE) issued by the U.S.~Citizenship and Immigration Services (USCIS). Typically, both processes require a significant amount of repetitive manual effort. To reduce the burden of mechanical work, we apply machine learning methods to automate these processes, with humans in the loop to review and edit output for submission. In particular, we use an ensemble of image and text classifiers to categorize supporting documents. We also use a text classifier to automatically identify the types of evidence being requested in an RFE, and used the identified types in conjunction with response templates and extracted fields to assemble draft responses. Empirical results suggest that our approach achieves considerable accuracy while significantly reducing processing time.

LGApr 7, 2020
Locality Preserving Loss: Neighbors that Live together, Align together

Ashwinkumar Ganesan, Francis Ferraro, Tim Oates

We present a locality preserving loss (LPL) that improves the alignment between vector space embeddings while separating uncorrelated representations. Given two pretrained embedding manifolds, LPL optimizes a model to project an embedding and maintain its local neighborhood while aligning one manifold to another. This reduces the overall size of the dataset required to align the two in tasks such as cross-lingual word alignment. We show that the LPL-based alignment between input vector spaces acts as a regularizer, leading to better and consistent accuracy than the baseline, especially when the size of the training set is small. We demonstrate the effectiveness of LPL optimized alignment on semantic text similarity (STS), natural language inference (SNLI), multi-genre language inference (MNLI) and cross-lingual word alignment(CLA) showing consistent improvements, finding up to 16% improvement over our baseline in lower resource settings.

LGOct 10, 2019
Using Neural Networks for Programming by Demonstration

Karan K. Budhraja, Hang Gao, Tim Oates

Agent-based modeling is a paradigm of modeling dynamic systems of interacting agents that are individually governed by specified behavioral rules. Training a model of such agents to produce an emergent behavior by specification of the emergent (as opposed to agent) behavior is easier from a demonstration perspective. Without the involvement of manual behavior specification via code or reliance on a defined taxonomy of possible behaviors, the demonstrator specifies the desired emergent behavior of the system over time, and retrieves agent-level parameters required to execute that motion. A low time-complexity and data requirement favoring framework for reproducing emergent behavior, given an abstract demonstration, is discussed in [1], [2]. The existing framework does, however, observe an inherent limitation in scalability because of an exponentially growing search space (with the number of agent-level parameters). Our work addresses this limitation by pursuing a more scalable architecture with the use of neural networks. While the (proof-of-concept) architecture is not suitable for many evaluated domains because of its lack of representational capacity for that domain, it is more suitable than existing work for larger datasets for the Civil Violence agent-based model.

CLOct 10, 2019
Universal Adversarial Perturbation for Text Classification

Hang Gao, Tim Oates

Given a state-of-the-art deep neural network text classifier, we show the existence of a universal and very small perturbation vector (in the embedding space) that causes natural text to be misclassified with high probability. Unlike images on which a single fixed-size adversarial perturbation can be found, text is of variable length, so we define the "universality" as "token-agnostic", where a single perturbation is applied to each token, resulting in different perturbations of flexible sizes at the sequence level. We propose an algorithm to compute universal adversarial perturbations, and show that the state-of-the-art deep neural networks are highly vulnerable to them, even though they keep the neighborhood of tokens mostly preserved. We also show how to use these adversarial perturbations to generate adversarial text samples. The surprising existence of universal "token-agnostic" adversarial perturbations may reveal important properties of a text classifier.

LGSep 29, 2019
Learning from Observations Using a Single Video Demonstration and Human Feedback

Sunil Gandhi, Tim Oates, Tinoosh Mohsenin et al.

In this paper, we present a method for learning from video demonstrations by using human feedback to construct a mapping between the standard representation of the agent and the visual representation of the demonstration. In this way, we leverage the advantages of both these representations, i.e., we learn the policy using standard state representations, but are able to specify the expected behavior using video demonstration. We train an autonomous agent using a single video demonstration and use human feedback (using numerical similarity rating) to map the standard representation to the visual representation with a neural network. We show the effectiveness of our method by teaching a hopper agent in the MuJoCo to perform a backflip using a single video demonstration generated in MuJoCo as well as from a real-world YouTube video of a person performing a backflip. Additionally, we show that our method can transfer to new tasks, such as hopping, with very little human feedback.

SISep 12, 2019
Determining the Scale of Impact from Denial-of-Service Attacks in Real Time Using Twitter

Chi Zhang, Bryan Wilkinson, Ashwinkumar Ganesan et al.

Denial of Service (DoS) attacks are common in on-line and mobile services such as Twitter, Facebook and banking. As the scale and frequency of Distributed Denial of Service (DDoS) attacks increase, there is an urgent need for determining the impact of the attack. Two central challenges of the task are to get feedback from a large number of users and to get it in a timely manner. In this paper, we present a weakly-supervised model that does not need annotated data to measure the impact of DoS issues by applying Latent Dirichlet Allocation and symmetric Kullback-Leibler divergence on tweets. There is a limitation to the weakly-supervised module. It assumes that the event detected in a time window is a DoS attack event. This will become less of a problem, when more non-attack events twitter got collected and become less likely to be identified as a new event. Another way to remove that limitation, an optional classification layer, trained on manually annotated DoS attack tweets, to filter out non-attack tweets can be used to increase precision at the expense of recall. Experimental results show that we can learn weakly-supervised models that can achieve comparable precision to supervised ones and can be generalized across entities in the same industry.

LGAug 8, 2019
Graph Node Embeddings using Domain-Aware Biased Random Walks

Sourav Mukherjee, Tim Oates, Ryan Wright

The recent proliferation of publicly available graph-structured data has sparked an interest in machine learning algorithms for graph data. Since most traditional machine learning algorithms assume data to be tabular, embedding algorithms for mapping graph data to real-valued vector spaces has become an active area of research. Existing graph embedding approaches are based purely on structural information and ignore any semantic information from the underlying domain. In this paper, we demonstrate that semantic information can play a useful role in computing graph embeddings. Specifically, we present a framework for devising embedding strategies aware of domain-specific interpretations of graph nodes and edges, and use knowledge of downstream machine learning tasks to identify relevant graph substructures. Using two real-life domains, we show that our framework yields embeddings that are simple to implement and yet achieve equal or greater accuracy in machine learning tasks compared to domain independent approaches.

QMApr 18, 2019
Hybrid Mortality Prediction using Multiple Source Systems

Isaac Mativo, Yelena Yesha, Michael Grasso et al.

The use of artificial intelligence in clinical care to improve decision support systems is increasing. This is not surprising since, by its very nature, the practice of medicine consists of making decisions based on observations from different systems both inside and outside the human body. In this paper, we combine three general systems (ICU, diabetes, and comorbidities) and use them to make patient clinical predictions. We use an artificial intelligence approach to show that we can improve mortality prediction of hospitalized diabetic patients. We do this by utilizing a machine learning approach to select clinical input features that are more likely to predict mortality. We then use these features to create a hybrid mortality prediction model and compare our results to non-artificial intelligence models. For simplicity, we limit our input features to patient comorbidities and features derived from a well-known mortality measure, the Sequential Organ Failure Assessment (SOFA).

CRMar 28, 2019
Extending Signature-based Intrusion Detection Systems WithBayesian Abductive Reasoning

Ashwinkumar Ganesan, Pooja Parameshwarappa, Akshay Peshave et al.

Evolving cybersecurity threats are a persistent challenge for systemadministrators and security experts as new malwares are continu-ally released. Attackers may look for vulnerabilities in commercialproducts or execute sophisticated reconnaissance campaigns tounderstand a targets network and gather information on securityproducts like firewalls and intrusion detection / prevention systems(network or host-based). Many new attacks tend to be modificationsof existing ones. In such a scenario, rule-based systems fail to detectthe attack, even though there are minor differences in conditions /attributes between rules to identify the new and existing attack. Todetect these differences the IDS must be able to isolate the subset ofconditions that are true and predict the likely conditions (differentfrom the original) that must be observed. In this paper, we proposeaprobabilistic abductive reasoningapproach that augments an exist-ing rule-based IDS (snort [29]) to detect these evolved attacks by (a)Predicting rule conditions that are likely to occur (based on existingrules) and (b) able to generate new snort rules when provided withseed rule (i.e. a starting rule) to reduce the burden on experts toconstantly update them. We demonstrate the effectiveness of theapproach by generating new rules from the snort 2012 rules set andtesting it on the MACCDC 2012 dataset [6].

LGMar 25, 2019
On the use of Deep Autoencoders for Efficient Embedded Reinforcement Learning

Bharat Prakash, Mark Horton, Nicholas R. Waytowich et al.

In autonomous embedded systems, it is often vital to reduce the amount of actions taken in the real world and energy required to learn a policy. Training reinforcement learning agents from high dimensional image representations can be very expensive and time consuming. Autoencoders are deep neural network used to compress high dimensional data such as pixelated images into small latent representations. This compression model is vital to efficiently learn policies, especially when learning on embedded systems. We have implemented this model on the NVIDIA Jetson TX2 embedded GPU, and evaluated the power consumption, throughput, and energy consumption of the autoencoders for various CPU/GPU core combinations, frequencies, and model parameters. Additionally, we have shown the reconstructions generated by the autoencoder to analyze the quality of the generated compressed representation and also the performance of the reinforcement learning agent. Finally, we have presented an assessment of the viability of training these models on embedded systems and their usefulness in developing autonomous policies. Using autoencoders, we were able to achieve 4-5 $\times$ improved performance compared to a baseline RL agent with a convolutional feature extractor, while using less than 2W of power.

CRAug 1, 2018
Cognitive Techniques for Early Detection of Cybersecurity Events

Sandeep Narayanan, Ashwinkumar Ganesan, Karuna Joshi et al.

The early detection of cybersecurity events such as attacks is challenging given the constantly evolving threat landscape. Even with advanced monitoring, sophisticated attackers can spend as many as 146 days in a system before being detected. This paper describes a novel, cognitive framework that assists a security analyst by exploiting the power of semantically rich knowledge representation and reasoning with machine learning techniques. Our Cognitive Cybersecurity system ingests information from textual sources, and various agents representing host and network-based sensors, and represents this information in a knowledge graph. This graph uses terms from an extended version of the Unified Cybersecurity Ontology. The system reasons over the knowledge graph to derive better actionable intelligence to security administrators, thus decreasing their cognitive load and increasing their confidence in the system. We have developed a proof of concept framework for our approach and demonstrate its capabilities using a custom-built ransomware instance that is similar to WannaCry.

DCSep 13, 2017
Automated Cloud Provisioning on AWS using Deep Reinforcement Learning

Zhiguang Wang, Chul Gwon, Tim Oates et al.

As the use of cloud computing continues to rise, controlling cost becomes increasingly important. Yet there is evidence that 30\% - 45\% of cloud spend is wasted. Existing tools for cloud provisioning typically rely on highly trained human experts to specify what to monitor, thresholds for triggering action, and actions. In this paper we explore the use of reinforcement learning (RL) to acquire policies to balance performance and spend, allowing humans to specify what they want as opposed to how to do it, minimizing the need for cloud expertise. Empirical results with tabular, deep, and dueling double deep Q-learning with the CloudSim simulator show the utility of RL and the relative merits of the approaches. We also demonstrate effective policy transfer learning from an extremely simple simulator to CloudSim, with the next step being transfer from CloudSim to an Amazon Web Services physical environment.

CVAug 28, 2017
Deep Belief Networks used on High Resolution Multichannel Electroencephalography Data for Seizure Detection

JT Turner, Adam Page, Tinoosh Mohsenin et al.

Ubiquitous bio-sensing for personalized health monitoring is slowly becoming a reality with the increasing availability of small, diverse, robust, high fidelity sensors. This oncoming flood of data begs the question of how we will extract useful information from it. In this paper we explore the use of a variety of representations and machine learning algorithms applied to the task of seizure detection in high resolution, multichannel EEG data. We explore classification accuracy, computational complexity and memory requirements with a view toward understanding which approaches are most suitable for such tasks as the number of people involved and the amount of data they produce grows to be quite large. In particular, we show that layered learning approaches such as Deep Belief Networks excel along these dimensions.

CVJul 31, 2017
Fashioning with Networks: Neural Style Transfer to Design Clothes

Prutha Date, Ashwinkumar Ganesan, Tim Oates

Convolutional Neural Networks have been highly successful in performing a host of computer vision tasks such as object recognition, object detection, image segmentation and texture synthesis. In 2015, Gatys et. al [7] show how the style of a painter can be extracted from an image of the painting and applied to another normal photograph, thus recreating the photo in the style of the painter. The method has been successfully applied to a wide range of images and has since spawned multiple applications and mobile apps. In this paper, the neural style transfer algorithm is applied to fashion so as to synthesize new custom clothes. We construct an approach to personalize and generate new custom clothes based on a users preference and by learning the users fashion choices from a limited set of clothes from their closet. The approach is evaluated by analyzing the generated images of clothes and how well they align with the users fashion style.