LGJan 14, 2023
Understanding the Spectral Bias of Coordinate Based MLPs Via Training DynamicsJohn Lazzari, Xiuwen Liu
Spectral bias is an important observation of neural network training, stating that the network will learn a low frequency representation of the target function before converging to higher frequency components. This property is interesting due to its link to good generalization in over-parameterized networks. However, in low dimensional settings, a severe spectral bias occurs that obstructs convergence to high frequency components entirely. In order to overcome this limitation, one can encode the inputs using a high frequency sinusoidal encoding. Previous works attempted to explain this phenomenon using Neural Tangent Kernel (NTK) and Fourier analysis. However, NTK does not capture real network dynamics, and Fourier analysis only offers a global perspective on the network properties that induce this bias. In this paper, we provide a novel approach towards understanding spectral bias by directly studying ReLU MLP training dynamics. Specifically, we focus on the connection between the computations of ReLU networks (activation regions), and the speed of gradient descent convergence. We study these dynamics in relation to the spatial information of the signal to understand how they influence spectral bias. We then use this formulation to study the severity of spectral bias in low dimensional settings, and how positional encoding overcomes this.
LGJan 25, 2023
Improved Stock Price Movement Classification Using News Articles Based on Embeddings and Label SmoothingLuis Villamil, Ryan Bausback, Shaeke Salman et al.
Stock price movement prediction is a challenging and essential problem in finance. While it is well established in modern behavioral finance that the share prices of related stocks often move after the release of news via reactions and overreactions of investors, how to capture the relationships between price movements and news articles via quantitative models is an active area research; existing models have achieved success with variable degrees. In this paper, we propose to improve stock price movement classification using news articles by incorporating regularization and optimization techniques from deep learning. More specifically, we capture the dependencies between news articles and stocks through embeddings and bidirectional recurrent neural networks as in recent models. We further incorporate weight decay, batch normalization, dropout, and label smoothing to improve the generalization of the trained models. To handle high fluctuations of validation accuracy of batch normalization, we propose dual-phase training to realize the improvements reliably. Our experimental results on a commonly used dataset show significant improvements, achieving average accuracy of 80.7% on the test set, which is more than 10.0% absolute improvement over existing models. Our ablation studies show batch normalization and label smoothing are most effective, leading to 6.0% and 3.4% absolute improvement, respectively on average.
ROJul 10, 2024
Malicious Path Manipulations via Exploitation of Representation Vulnerabilities of Vision-Language Navigation SystemsChashi Mahiul Islam, Shaeke Salman, Montasir Shams et al.
Building on the unprecedented capabilities of large language models for command understanding and zero-shot recognition of multi-modal vision-language transformers, visual language navigation (VLN) has emerged as an effective way to address multiple fundamental challenges toward a natural language interface to robot navigation. However, such vision-language models are inherently vulnerable due to the lack of semantic meaning of the underlying embedding space. Using a recently developed gradient based optimization procedure, we demonstrate that images can be modified imperceptibly to match the representation of totally different images and unrelated texts for a vision-language model. Building on this, we develop algorithms that can adversarially modify a minimal number of images so that the robot will follow a route of choice for commands that require a number of landmarks. We demonstrate that experimentally using a recently proposed VLN system; for a given navigation command, a robot can be made to follow drastically different routes. We also develop an efficient algorithm to detect such malicious modifications reliably based on the fact that the adversarially modified images have much higher sensitivity to added Gaussian noise than the original images.
34.5AIMay 19
When Skills Don't Help: A Negative Result on Procedural Knowledge for Tool-Grounded Agents in Offensive CybersecuritySamuel Jacob Chacko, James Hugglestone, Chashi Mahiul Islam et al.
Agent Skills, structured packages of procedural knowledge loaded into an LLM agent at inference time, are widely reported to improve task pass rates by an average of 16.2~percentage points across diverse domains. Yet the same benchmarks show wide variance, with 16 of 84 tasks suffering negative deltas when Skills are introduced. The community has not yet articulated a clean mechanism for \emph{when} Skills help and when they are merely redundant overhead. We re-analyze a recently published 180-run controlled study of an MCP-grounded autonomous Capture-the-Flag (CTF) agent under four documentation conditions of increasing richness (55, 1{,}478, 1{,}976, and 4{,}147 lines), and show that these conditions correspond almost exactly to a No-Skills, Experiential-Skills, Curated-Skills, and Comprehensive-Skills ablation. In offensive cybersecurity, a domain not deeply covered by existing Skills benchmarks, the marginal benefit of Skills collapses. The spread between the no-Skills and full-Skills conditions is only 8.9~pp ($p = 0.71$, $χ^2$; $p = 0.25$, Cochran--Armitage trend test; five of six pairwise Cohen's $h$ values fall below the $0.2$ small-effect threshold). We argue that the missing variable is \emph{environment-feedback bandwidth}. When an agent's tool layer returns strict, schema-validated, low-latency observations, the environment itself supplies the procedural correction signal that Skills are normally needed to provide. As a result, the marginal benefit of curated Skills diminishes substantially, and, in some cases (e.g., our timing side-channel setting), actively degrades performance. We articulate a falsifiable hypothesis, sketch its design implications for compound AI systems, and will release the reanalysis pipeline to support replication.
30.3AIApr 14
Numerical Instability and Chaos: Quantifying the Unpredictability of Large Language ModelsChashi Mahiul Islam, Alan Villarreal, Mao Nishino et al.
As Large Language Models (LLMs) are increasingly integrated into agentic workflows, their unpredictability stemming from numerical instability has emerged as a critical reliability issue. While recent studies have demonstrated the significant downstream effects of these instabilities, the root causes and underlying mechanisms remain poorly understood. In this paper, we present a rigorous analysis of how unpredictability is rooted in the finite numerical precision of floating-point representations, tracking how rounding errors propagate, amplify, or dissipate through Transformer computation layers. Specifically, we identify a chaotic "avalanche effect" in the early layers, where minor perturbations trigger binary outcomes: either rapid amplification or complete attenuation. Beyond specific error instances, we demonstrate that LLMs exhibit universal, scale-dependent chaotic behaviors characterized by three distinct regimes: 1) a stable regime, where perturbations fall below an input-dependent threshold and vanish, resulting in constant outputs; 2) a chaotic regime, where rounding errors dominate and drive output divergence; and 3) a signal-dominated regime, where true input variations override numerical noise. We validate these findings extensively across multiple datasets and model architectures.
LGDec 16, 2023Code
Inductive Link Prediction in Knowledge Graphs using Path-based Neural NetworksCanlin Zhang, Xiuwen Liu
Link prediction is a crucial research area in knowledge graphs, with many downstream applications. In many real-world scenarios, inductive link prediction is required, where predictions have to be made among unseen entities. Embedding-based models usually need fine-tuning on new entity embeddings, and hence are difficult to be directly applied to inductive link prediction tasks. Logical rules captured by rule-based models can be directly applied to new entities with the same graph typologies, but the captured rules are discrete and usually lack generosity. Graph neural networks (GNNs) can generalize topological information to new graphs taking advantage of deep neural networks, which however may still need fine-tuning on new entity embeddings. In this paper, we propose SiaILP, a path-based model for inductive link prediction using siamese neural networks. Our model only depends on relation and path embeddings, which can be generalized to new entities without fine-tuning. Experiments show that our model achieves several new state-of-the-art performances in link prediction tasks using inductive versions of WN18RR, FB15k-237, and Nell995. Our code is available at \url{https://github.com/canlinzhang/SiaILP}.
CVFeb 11, 2025Code
DeepSeek on a Trip: Inducing Targeted Visual Hallucinations via Representation VulnerabilitiesChashi Mahiul Islam, Samuel Jacob Chacko, Preston Horne et al.
Multimodal Large Language Models (MLLMs) represent the cutting edge of AI technology, with DeepSeek models emerging as a leading open-source alternative offering competitive performance to closed-source systems. While these models demonstrate remarkable capabilities, their vision-language integration mechanisms introduce specific vulnerabilities. We implement an adapted embedding manipulation attack on DeepSeek Janus that induces targeted visual hallucinations through systematic optimization of image embeddings. Through extensive experimentation across COCO, DALL-E 3, and SVIT datasets, we achieve hallucination rates of up to 98.0% while maintaining high visual fidelity (SSIM > 0.88) of the manipulated images on open-ended questions. Our analysis demonstrates that both 1B and 7B variants of DeepSeek Janus are susceptible to these attacks, with closed-form evaluation showing consistently higher hallucination rates compared to open-ended questioning. We introduce a novel multi-prompt hallucination detection framework using LLaMA-3.1 8B Instruct for robust evaluation. The implications of these findings are particularly concerning given DeepSeek's open-source nature and widespread deployment potential. This research emphasizes the critical need for embedding-level security measures in MLLM deployment pipelines and contributes to the broader discussion of responsible AI implementation.
CVJul 1, 2024
Unaligning Everything: Or Aligning Any Text to Any Image in Multimodal ModelsShaeke Salman, Md Montasir Bin Shams, Xiuwen Liu
Utilizing a shared embedding space, emerging multimodal models exhibit unprecedented zero-shot capabilities. However, the shared embedding space could lead to new vulnerabilities if different modalities can be misaligned. In this paper, we extend and utilize a recently developed effective gradient-based procedure that allows us to match the embedding of a given text by minimally modifying an image. Using the procedure, we show that we can align the embeddings of distinguishable texts to any image through unnoticeable adversarial attacks in joint image-text models, revealing that semantically unrelated images can have embeddings of identical texts and at the same time visually indistinguishable images can be matched to the embeddings of very different texts. Our technique achieves 100\% success rate when it is applied to text datasets and images from multiple sources. Without overcoming the vulnerability, multimodal models cannot robustly align inputs from different modalities in a semantically meaningful way. \textbf{Warning: the text data used in this paper are toxic in nature and may be offensive to some readers.}
LGMay 14, 2025Code
Adversarial Attack on Large Language Models using Exponentiated Gradient DescentSajib Biswas, Mao Nishino, Samuel Jacob Chacko et al.
As Large Language Models (LLMs) are widely used, understanding them systematically is key to improving their safety and realizing their full potential. Although many models are aligned using techniques such as reinforcement learning from human feedback (RLHF), they are still vulnerable to jailbreaking attacks. Some of the existing adversarial attack methods search for discrete tokens that may jailbreak a target model while others try to optimize the continuous space represented by the tokens of the model's vocabulary. While techniques based on the discrete space may prove to be inefficient, optimization of continuous token embeddings requires projections to produce discrete tokens, which might render them ineffective. To fully utilize the constraints and the structures of the space, we develop an intrinsic optimization technique using exponentiated gradient descent with the Bregman projection method to ensure that the optimized one-hot encoding always stays within the probability simplex. We prove the convergence of the technique and implement an efficient algorithm that is effective in jailbreaking several widely used LLMs. We demonstrate the efficacy of the proposed technique using five open-source LLMs on four openly available datasets. The results show that the technique achieves a higher success rate with great efficiency compared to three other state-of-the-art jailbreaking techniques. The source code for our implementation is available at: https://github.com/sbamit/Exponentiated-Gradient-Descent-LLM-Attack
LGAug 20, 2025Code
Universal and Transferable Adversarial Attack on Large Language Models Using Exponentiated Gradient DescentSajib Biswas, Mao Nishino, Samuel Jacob Chacko et al.
As large language models (LLMs) are increasingly deployed in critical applications, ensuring their robustness and safety alignment remains a major challenge. Despite the overall success of alignment techniques such as reinforcement learning from human feedback (RLHF) on typical prompts, LLMs remain vulnerable to jailbreak attacks enabled by crafted adversarial triggers appended to user prompts. Most existing jailbreak methods either rely on inefficient searches over discrete token spaces or direct optimization of continuous embeddings. While continuous embeddings can be given directly to selected open-source models as input, doing so is not feasible for proprietary models. On the other hand, projecting these embeddings back into valid discrete tokens introduces additional complexity and often reduces attack effectiveness. We propose an intrinsic optimization method which directly optimizes relaxed one-hot encodings of the adversarial suffix tokens using exponentiated gradient descent coupled with Bregman projection, ensuring that the optimized one-hot encoding of each token always remains within the probability simplex. We provide theoretical proof of convergence for our proposed method and implement an efficient algorithm that effectively jailbreaks several widely used LLMs. Our method achieves higher success rates and faster convergence compared to three state-of-the-art baselines, evaluated on five open-source LLMs and four adversarial behavior datasets curated for evaluating jailbreak methods. In addition to individual prompt attacks, we also generate universal adversarial suffixes effective across multiple prompts and demonstrate transferability of optimized suffixes to different LLMs.
63.2CRMar 23
STRIATUM-CTF: A Protocol-Driven Agentic Framework for General-Purpose CTF SolvingJames Hugglestone, Samuel Jacob Chacko, Dawson Stoller et al.
Large Language Models (LLMs) have demonstrated potential in code generation, yet they struggle with the multi-step, stateful reasoning required for offensive cybersecurity operations. Existing research often relies on static benchmarks that fail to capture the dynamic nature of real-world vulnerabilities. In this work, we introduce STRIATUM-CTF (A Search-based Test-time Reasoning Inference Agent for Tactical Utility Maximization in Cybersecurity), a modular agentic framework built upon the Model Context Protocol (MCP). By standardizing tool interfaces for system introspection, decompilation, and runtime debugging, STRIATUM-CTF enables the agent to maintain a coherent context window across extended exploit trajectories. We validate this approach not merely on synthetic datasets, but in a live competitive environment. Our system participated in a university-hosted Capture-the-Flag (CTF) competition in late 2025, where it operated autonomously to identify and exploit vulnerabilities in real-time. STRIATUM-CTF secured First Place, outperforming 21 human teams and demonstrating strong adaptability in a dynamic problem-solving setting. We analyze the agent's decision-making logs to show how MCP-based tool abstraction significantly reduces hallucination compared to naive prompting strategies. These results suggest that standardized context protocols are a critical path toward robust autonomous cyber-reasoning systems.
LGOct 9, 2023
Nonlinear Correct and Smooth for Semi-Supervised LearningYuanhang Shao, Xiuwen Liu
Graph-based semi-supervised learning (GSSL) has been used successfully in various applications. Existing methods leverage the graph structure and labeled samples for classification. Label Propagation (LP) and Graph Neural Networks (GNNs) both iteratively pass messages on graphs, where LP propagates node labels through edges and GNN aggregates node features from the neighborhood. Recently, combining LP and GNN has led to improved performance. However, utilizing labels and features jointly in higher-order graphs has not been explored. Therefore, we propose Nonlinear Correct and Smooth (NLCS), which improves the existing post-processing approach by incorporating non-linearity and higher-order representation into the residual propagation to handle intricate node relationships effectively. Systematic evaluations show that our method achieves remarkable average improvements of 13.71% over base prediction and 2.16% over the state-of-the-art post-processing method on six commonly used datasets. Comparisons and analyses show our method effectively utilizes labels and features jointly in higher-order graphs to resolve challenging graph relationships.
LGOct 24, 2024
Adversarial Attacks on Large Language Models Using Regularized RelaxationSamuel Jacob Chacko, Sajib Biswas, Chashi Mahiul Islam et al.
As powerful Large Language Models (LLMs) are now widely used for numerous practical applications, their safety is of critical importance. While alignment techniques have significantly improved overall safety, LLMs remain vulnerable to carefully crafted adversarial inputs. Consequently, adversarial attack methods are extensively used to study and understand these vulnerabilities. However, current attack methods face significant limitations. Those relying on optimizing discrete tokens suffer from limited efficiency, while continuous optimization techniques fail to generate valid tokens from the model's vocabulary, rendering them impractical for real-world applications. In this paper, we propose a novel technique for adversarial attacks that overcomes these limitations by leveraging regularized gradients with continuous optimization methods. Our approach is two orders of magnitude faster than the state-of-the-art greedy coordinate gradient-based method, significantly improving the attack success rate on aligned language models. Moreover, it generates valid tokens, addressing a fundamental limitation of existing continuous optimization methods. We demonstrate the effectiveness of our attack on five state-of-the-art LLMs using four datasets.
CVJan 28, 2024
Intriguing Equivalence Structures of the Embedding Space of Vision TransformersShaeke Salman, Md Montasir Bin Shams, Xiuwen Liu
Pre-trained large foundation models play a central role in the recent surge of artificial intelligence, resulting in fine-tuned models with remarkable abilities when measured on benchmark datasets, standard exams, and applications. Due to their inherent complexity, these models are not well understood. While small adversarial inputs to such models are well known, the structures of the representation space are not well characterized despite their fundamental importance. In this paper, using the vision transformers as an example due to the continuous nature of their input space, we show via analyses and systematic experiments that the representation space consists of large piecewise linear subspaces where there exist very different inputs sharing the same representations, and at the same time, local normal spaces where there are visually indistinguishable inputs having very different representations. The empirical results are further verified using the local directional estimations of the Lipschitz constants of the underlying models. Consequently, the resulting representations change the results of downstream models, and such models are subject to overgeneralization and with limited semantically meaningful generalization capability.
CVFeb 7, 2025
Mechanistic Understandings of Representation Vulnerabilities and Engineering Robust Vision TransformersChashi Mahiul Islam, Samuel Jacob Chacko, Mao Nishino et al.
While transformer-based models dominate NLP and vision applications, their underlying mechanisms to map the input space to the label space semantically are not well understood. In this paper, we study the sources of known representation vulnerabilities of vision transformers (ViT), where perceptually identical images can have very different representations and semantically unrelated images can have the same representation. Our analysis indicates that imperceptible changes to the input can result in significant representation changes, particularly in later layers, suggesting potential instabilities in the performance of ViTs. Our comprehensive study reveals that adversarial effects, while subtle in early layers, propagate and amplify through the network, becoming most pronounced in middle to late layers. This insight motivates the development of NeuroShield-ViT, a novel defense mechanism that strategically neutralizes vulnerable neurons in earlier layers to prevent the cascade of adversarial effects. We demonstrate NeuroShield-ViT's effectiveness across various attacks, particularly excelling against strong iterative attacks, and showcase its remarkable zero-shot generalization capabilities. Without fine-tuning, our method achieves a competitive accuracy of 77.8% on adversarial examples, surpassing conventional robustness methods. Our results shed new light on how adversarial effects propagate through ViT layers, while providing a promising approach to enhance the robustness of vision transformers against adversarial attacks. Additionally, they provide a promising approach to enhance the robustness of vision transformers against adversarial attacks.
LGDec 21, 2024
Data-Driven Fairness Generalization for Deepfake DetectionUzoamaka Ezeakunne, Chrisantus Eze, Xiuwen Liu
Despite the progress made in deepfake detection research, recent studies have shown that biases in the training data for these detectors can result in varying levels of performance across different demographic groups, such as race and gender. These disparities can lead to certain groups being unfairly targeted or excluded. Traditional methods often rely on fair loss functions to address these issues, but they under-perform when applied to unseen datasets, hence, fairness generalization remains a challenge. In this work, we propose a data-driven framework for tackling the fairness generalization problem in deepfake detection by leveraging synthetic datasets and model optimization. Our approach focuses on generating and utilizing synthetic data to enhance fairness across diverse demographic groups. By creating a diverse set of synthetic samples that represent various demographic groups, we ensure that our model is trained on a balanced and representative dataset. This approach allows us to generalize fairness more effectively across different domains. We employ a comprehensive strategy that leverages synthetic data, a loss sharpness-aware optimization pipeline, and a multi-task learning framework to create a more equitable training environment, which helps maintain fairness across both intra-dataset and cross-dataset evaluations. Extensive experiments on benchmark deepfake detection datasets demonstrate the efficacy of our approach, surpassing state-of-the-art approaches in preserving fairness during cross-dataset evaluation. Our results highlight the potential of synthetic datasets in achieving fairness generalization, providing a robust solution for the challenges faced in deepfake detection.
CVFeb 13, 2024
Intriguing Differences Between Zero-Shot and Systematic Evaluations of Vision-Language Transformer ModelsShaeke Salman, Md Montasir Bin Shams, Xiuwen Liu et al.
Transformer-based models have dominated natural language processing and other areas in the last few years due to their superior (zero-shot) performance on benchmark datasets. However, these models are poorly understood due to their complexity and size. While probing-based methods are widely used to understand specific properties, the structures of the representation space are not systematically characterized; consequently, it is unclear how such models generalize and overgeneralize to new inputs beyond datasets. In this paper, based on a new gradient descent optimization method, we are able to explore the embedding space of a commonly used vision-language model. Using the Imagenette dataset, we show that while the model achieves over 99\% zero-shot classification performance, it fails systematic evaluations completely. Using a linear approximation, we provide a framework to explain the striking differences. We have also obtained similar results using a different model to support that our results are applicable to other transformer models with continuous inputs. We also propose a robust way to detect the modified images.
CVOct 3, 2025
Spatial-ViLT: Enhancing Visual Spatial Reasoning through Multi-Task LearningChashi Mahiul Islam, Oteo Mamo, Samuel Jacob Chacko et al.
Vision-language models (VLMs) have advanced multimodal reasoning but still face challenges in spatial reasoning for 3D scenes and complex object configurations. To address this, we introduce SpatialViLT, an enhanced VLM that integrates spatial features like depth maps, 3D coordinates, and edge maps through a multi-task learning framework. This approach enriches multimodal embeddings with spatial understanding. We propose two variants: SpatialViLT and MaskedSpatialViLT, focusing on full and masked object regions, respectively. Additionally, SpatialEnsemble combines both approaches, achieving state-of-the-art accuracy. Our models excel in spatial reasoning categories such as directional, topological, and proximity relations, as demonstrated on the challenging Visual Spatial Reasoning (VSR) dataset. This work represents a significant step in enhancing the spatial intelligence of AI systems, crucial for advanced multimodal understanding and real-world applications.
CVJul 2, 2025
Are Vision Transformer Representations Semantically Meaningful? A Case Study in Medical ImagingMontasir Shams, Chashi Mahiul Islam, Shaeke Salman et al.
Vision transformers (ViTs) have rapidly gained prominence in medical imaging tasks such as disease classification, segmentation, and detection due to their superior accuracy compared to conventional deep learning models. However, due to their size and complex interactions via the self-attention mechanism, they are not well understood. In particular, it is unclear whether the representations produced by such models are semantically meaningful. In this paper, using a projected gradient-based algorithm, we show that their representations are not semantically meaningful and they are inherently vulnerable to small changes. Images with imperceptible differences can have very different representations; on the other hand, images that should belong to different semantic classes can have nearly identical representations. Such vulnerability can lead to unreliable classification results; for example, unnoticeable changes cause the classification accuracy to be reduced by over 60\%. %. To the best of our knowledge, this is the first work to systematically demonstrate this fundamental lack of semantic meaningfulness in ViT representations for medical image classification, revealing a critical challenge for their deployment in safety-critical systems.
LGApr 1, 2025
A Theory of Machine Understanding via the Minimum Description Length PrincipleCanlin Zhang, Xiuwen Liu
Deep neural networks trained through end-to-end learning have achieved remarkable success across various domains in the past decade. However, the end-to-end learning strategy, originally designed to minimize predictive loss in a black-box manner, faces two fundamental limitations: the struggle to form explainable representations in a self-supervised manner, and the inability to compress information rigorously following the Minimum Description Length (MDL) principle. These two limitations point to a deeper issue: an end-to-end learning model is not able to "understand" what it learns. In this paper, we establish a novel theory connecting these two limitations. We design the Spectrum VAE, a novel deep learning architecture whose minimum description length (MDL) can be rigorously evaluated. Then, we introduce the concept of latent dimension combinations, or what we term spiking patterns, and demonstrate that the observed spiking patterns should be as few as possible based on the training data in order for the Spectrum VAE to achieve the MDL. Finally, our theory demonstrates that when the MDL is achieved with respect to the given data distribution, the Spectrum VAE will naturally produce explainable latent representations of the data. In other words, explainable representations--or "understanding"--can emerge in a self-supervised manner simply by making the deep network obey the MDL principle. In our opinion, this also implies a deeper insight: To understand is to compress. At its core, our theory advocates for a shift in the training objective of deep networks: not only to minimize predictive loss, but also to minimize the description length regarding the given data. That is, a deep network should not only learn, but also understand what it learns. This work is entirely theoretical and aims to inspire future research toward self-supervised, explainable AI grounded in the MDL principle.
LGMay 19, 2024
Learning Regularities from Data using Spiking Functions: A TheoryCanlin Zhang, Xiuwen Liu
Deep neural networks trained in an end-to-end manner are proven to be efficient in a wide range of machine learning tasks. However, there is one drawback of end-to-end learning: The learned features and information are implicitly represented in neural network parameters, which cannot be used as regularities, concepts or knowledge to explicitly represent the data probability distribution. To resolve this issue, we propose in this paper a new machine learning theory, which defines in mathematics what are regularities. Briefly, regularities are concise representations of the non-random features, or 'non-randomness' in the data probability distribution. Combining this with information theory, we claim that regularities can also be regarded as a small amount of information encoding a large amount of information. Our theory is based on spiking functions. That is, if a function can react to, or spike on specific data samples more frequently than random noise inputs, we say that such a function discovers non-randomness from the data distribution. Also, we say that the discovered non-randomness is encoded into regularities if the function is simple enough. Our theory also discusses applying multiple spiking functions to the same data distribution. In this process, we claim that the 'best' regularities, or the optimal spiking functions, are those who can capture the largest amount of information from the data distribution, and then encode the captured information in the most concise way. Theorems and hypotheses are provided to describe in mathematics what are 'best' regularities and optimal spiking functions. Finally, we propose a machine learning approach, which can potentially obtain the optimal spiking functions regarding the given dataset in practice.
CLApr 22, 2020
Dense Embeddings Preserving the Semantic Relationships in WordNetCanlin Zhang, Xiuwen Liu
In this paper, we provide a novel way to generate low dimensional vector embeddings for the noun and verb synsets in WordNet, where the hypernym-hyponym relationship is preserved in the embeddings. We call this embedding the Sense Spectrum (and Sense Spectra for embeddings). In order to create suitable labels for the training of sense spectra, we designed a new similarity measurement for noun and verb synsets in WordNet. We call this similarity measurement the Hypernym Intersection Similarity (HIS), since it compares the common and unique hypernyms between two synsets. Our experiments show that on the noun and verb pairs of the SimLex-999 dataset, HIS outperforms the three similarity measurements in WordNet. Moreover, to the best of our knowledge, the sense spectra provide the first dense synset embeddings that preserve the semantic relationships in WordNet.
CLMar 18, 2020
An Analysis on the Learning Rules of the Skip-Gram ModelCanlin Zhang, Xiuwen Liu, Daniel Bis
To improve the generalization of the representations for natural language processing tasks, words are commonly represented using vectors, where distances among the vectors are related to the similarity of the words. While word2vec, the state-of-the-art implementation of the skip-gram model, is widely used and improves the performance of many natural language processing tasks, its mechanism is not yet well understood. In this work, we derive the learning rules for the skip-gram model and establish their close relationship to competitive learning. In addition, we provide the global optimal solution constraints for the skip-gram model and validate them by experimental results.
LGOct 18, 2019
Towards Quantifying Intrinsic Generalization of Deep ReLU NetworksShaeke Salman, Canlin Zhang, Xiuwen Liu et al.
Understanding the underlying mechanisms that enable the empirical successes of deep neural networks is essential for further improving their performance and explaining such networks. Towards this goal, a specific question is how to explain the "surprising" behavior of the same over-parametrized deep neural networks that can generalize well on real datasets and at the same time "memorize" training samples when the labels are randomized. In this paper, we demonstrate that deep ReLU networks generalize from training samples to new points via piece-wise linear interpolation. We provide a quantified analysis on the generalization ability of a deep ReLU network: Given a fixed point $\mathbf{x}$ and a fixed direction in the input space $\mathcal{S}$, there is always a segment such that any point on the segment will be classified the same as the fixed point $\mathbf{x}$. We call this segment the $generalization \ interval$. We show that the generalization intervals of a ReLU network behave similarly along pairwise directions between samples of the same label in both real and random cases on the MNIST and CIFAR-10 datasets. This result suggests that the same interpolation mechanism is used in both cases. Additionally, for datasets using real labels, such networks provide a good approximation of the underlying manifold in the data, where the changes are much smaller along tangent directions than along normal directions. On the other hand, however, for datasets with random labels, generalization intervals along mid-lines of triangles with the same label are much smaller than those on the datasets with real labels, suggesting different behaviors along other directions. Our systematic experiments demonstrate for the first time that such deep neural networks generalize through the same interpolation and explain the differences between their performance on datasets with real and random labels.
LGMay 14, 2019
Consensus-based Interpretable Deep Neural Networks with Application to Mortality PredictionShaeke Salman, Seyedeh Neelufar Payrovnaziri, Xiuwen Liu et al.
Deep neural networks have achieved remarkable success in various challenging tasks. However, the black-box nature of such networks is not acceptable to critical applications, such as healthcare. In particular, the existence of adversarial examples and their overgeneralization to irrelevant, out-of-distribution inputs with high confidence makes it difficult, if not impossible, to explain decisions by such networks. In this paper, we analyze the underlying mechanism of generalization of deep neural networks and propose an ($n$, $k$) consensus algorithm which is insensitive to adversarial examples and can reliably reject out-of-distribution samples. Furthermore, the consensus algorithm is able to improve classification accuracy by using multiple trained deep neural networks. To handle the complexity of deep neural networks, we cluster linear approximations of individual models and identify highly correlated clusters among different models to capture feature importance robustly, resulting in improved interpretability. Motivated by the importance of building accurate and interpretable prediction models for healthcare, our experimental results on an ICU dataset show the effectiveness of our algorithm in enhancing both the prediction accuracy and the interpretability of deep neural network models on one-year patient mortality prediction. In particular, while the proposed method maintains similar interpretability as conventional shallow models such as logistic regression, it improves the prediction accuracy significantly.
SIFeb 3, 2019
High-resolution home location prediction from tweets using deep learning with dynamic structureMeysam Ghaffari, Ashok Srinivasan, Xiuwen Liu
Timely and high-resolution estimates of the home locations of a sufficiently large subset of the population are critical for effective disaster response and public health intervention, but this is still an open problem. Conventional data sources, such as census and surveys, have a substantial time lag and cannot capture seasonal trends. Recently, social media data has been exploited to address this problem by leveraging its large user-base and real-time nature. However, inherent sparsity and noise, along with large estimation uncertainty in home locations, have limited their effectiveness. Consequently, much of previous research has aimed only at a coarse spatial resolution, with accuracy being limited for high-resolution methods. In this paper, we develop a deep-learning solution that uses a two-phase dynamic structure to deal with sparse and noisy social media data. In the first phase, high recall is achieved using a random forest, producing more balanced home location candidates. Then two deep neural networks are used to detect home locations with high accuracy. We obtained over 90% accuracy for large subsets on a commonly used dataset. Compared to other high-resolution methods, our approach yields up to 60% error reduction by reducing high-resolution home prediction error from over 21% to less than 8%. Systematic comparisons show that our method gives the highest accuracy both for the entire sample and for subsets. Evaluation on a real-world public health problem further validates the effectiveness of our approach.
LGJan 19, 2019
Overfitting Mechanism and Avoidance in Deep Neural NetworksShaeke Salman, Xiuwen Liu
Assisted by the availability of data and high performance computing, deep learning techniques have achieved breakthroughs and surpassed human performance empirically in difficult tasks, including object recognition, speech recognition, and natural language processing. As they are being used in critical applications, understanding underlying mechanisms for their successes and limitations is imperative. In this paper, we show that overfitting, one of the fundamental issues in deep neural networks, is due to continuous gradient updating and scale sensitiveness of cross entropy loss. By separating samples into correctly and incorrectly classified ones, we show that they behave very differently, where the loss decreases in the correct ones and increases in the incorrect ones. Furthermore, by analyzing dynamics during training, we propose a consensus-based classification algorithm that enables us to avoid overfitting and significantly improve the classification accuracy especially when the number of training samples is limited. As each trained neural network depends on extrinsic factors such as initial values as well as training data, requiring consensus among multiple models reduces extrinsic factors substantially; for statistically independent models, the reduction is exponential. Compared to ensemble algorithms, the proposed algorithm avoids overgeneralization by not classifying ambiguous inputs. Systematic experimental results demonstrate the effectiveness of the proposed algorithm. For example, using only 1000 training samples from MNIST dataset, the proposed algorithm achieves 95% accuracy, significantly higher than any of the individual models, with 90% of the test samples classified.
CVAug 2, 2017
Land Cover Classification from Multi-temporal, Multi-spectral Remotely Sensed Imagery using Patch-Based Recurrent Neural NetworksAtharva Sharma, Xiuwen Liu, Xiaojun Yang
Sustainability of the global environment is dependent on the accurate land cover information over large areas. Even with the increased number of satellite systems and sensors acquiring data with improved spectral, spatial, radiometric and temporal characteristics and the new data distribution policy, most existing land cover datasets were derived from a pixel-based single-date multi-spectral remotely sensed image with low accuracy. To improve the accuracy, the bottleneck is how to develop an accurate and effective image classification technique. By incorporating and utilizing the complete multi-spectral, multi-temporal and spatial information in remote sensing images and considering their inherit spatial and sequential interdependence, we propose a new patch-based RNN (PB-RNN) system tailored for multi-temporal remote sensing data. The system is designed by incorporating distinctive characteristics in multi-temporal remote sensing data. In particular, it uses multi-temporal-spectral-spatial samples and deals with pixels contaminated by clouds/shadow present in the multi-temporal data series. Using a Florida Everglades ecosystem study site covering an area of 771 square kilo-meters, the proposed PB-RNN system has achieved a significant improvement in the classification accuracy over pixel-based RNN system, pixel-based single-imagery NN system, pixel-based multi-images NN system, patch-based single-imagery NN system and patch-based multi-images NN system. For example, the proposed system achieves 97.21% classification accuracy while a pixel-based single-imagery NN system achieves 64.74%. By utilizing methods like the proposed PB-RNN one, we believe that much more accurate land cover datasets can be produced over large areas efficiently.
CVDec 22, 2015
Transformed Residual Quantization for Approximate Nearest Neighbor SearchJiangbo Yuan, Xiuwen Liu
The success of product quantization (PQ) for fast nearest neighbor search depends on the exponentially reduced complexities of both storage and computation with respect to the codebook size. Recent efforts have been focused on employing sophisticated optimization strategies, or seeking more effective models. Residual quantization (RQ) is such an alternative that holds the same property as PQ in terms of the aforementioned complexities. In addition to being a direct replacement of PQ, hybrids of PQ and RQ can yield more gains for approximate nearest neighbor search. This motivated us to propose a novel approach to optimizing RQ and the related hybrid models. With an observation of the general randomness increase in a residual space, we propose a new strategy that jointly learns a local transformation per residual cluster with an ultimate goal to reduce overall quantization errors. We have shown that our approach can achieve significantly better accuracy on nearest neighbor search than both the original and the optimized PQ on several very large scale benchmarks.