LGOct 12, 2023
Lag-Llama: Towards Foundation Models for Probabilistic Time Series ForecastingKashif Rasul, Arjun Ashok, Andrew Robert Williams et al.
Over the past years, foundation models have caused a paradigm shift in machine learning due to their unprecedented capabilities for zero-shot and few-shot generalization. However, despite the success of foundation models in modalities such as natural language processing and computer vision, the development of foundation models for time series forecasting has lagged behind. We present Lag-Llama, a general-purpose foundation model for univariate probabilistic time series forecasting based on a decoder-only transformer architecture that uses lags as covariates. Lag-Llama is pretrained on a large corpus of diverse time series data from several domains, and demonstrates strong zero-shot generalization capabilities compared to a wide range of forecasting models on downstream datasets across domains. Moreover, when fine-tuned on relatively small fractions of such previously unseen datasets, Lag-Llama achieves state-of-the-art performance, outperforming prior deep learning approaches, emerging as the best general-purpose model on average. Lag-Llama serves as a strong contender to the current state-of-art in time series forecasting and paves the way for future advancements in foundation models tailored to time series data.
LGAug 15, 2024
Machine learning empowered Modulation detection for OFDM-based signalsAli Pourranjbar, Georges Kaddoum, Verdier Assoume Mba et al.
We propose a blind ML-based modulation detection for OFDM-based technologies. Unlike previous works that assume an ideal environment with precise knowledge of subcarrier count and cyclic prefix location, we consider blind modulation detection while accounting for realistic environmental parameters and imperfections. Our approach employs a ResNet network to simultaneously detect the modulation type and accurately locate the cyclic prefix. Specifically, after eliminating the environmental impact from the signal and accurately extracting the OFDM symbols, we convert these symbols into scatter plots. Due to their unique shapes, these scatter plots are then classified using ResNet. As a result, our proposed modulation classification method can be applied to any OFDM-based technology without prior knowledge of the transmitted signal. We evaluate its performance across various modulation schemes and subcarrier numbers. Simulation results show that our method achieves a modulation detection accuracy exceeding $80\%$ at an SNR of $10$ dB and $95\%$ at an SNR of $25$ dB.
LGFeb 5, 2024
Empowering Time Series Analysis with Large Language Models: A SurveyYushan Jiang, Zijie Pan, Xikun Zhang et al.
Recently, remarkable progress has been made over large language models (LLMs), demonstrating their unprecedented capability in varieties of natural language tasks. However, completely training a large general-purpose model from the scratch is challenging for time series analysis, due to the large volumes and varieties of time series data, as well as the non-stationarity that leads to concept drift impeding continuous model adaptation and re-training. Recent advances have shown that pre-trained LLMs can be exploited to capture complex dependencies in time series data and facilitate various applications. In this survey, we provide a systematic overview of existing methods that leverage LLMs for time series analysis. Specifically, we first state the challenges and motivations of applying language models in the context of time series as well as brief preliminaries of LLMs. Next, we summarize the general pipeline for LLM-based time series analysis, categorize existing methods into different groups (i.e., direct query, tokenization, prompt design, fine-tune, and model integration), and highlight the key ideas within each group. We also discuss the applications of LLMs for both general and spatial-temporal time series data, tailored to specific domains. Finally, we thoroughly discuss future research opportunities to empower time series analysis with LLMs.
AIJan 29
JAF: Judge Agent ForestSahil Garg, Brad Cheezum, Sridhar Dutta et al.
Judge agents are fundamental to agentic AI frameworks: they provide automated evaluation, and enable iterative self-refinement of reasoning processes. We introduce JAF: Judge Agent Forest, a framework in which the judge agent conducts joint inference across a cohort of query--response pairs generated by a primary agent, rather than evaluating each in isolation. This paradigm elevates the judge from a local evaluator to a holistic learner: by simultaneously assessing related responses, the judge discerns cross-instance patterns and inconsistencies, whose aggregate feedback enables the primary agent to improve by viewing its own outputs through the judge's collective perspective. Conceptually, JAF bridges belief propagation and ensemble-learning principles: overlapping in-context neighborhoods induce a knowledge-graph structure that facilitates propagation of critique, and repeated, randomized evaluations yield a robust ensemble of context-sensitive judgments. JAF can be instantiated entirely via ICL, with the judge prompted for each query using its associated primary-agent response plus a small, possibly noisy set of peer exemplars. While kNN in embedding space is a natural starting point for exemplars, this approach overlooks categorical structure, domain metadata, or nuanced distinctions accessible to modern LLMs. To overcome these limitations, we develop a flexible locality-sensitive hashing (LSH) algorithm that learns informative binary codes by integrating semantic embeddings, LLM-driven hash predicates, supervision from categorical labels, and relevant side information. These hash codes support efficient, interpretable, and relation-aware selection of diverse exemplars, and further optimize exploration of CoT reasoning paths. We validate JAF with an empirical study on the demanding task of cloud misconfigs triage in large-scale cloud environments.
LGMar 9, 2024
$\textbf{S}^2$IP-LLM: Semantic Space Informed Prompt Learning with LLM for Time Series ForecastingZijie Pan, Yushan Jiang, Sahil Garg et al.
Recently, there has been a growing interest in leveraging pre-trained large language models (LLMs) for various time series applications. However, the semantic space of LLMs, established through the pre-training, is still underexplored and may help yield more distinctive and informative representations to facilitate time series forecasting. To this end, we propose Semantic Space Informed Prompt learning with LLM ($S^2$IP-LLM) to align the pre-trained semantic space with time series embeddings space and perform time series forecasting based on learned prompts from the joint space. We first design a tokenization module tailored for cross-modality alignment, which explicitly concatenates patches of decomposed time series components to create embeddings that effectively encode the temporal dynamics. Next, we leverage the pre-trained word token embeddings to derive semantic anchors and align selected anchors with time series embeddings by maximizing the cosine similarity in the joint space. This way, $S^2$IP-LLM can retrieve relevant semantic anchors as prompts to provide strong indicators (context) for time series that exhibit different temporal dynamics. With thorough empirical studies on multiple benchmark datasets, we demonstrate that the proposed $S^2$IP-LLM can achieve superior forecasting performance over state-of-the-art baselines. Furthermore, our ablation studies and visualizations verify the necessity of prompt learning informed by semantic space.
LGFeb 20, 2024
Structural Knowledge Informed Continual Multivariate Time Series ForecastingZijie Pan, Yushan Jiang, Dongjin Song et al.
Recent studies in multivariate time series (MTS) forecasting reveal that explicitly modeling the hidden dependencies among different time series can yield promising forecasting performance and reliable explanations. However, modeling variable dependencies remains underexplored when MTS is continuously accumulated under different regimes (stages). Due to the potential distribution and dependency disparities, the underlying model may encounter the catastrophic forgetting problem, i.e., it is challenging to memorize and infer different types of variable dependencies across different regimes while maintaining forecasting performance. To address this issue, we propose a novel Structural Knowledge Informed Continual Learning (SKI-CL) framework to perform MTS forecasting within a continual learning paradigm, which leverages structural knowledge to steer the forecasting model toward identifying and adapting to different regimes, and selects representative MTS samples from each regime for memory replay. Specifically, we develop a forecasting model based on graph structure learning, where a consistency regularization scheme is imposed between the learned variable dependencies and the structural knowledge while optimizing the forecasting objective over the MTS data. As such, MTS representations learned in each regime are associated with distinct structural knowledge, which helps the model memorize a variety of conceivable scenarios and results in accurate forecasts in the continual learning context. Meanwhile, we develop a representation-matching memory replay scheme that maximizes the temporal coverage of MTS data to efficiently preserve the underlying temporal dynamics and dependency structures of each regime. Thorough empirical studies on synthetic and real-world benchmarks validate SKI-CL's efficacy and advantages over the state-of-the-art for continual MTS forecasting tasks.
CRDec 5, 2025
The Road of Adaptive AI for Precision in CybersecuritySahil Garg
Cybersecurity's evolving complexity presents unique challenges and opportunities for AI research and practice. This paper shares key lessons and insights from designing, building, and operating production-grade GenAI pipelines in cybersecurity, with a focus on the continual adaptation required to keep pace with ever-shifting knowledge bases, tooling, and threats. Our goal is to provide an actionable perspective for AI practitioners and industry stakeholders navigating the frontier of GenAI for cybersecurity, with particular attention to how different adaptation mechanisms complement each other in end-to-end systems. We present practical guidance derived from real-world deployments, propose best practices for leveraging retrieval- and model-level adaptation, and highlight open research directions for making GenAI more robust, precise, and auditable in cyber defense.
LGApr 10, 2024
Deep Generative Sampling in the Dual Divergence Space: A Data-efficient & Interpretative Approach for Generative AISahil Garg, Anderson Schneider, Anant Raj et al.
Building on the remarkable achievements in generative sampling of natural images, we propose an innovative challenge, potentially overly ambitious, which involves generating samples of entire multivariate time series that resemble images. However, the statistical challenge lies in the small sample size, sometimes consisting of a few hundred subjects. This issue is especially problematic for deep generative models that follow the conventional approach of generating samples from a canonical distribution and then decoding or denoising them to match the true data distribution. In contrast, our method is grounded in information theory and aims to implicitly characterize the distribution of images, particularly the (global and local) dependency structure between pixels. We achieve this by empirically estimating its KL-divergence in the dual form with respect to the respective marginal distribution. This enables us to perform generative sampling directly in the optimized 1-D dual divergence space. Specifically, in the dual space, training samples representing the data distribution are embedded in the form of various clusters between two end points. In theory, any sample embedded between those two end points is in-distribution w.r.t. the data distribution. Our key idea for generating novel samples of images is to interpolate between the clusters via a walk as per gradients of the dual function w.r.t. the data dimensions. In addition to the data efficiency gained from direct sampling, we propose an algorithm that offers a significant reduction in sample complexity for estimating the divergence of the data distribution with respect to the marginal distribution. We provide strong theoretical guarantees along with an extensive empirical evaluation using many real-world datasets from diverse domains, establishing the superiority of our approach w.r.t. state-of-the-art deep learning methods.
CLMar 21, 2021
Structural block driven - enhanced convolutional neural representation for relation extractionDongsheng Wang, Prayag Tiwari, Sahil Garg et al.
In this paper, we propose a novel lightweight relation extraction approach of structural block driven - convolutional neural learning. Specifically, we detect the essential sequential tokens associated with entities through dependency analysis, named as a structural block, and only encode the block on a block-wise and an inter-block-wise representation, utilizing multi-scale CNNs. This is to 1) eliminate the noisy from irrelevant part of a sentence; meanwhile 2) enhance the relevant block representation with both block-wise and inter-block-wise semantically enriched representation. Our method has the advantage of being independent of long sentence context since we only encode the sequential tokens within a block boundary. Experiments on two datasets i.e., SemEval2010 and KBP37, demonstrate the significant advantages of our method. In particular, we achieve the new state-of-the-art performance on the KBP37 dataset; and comparable performance with the state-of-the-art on the SemEval2010 dataset.
LGJul 19, 2020
Deep Anomaly Detection for Time-series Data in Industrial IoT: A Communication-Efficient On-device Federated Learning ApproachYi Liu, Sahil Garg, Jiangtian Nie et al.
Since edge device failures (i.e., anomalies) seriously affect the production of industrial products in Industrial IoT (IIoT), accurately and timely detecting anomalies is becoming increasingly important. Furthermore, data collected by the edge device may contain the user's private data, which is challenging the current detection approaches as user privacy is calling for the public concern in recent years. With this focus, this paper proposes a new communication-efficient on-device federated learning (FL)-based deep anomaly detection framework for sensing time-series data in IIoT. Specifically, we first introduce a FL framework to enable decentralized edge devices to collaboratively train an anomaly detection model, which can improve its generalization ability. Second, we propose an Attention Mechanism-based Convolutional Neural Network-Long Short Term Memory (AMCNN-LSTM) model to accurately detect anomalies. The AMCNN-LSTM model uses attention mechanism-based CNN units to capture important fine-grained features, thereby preventing memory loss and gradient dispersion problems. Furthermore, this model retains the advantages of LSTM unit in predicting time series data. Third, to adapt the proposed framework to the timeliness of industrial anomaly detection, we propose a gradient compression mechanism based on Top-\textit{k} selection to improve communication efficiency. Extensive experiment studies on four real-world datasets demonstrate that the proposed framework can accurately and timely detect anomalies and also reduce the communication overhead by 50\% compared to the federated learning framework that does not use a gradient compression scheme.
LGApr 21, 2020
An RNN-Survival Model to Decide Email Send TimesHarvineet Singh, Moumita Sinha, Atanu R. Sinha et al.
Email communications are ubiquitous. Firms control send times of emails and thereby the instants at which emails reach recipients (it is assumed email is received instantaneously from the send time). However, they do not control the duration it takes for recipients to open emails, labeled as time-to-open. Importantly, among emails that are opened, most occur within a short window from their send times. We posit that emails are likely to be opened sooner when send times are convenient for recipients, while for other send times, emails can get ignored. Thus, to compute appropriate send times it is important to predict times-to-open accurately. We propose a recurrent neural network (RNN) in a survival model framework to predict times-to-open, for each recipient. Using that we compute appropriate send times. We experiment on a data set of emails sent to a million customers over five months. The sequence of emails received by a person from a sender is a result of interactions with past emails from the sender, and hence contain useful signal that inform our model. This sequential dependence affords our proposed RNN-Survival (RNN-S) approach to outperform survival analysis approaches in predicting times-to-open. We show that best times to send emails can be computed accurately from predicted times-to-open. This approach allows a firm to tune send times of emails, which is in its control, to favorably influence open rates and engagement.
LGSep 9, 2019
Nearly-Unsupervised Hashcode Representations for Relation ExtractionSahil Garg, Aram Galstyan, Greg Ver Steeg et al.
Recently, kernelized locality sensitive hashcodes have been successfully employed as representations of natural language text, especially showing high relevance to biomedical relation extraction tasks. In this paper, we propose to optimize the hashcode representations in a nearly unsupervised manner, in which we only use data points, but not their class labels, for learning. The optimized hashcode representations are then fed to a supervised classifier following the prior work. This nearly unsupervised approach allows fine-grained optimization of each hash function, which is particularly suitable for building hashcode representations generalizing from a training set to a test set. We empirically evaluate the proposed approach for biomedical relation extraction tasks, obtaining significant accuracy improvements w.r.t. state-of-the-art supervised and semi-supervised approaches.
NIJul 21, 2019
LiSA: A Lightweight and Secure Authentication Mechanism for Smart Metering InfrastructureSahil Garg, Kuljeet Kaur, Georges Kaddoum et al.
Smart metering infrastructure (SMI) is the core component of the smart grid (SG) which enables two-way communication between consumers and utility companies to control, monitor, and manage the energy consumption data. Despite their salient features, SMIs equipped with information and communication technology are associated with new threats due to their dependency on public communication networks. Therefore, the security of SMI communications raises the need for robust authentication and key agreement primitives that can satisfy the security requirements of the SG. Thus, in order to realize the aforementioned issues, this paper introduces a lightweight and secure authentication protocol, "LiSA", primarily to secure SMIs in SG setups. The protocol employs Elliptic Curve Cryptography at its core to provide various security features such as mutual authentication, anonymity, replay protection, session key security, and resistance against various attacks. Precisely, LiSA exploits the hardness of the Elliptic Curve Qu Vanstone (EVQV) certificate mechanism along with Elliptic Curve Diffie Hellman Problem (ECDHP) and Elliptic Curve Discrete Logarithm Problem (ECDLP). Additionally, LiSA is designed to provide the highest level of security relative to the existing schemes with least computational and communicational overheads. For instance, LiSA incurred barely 11.826 ms and 0.992 ms for executing different passes across the smart meter and the service providers. Further, it required a total of 544 bits for message transmission during each session.
NIJul 21, 2019
A Lightweight and Privacy-Preserving Authentication Protocol for Mobile Edge ComputingKuljeet Kaur, Sahil Garg, Georges Kaddoum et al.
With the advent of the Internet-of-Things (IoT), vehicular networks and cyber-physical systems, the need for real-time data processing and analysis has emerged as an essential pre-requite for customers' satisfaction. In this direction, Mobile Edge Computing (MEC) provides seamless services with reduced latency, enhanced mobility, and improved location awareness. Since MEC has evolved from Cloud Computing, it inherited numerous security and privacy issues from the latter. Further, decentralized architectures and diversified deployment environments used in MEC platforms also aggravate the problem; causing great concerns for the research fraternity. Thus, in this paper, we propose an efficient and lightweight mutual authentication protocol for MEC environments; based on Elliptic Curve Cryptography (ECC), one-way hash functions and concatenation operations. The designed protocol also leverages the advantages of discrete logarithm problems, computational Diffie-Hellman, random numbers and time-stamps to resist various attacks namely-impersonation attacks, replay attacks, man-in-the-middle attacks, etc. The paper also presents a comparative assessment of the proposed scheme relative to the current state-of-the-art schemes. The obtained results demonstrate that the proposed scheme incurs relatively less communication and computational overheads, and is appropriate to be adopted in resource constraint MEC environments.
NIApr 2, 2019
Blockchain-based Lightweight Authentication Mechanism for Vehicular Fog InfrastructureKuljeet Kaur, Sahil Garg, Georges Kaddoum et al.
With the increasing development of advanced communication technologies, vehicles are becoming smarter and more connected. Due to the tremendous growth of various vehicular applications, a huge amount of data is generated through advanced on-board devices and is deemed critical to improve driving safety and enhance vehicular services. However, cloud based models often fall short in applications where latency and mobility are critical. In order to fully realize the potential of vehicular networks, the challenges of efficient communication and computation need to be addressed. In this direction, vehicular fog computing (VFC) has emerged which extends the concept of fog computing to conventional vehicular networks. It is a geographically distributed paradigm that has the potential to conduct time-critical and data-intensive tasks by pushing intelligence (i.e. computing resources, storage, and application services) in the vicinity of end vehicles. However secure and reliable transmission are of significant importance in highly-mobile vehicular networks in order to ensure the optimal Quality of Service (QoS). In this direction, several authentication mechanisms have been proposed in the literature but most of them are found unfit due to absence of decentralization, anonymity, and trust characteristics. Thus, an effective cross-datacenter authentication and key-exchange scheme based on blockchain and elliptic curve cryptography (ECC) is proposed in this paper. Here, the distributed ledger of blockchain is used for maintaining the network information while the highly secure ECC is employed for mutual authentication between vehicles and road side units (RSUs). Additionally, the proposed scheme is lightweight and scalable for the considered VFC setup. The performance evaluation results against the existing state-of-the-art reveal that the proposed scheme accomplishes enhanced security features.
CRJan 30, 2019
Securing Fog-to-Things Environment Using Intrusion Detection System Based On Ensemble LearningPoulmanogo Illy, Georges Kaddoum, Christian Miranda Moreira et al.
The growing interest in the Internet of Things (IoT) applications is associated with an augmented volume of security threats. In this vein, the Intrusion detection systems (IDS) have emerged as a viable solution for the detection and prevention of malicious activities. Unlike the signature-based detection approaches, machine learning-based solutions are a promising means for detecting unknown attacks. However, the machine learning models need to be accurate enough to reduce the number of false alarms. More importantly, they need to be trained and evaluated on realistic datasets such that their efficacy can be validated on real-time deployments. Many solutions proposed in the literature are reported to have high accuracy but are ineffective in real applications due to the non-representativity of the dataset used for training and evaluation of the underlying models. On the other hand, some of the existing solutions overcome these challenges but yield low accuracy which hampers their implementation for commercial tools. These solutions are majorly based on single learners and are therefore directly affected by the intrinsic limitations of each learning algorithm. The novelty of this paper is to use the most realistic dataset available for intrusion detection called NSL-KDD, and combine multiple learners to build ensemble learners that increase the accuracy of the detection. Furthermore, a deployment architecture in a fog-to-things environment that employs two levels of classifications is proposed. In such architecture, the first level performs an anomaly detection which reduces the latency of the classification substantially, while the second level, executes attack classifications, enabling precise prevention measures. Finally, the experimental results demonstrate the effectiveness of the proposed IDS in comparison with the other state-of-the-arts on the NSL-KDD dataset.
MNApr 29, 2018
Building Models for Biopathway Dynamics Using Intrinsic Dimensionality AnalysisEmilia M. Wysocka, Valery Dzutsati, Tirthankar Bandyopadhyay et al.
An important task for many if not all the scientific domains is efficient knowledge integration, testing and codification. It is often solved with model construction in a controllable computational environment. In spite of that, the throughput of in-silico simulation-based observations become similarly intractable for thorough analysis. This is especially the case in molecular biology, which served as a subject for this study. In this project, we aimed to test some approaches developed to deal with the curse of dimensionality. Among these we found dimension reduction techniques especially appealing. They can be used to identify irrelevant variability and help to understand critical processes underlying high-dimensional datasets. Additionally, we subjected our data sets to nonlinear time series analysis, as those are well established methods for results comparison. To investigate the usefulness of dimension reduction methods, we decided to base our study on a concrete sample set. The example was taken from the domain of systems biology concerning dynamic evolution of sub-cellular signaling. Particularly, the dataset relates to the yeast pheromone pathway and is studied in-silico with a stochastic model. The model reconstructs signal propagation stimulated by a mating pheromone. In the paper, we elaborate on the reason of multidimensional analysis problem in the context of molecular signaling, and next, we introduce the model of choice, simulation details and obtained time series dynamics. A description of used methods followed by a discussion of results and their biological interpretation finalize the paper.
ROApr 27, 2018
Persistent Monitoring of Stochastic Spatio-temporal Phenomena with a Small Team of RobotsSahil Garg, Nora Ayanian
This paper presents a solution for persistent monitoring of real-world stochastic phenomena, where the underlying covariance structure changes sharply across time, using a small number of mobile robot sensors. We propose an adaptive solution for the problem where stochastic real-world dynamics are modeled as a Gaussian Process (GP). The belief on the underlying covariance structure is learned from recently observed dynamics as a Gaussian Mixture (GM) in the low-dimensional hyper-parameters space of the GP and adapted across time using Sequential Monte Carlo methods. Each robot samples a belief point from the GM and locally optimizes a set of informative regions by greedy maximization of the submodular entropy function. The key contributions of this paper are threefold: adapting the belief on the covariance using Markov Chain Monte Carlo (MCMC) sampling such that particles survive even under sharp covariance changes across time; exploiting the belief to transform the problem of entropy maximization into a decentralized one; and developing an approximation algorithm to maximize entropy on a set of informative regions in the continuous space. We illustrate the application of the proposed solution through extensive simulations using an artificial dataset and multiple real datasets from fixed sensor deployments, and compare it to three competing state-of-the-art approaches.
LGApr 27, 2018
Learning Non-Stationary Space-Time Models for Environmental MonitoringSahil Garg, Amarjeet Singh, Fabio Ramos
One of the primary aspects of sustainable development involves accurate understanding and modeling of environmental phenomena. Many of these phenomena exhibit variations in both space and time and it is imperative to develop a deeper understanding of techniques that can model space-time dynamics accurately. In this paper we propose NOSTILL-GP - NOn-stationary Space TIme variable Latent Length scale GP, a generic non-stationary, spatio-temporal Gaussian Process (GP) model. We present several strategies, for efficient training of our model, necessary for real-world applicability. Extensive empirical validation is performed using three real-world environmental monitoring datasets, with diverse dynamics across space and time. Results from the experiments clearly demonstrate general applicability and effectiveness of our approach for applications in environmental monitoring.
LGApr 27, 2018
Efficiently Learning Nonstationary Gaussian Processes for Real World ImpactSahil Garg
Most real world phenomena such as sunlight distribution under a forest canopy, minerals concentration, stock valuation, exhibit nonstationary dynamics i.e. phenomenon variation changes depending on the locality. Nonstationary dynamics pose both theoretical and practical challenges to statistical machine learning algorithms that aim to accurately capture the complexities governing the evolution of such processes. Typically the nonstationary dynamics are modeled using nonstationary Gaussian Process models (NGPS) that employ local latent dynamics parameterization to correspondingly model the nonstationary real observable dynamics. Recently, an approach based on most likely induced latent dynamics representation attracted research community's attention for a while. The approach could not be employed for large scale real world applications because learning a most likely latent dynamics representation involves maximization of marginal likelihood of the observed real dynamics that becomes intractable as the number of induced latent points grows with problem size. We have established a direct relationship between informativeness of the induced latent dynamics and the marginal likelihood of the observed real dynamics. This opens up the possibility of maximizing marginal likelihood of observed real dynamics indirectly by near optimally maximizing entropy or mutual information gain on the induced latent dynamics using greedy algorithms. Therefore, for an efficient yet accurate inference, we propose to build an induced latent dynamics representation using a novel algorithm LISAL that adaptively maximizes entropy or mutual information on the induced latent dynamics and marginal likelihood of observed real dynamics in an iterative manner. The relevance of LISAL is validated using real world datasets.
LGApr 26, 2018
Adaptive Sensing for Learning Nonstationary Environment ModelsSahil Garg, Amarjeet Singh, Fabio Ramos
Most environmental phenomena, such as wind profiles, ozone concentration and sunlight distribution under a forest canopy, exhibit nonstationary dynamics i.e. phenomenon variation change depending on the location and time of occurrence. Non-stationary dynamics pose both theoretical and practical challenges to statistical machine learning algorithms aiming to accurately capture the complexities governing the evolution of such processes. In this paper, we address the sampling aspects of the problem of learning nonstationary spatio-temporal models, and propose an efficient yet simple algorithm - LISAL. The core idea in LISAL is to learn two models using Gaussian processes (GPs) wherein the first is a nonstationary GP directly modeling the phenomenon. The second model uses a stationary GP representing a latent space corresponding to changes in dynamics, or the nonstationarity characteristics of the first model. LISAL involves adaptively sampling the latent space dynamics using information theory quantities to reduce the computational cost during the learning phase. The relevance of LISAL is extensively validated using multiple real world datasets.
LGApr 26, 2018
Modeling Psychotherapy Dialogues with Kernelized Hashcode Representations: A Nonparametric Information-Theoretic ApproachSahil Garg, Irina Rish, Guillermo Cecchi et al.
We propose a novel dialogue modeling framework, the first-ever nonparametric kernel functions based approach for dialogue modeling, which learns kernelized hashcodes as compressed text representations; unlike traditional deep learning models, it handles well relatively small datasets, while also scaling to large ones. We also derive a novel lower bound on mutual information, used as a model-selection criterion favoring representations with better alignment between the utterances of participants in a collaborative dialogue setting, as well as higher predictability of the generated responses. As demonstrated on three real-life datasets, including prominently psychotherapy sessions, the proposed approach significantly outperforms several state-of-art neural network based dialogue systems, both in terms of computational efficiency, reducing training time from days or weeks to hours, and the response quality, achieving an order of magnitude improvement over competitors in frequency of being chosen as the best model by human evaluators.
CLJan 11, 2018
Stochastic Learning of Nonstationary Kernels for Natural Language ModelingSahil Garg, Greg Ver Steeg, Aram Galstyan
Natural language processing often involves computations with semantic or syntactic graphs to facilitate sophisticated reasoning based on structural relationships. While convolution kernels provide a powerful tool for comparing graph structure based on node (word) level relationships, they are difficult to customize and can be computationally expensive. We propose a generalization of convolution kernels, with a nonstationary model, for better expressibility of natural languages in supervised settings. For a scalable learning of the parameters introduced with our model, we propose a novel algorithm that leverages stochastic sampling on k-nearest neighbor graphs, along with approximations based on locality-sensitive hashing. We demonstrate the advantages of our approach on a challenging real-world (structured inference) problem of automatically extracting biological models from the text of scientific papers.
CLNov 10, 2017
Kernelized Hashcode Representations for Relation ExtractionSahil Garg, Aram Galstyan, Greg Ver Steeg et al.
Kernel methods have produced state-of-the-art results for a number of NLP tasks such as relation extraction, but suffer from poor scalability due to the high cost of computing kernel similarities between natural language structures. A recently proposed technique, kernelized locality-sensitive hashing (KLSH), can significantly reduce the computational cost, but is only applicable to classifiers operating on kNN graphs. Here we propose to use random subspaces of KLSH codes for efficiently constructing an explicit representation of NLP structures suitable for general classification methods. Further, we propose an approach for optimizing the KLSH model for classification problems by maximizing an approximation of mutual information between the KLSH codes (feature vectors) and the class labels. We evaluate the proposed approach on biomedical relation extraction datasets, and observe significant and robust improvements in accuracy w.r.t. state-of-the-art classifiers, along with drastic (orders-of-magnitude) speedup compared to conventional kernel methods.
LGJan 22, 2017
Neurogenesis-Inspired Dictionary Learning: Online Model Adaption in a Changing WorldSahil Garg, Irina Rish, Guillermo Cecchi et al.
In this paper, we focus on online representation learning in non-stationary environments which may require continuous adaptation of model architecture. We propose a novel online dictionary-learning (sparse-coding) framework which incorporates the addition and deletion of hidden units (dictionary elements), and is inspired by the adult neurogenesis phenomenon in the dentate gyrus of the hippocampus, known to be associated with improved cognitive function and adaptation to new environments. In the online learning setting, where new input instances arrive sequentially in batches, the neuronal-birth is implemented by adding new units with random initial weights (random dictionary elements); the number of new units is determined by the current performance (representation error) of the dictionary, higher error causing an increase in the birth rate. Neuronal-death is implemented by imposing l1/l2-regularization (group sparsity) on the dictionary within the block-coordinate descent optimization at each iteration of our online alternating minimization scheme, which iterates between the code and dictionary updates. Finally, hidden unit connectivity adaptation is facilitated by introducing sparsity in dictionary elements. Our empirical evaluation on several real-life datasets (images and language) as well as on synthetic data demonstrates that the proposed approach can considerably outperform the state-of-art fixed-size (nonadaptive) online sparse coding of Mairal et al. (2009) in the presence of nonstationary data. Moreover, we identify certain properties of the data (e.g., sparse inputs with nearly non-overlapping supports) and of the model (e.g., dictionary sparsity) associated with such improvements.
CLDec 4, 2015
Extracting Biomolecular Interactions Using Semantic Parsing of Biomedical TextSahil Garg, Aram Galstyan, Ulf Hermjakob et al.
We advance the state of the art in biomolecular interaction extraction with three contributions: (i) We show that deep, Abstract Meaning Representations (AMR) significantly improve the accuracy of a biomolecular interaction extraction system when compared to a baseline that relies solely on surface- and syntax-based features; (ii) In contrast with previous approaches that infer relations on a sentence-by-sentence basis, we expand our framework to enable consistent predictions over sets of sentences (documents); (iii) We further modify and expand a graph kernel learning framework to enable concurrent exploitation of automatically induced AMR (semantic) and dependency structure (syntactic) representations. Our experiments show that our approach yields interaction extraction systems that are more robust in environments where there is a significant mismatch between training and test conditions.