Suleyman S. Kozat

h-index: 28

46 papers

499 citations

Novelty: 56%

AI Score: 32

Ranked #139,882 of 201,326 authors (top 69%); #30,825 in LG (top 72%)

46 Papers

LG · Sep 1, 2022

Actor Prioritized Experience Replay

Baturay Saglam, Furkan B. Mutlu, Dogan C. Cicek et al.

A widely-studied deep reinforcement learning (RL) technique known as Prioritized Experience Replay (PER) allows agents to learn from transitions sampled with non-uniform probability proportional to their temporal-difference (TD) error. Although it has been shown that PER is one of the most crucial components for the overall performance of deep RL methods in discrete action domains, many empirical studies indicate that it considerably underperforms actor-critic algorithms in continuous control. We theoretically show that actor networks cannot be effectively trained with transitions that have large TD errors. As a result, the approximate policy gradient computed under the Q-network diverges from the actual gradient computed under the optimal Q-function. Motivated by this, we introduce a novel experience replay sampling framework for actor-critic methods, which also regards issues with stability and recent findings behind the poor empirical performance of PER. The introduced algorithm suggests a new branch of improvements to PER and schedules effective and efficient training for both actor and critic networks. An extensive set of experiments verifies our theoretical claims and demonstrates that the introduced method significantly outperforms the competing approaches and obtains state-of-the-art results over the standard off-policy actor-critic algorithms.
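As a rough illustration of the baseline this abstract builds on, the sketch below samples transitions with probability proportional to their TD error magnitude. It shows plain proportional prioritization only, not the actor-critic sampling framework the paper introduces. The exponent `alpha`, the offset `eps`, and the function names are assumptions chosen for illustration.

```python
import random

def per_probabilities(td_errors, alpha=0.6, eps=1e-6):
    """Sampling probabilities proportional to (|TD error| + eps) ** alpha.

    alpha and eps are illustrative defaults, not values from the paper.
    """
    priorities = [(abs(d) + eps) ** alpha for d in td_errors]
    total = sum(priorities)
    return [p / total for p in priorities]

def sample_batch(buffer, td_errors, batch_size, rng=random):
    """Draw transitions (with replacement) according to their priorities."""
    probs = per_probabilities(td_errors)
    indices = rng.choices(range(len(buffer)), weights=probs, k=batch_size)
    return [buffer[i] for i in indices], indices

# Toy buffer of four transitions; the third has the largest TD error,
# so it receives the largest sampling probability.
buffer = ["t0", "t1", "t2", "t3"]
td_errors = [0.1, 0.5, 2.0, 0.05]
probs = per_probabilities(td_errors)
batch, idx = sample_batch(buffer, td_errors, batch_size=3)
```

Setting `alpha=0` in this sketch recovers uniform sampling, which is a convenient way to compare the two regimes on the same buffer.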

NI · Oct 10, 2022

Deep Reinforcement Learning Based Joint Downlink Beamforming and RIS Configuration in RIS-aided MU-MISO Systems Under Hardware Impairments and Imperfect CSI

Baturay Saglam, Doga Gurgunoglu, Suleyman S. Kozat

We introduce a novel deep reinforcement learning (DRL) approach to jointly optimize transmit beamforming and reconfigurable intelligent surface (RIS) phase shifts in a multiuser multiple input single output (MU-MISO) system to maximize the sum downlink rate under the phase-dependent reflection amplitude model. Our approach addresses the challenge of imperfect channel state information (CSI) and hardware impairments by considering a practical RIS amplitude model. We compare the performance of our approach against a vanilla DRL agent in two scenarios: perfect CSI and phase-dependent RIS amplitudes, and mismatched CSI and ideal RIS reflections. The results demonstrate that the proposed framework significantly outperforms the vanilla DRL agent under mismatch and approaches the gold standard. Our contributions include modifications to the DRL approach to address the joint design of transmit beamforming and phase shifts and the phase-dependent amplitude model. To the best of our knowledge, our method is the first DRL-based approach for the phase-dependent reflection amplitude model in RIS-aided MU-MISO systems. Our findings in this study highlight the potential of our approach as a promising solution to overcome hardware impairments in RIS-aided wireless communication systems.

SY · Mar 19, 2012

Linear MMSE-Optimal Turbo Equalization Using Context Trees

Nargiz Kalantarova, Kyeongyeon Kim, Suleyman S. Kozat et al.

Formulations of the turbo equalization approach to iterative equalization and decoding vary greatly when channel knowledge is either partially or completely unknown. Maximum a posteriori probability (MAP) and minimum mean square error (MMSE) approaches leverage channel knowledge to make explicit use of soft information (priors over the transmitted data bits) in a manner that is distinctly nonlinear, appearing either in a trellis formulation (MAP) or inside an inverted matrix (MMSE). To date, nearly all adaptive turbo equalization methods either estimate the channel or use a direct adaptation equalizer in which estimates of the transmitted data are formed from an expressly linear function of the received data and soft information, with this latter formulation being most common. We study a class of direct adaptation turbo equalizers that are both adaptive and nonlinear functions of the soft information from the decoder. We introduce piecewise linear models based on context trees that can adaptively approximate the nonlinear dependence of the equalizer on the soft information such that it can choose both the partition regions as well as the locally linear equalizer coefficients in each region independently, with computational complexity that remains of the order of a traditional direct adaptive linear equalizer. This approach is guaranteed to asymptotically achieve the performance of the best piecewise linear equalizer and we quantify the MSE performance of the resulting algorithm and the convergence of its MSE to that of the linear minimum MSE estimator as the depth of the context tree and the data length increase.

PM · Mar 19, 2012

Optimal Investment Under Transaction Costs: A Threshold Rebalanced Portfolio Approach

Sait Tunc, Suleyman S. Kozat

We study optimal investment in a financial market having a finite number of assets from a signal processing perspective. We investigate how an investor should distribute capital over these assets and when he should reallocate the distribution of the funds over these assets to maximize the cumulative wealth over any investment period. In particular, we introduce a portfolio selection algorithm that maximizes the expected cumulative wealth in i.i.d. two-asset discrete-time markets where the market levies proportional transaction costs in buying and selling stocks. We achieve this using "threshold rebalanced portfolios", where trading occurs only if the portfolio breaches certain thresholds. Under the assumption that the relative price sequences have log-normal distribution from the Black-Scholes model, we evaluate the expected wealth under proportional transaction costs and find the threshold rebalanced portfolio that achieves the maximal expected cumulative wealth over any investment period. Our derivations can be readily extended to markets having more than two stocks, where these extensions are pointed out in the paper. As predicted from our derivations, we significantly improve the achieved wealth over portfolio selection algorithms from the literature on historical data sets.
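The threshold rebalancing idea described above can be sketched in a few lines. This is a simplified illustration, not the paper's derivation: the function name, the default `target`, `eps`, and `cost` values are assumptions, and the transaction fee is approximated as `cost` times the value moved between the two assets and then deducted from total wealth (an exact treatment would solve for the post-fee allocation).

```python
def threshold_rebalance(price_relatives, target=0.5, eps=0.1, cost=0.01):
    """Simulate a two-asset threshold rebalanced portfolio.

    price_relatives: iterable of (x1, x2), the per-period gross returns.
    target: desired fraction of wealth in asset 1 (hypothetical default).
    eps: half-width of the no-trade band around the target.
    cost: proportional transaction cost rate (simplified fee model).
    """
    wealth = 1.0
    b = target          # current fraction of wealth held in asset 1
    trades = 0
    for x1, x2 in price_relatives:
        v1 = wealth * b * x1          # value of asset 1 after the price move
        v2 = wealth * (1.0 - b) * x2  # value of asset 2 after the price move
        wealth = v1 + v2
        b = v1 / wealth               # the allocation drifts with prices
        if abs(b - target) > eps:     # trade only when the band is breached
            traded = abs(b - target) * wealth
            wealth -= cost * traded   # pay the proportional fee
            b = target
            trades += 1
    return wealth, trades

# Flat market: the allocation never leaves the band, so no trades occur.
flat_wealth, flat_trades = threshold_rebalance([(1.0, 1.0)] * 5)
# One large move in asset 1 pushes the allocation outside the band.
jump_wealth, jump_trades = threshold_rebalance([(2.0, 1.0)])
```

Widening `eps` reduces the number of trades and hence the fees paid, at the price of letting the allocation drift further from the target; the paper's contribution is choosing this trade-off optimally.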

LG · Oct 1, 2022

Deep Intrinsically Motivated Exploration in Continuous Control

Baturay Saglam, Suleyman S. Kozat

In continuous control, exploration is often performed through undirected strategies in which parameters of the networks or selected actions are perturbed by random noise. Although the deep setting of undirected exploration has been shown to improve the performance of on-policy methods, it introduces excessive computational complexity and is known to fail in the off-policy setting. Intrinsically motivated exploration is an effective alternative to the undirected strategies, but it is usually studied for discrete action domains. In this paper, we investigate how intrinsic motivation can effectively be combined with deep reinforcement learning in the control of continuous systems to obtain a directed exploratory behavior. We adapt the existing theories on animal motivational systems into the reinforcement learning paradigm and introduce a novel and scalable directed exploration strategy. The introduced approach, motivated by the maximization of the value function's error, can benefit from a collected set of experiences by extracting useful information and unify the intrinsic exploration motivations in the literature under a single exploration objective. An extensive set of empirical studies demonstrates that our framework extends to larger and more diverse state spaces, dramatically improves the baselines, and outperforms the undirected strategies significantly.

LG · Aug 1, 2022

Mitigating Off-Policy Bias in Actor-Critic Methods with One-Step Q-learning: A Novel Correction Approach

Baturay Saglam, Dogan C. Cicek, Furkan B. Mutlu et al.

Compared to on-policy counterparts, off-policy model-free deep reinforcement learning can improve data efficiency by repeatedly using the previously gathered data. However, off-policy learning becomes challenging when the discrepancy between the underlying distributions of the agent's policy and collected data increases. Although the well-studied importance sampling and off-policy policy gradient techniques were proposed to compensate for this discrepancy, they usually require a collection of long trajectories and induce additional problems such as vanishing/exploding gradients or discarding many useful experiences, which eventually increases the computational complexity. Moreover, their generalization to either continuous action domains or policies approximated by deterministic deep neural networks is strictly limited. To overcome these limitations, we introduce a novel policy similarity measure to mitigate the effects of such discrepancy in continuous control. Our method offers an adequate single-step off-policy correction that is applicable to deterministic policy networks. Theoretical and empirical studies demonstrate that it can achieve a "safe" off-policy learning and substantially improve the state-of-the-art by attaining higher returns in fewer steps than the competing methods through an effective schedule of the learning rate in Q-learning and policy optimization.

ML · Mar 25, 2022

A Hybrid Framework for Sequential Data Prediction with End-to-End Optimization

Mustafa E. Aydın, Suleyman S. Kozat

We investigate nonlinear prediction in an online setting and introduce a hybrid model that effectively mitigates, via an end-to-end architecture, the need for hand-designed features and manual model selection issues of conventional nonlinear prediction/regression methods. In particular, we use recursive structures to extract features from sequential signals, while preserving the state information, i.e., the history, and boosted decision trees to produce the final output. The connection is in an end-to-end fashion and we jointly optimize the whole architecture using stochastic gradient descent, for which we also provide the backward pass update equations. In particular, we employ a recurrent neural network (LSTM) for adaptive feature extraction from sequential data and a gradient boosting machinery (soft GBDT) for effective supervised regression. Our framework is generic so that one can use other deep learning architectures for feature extraction (such as RNNs and GRUs) and machine learning algorithms for decision making as long as they are differentiable. We demonstrate the learning behavior of our algorithm on synthetic data and the significant performance improvements over the conventional methods over various real life datasets. Furthermore, we openly share the source code of the proposed method to facilitate further research.

SY · Mar 19, 2012

Low Complexity Turbo-Equalization: A Clustering Approach

Kyeongyeon Kim, Jun Won Choi, Suleyman S. Kozat et al.

We introduce a low complexity approach to iterative equalization and decoding, or "turbo equalization", that uses clustered models to better match the nonlinear relationship that exists between likelihood information from a channel decoder and the symbol estimates that arise in soft-input channel equalization. The introduced clustered turbo equalizer uses piecewise linear models to capture the nonlinear dependency of the linear minimum mean square error (MMSE) symbol estimate on the symbol likelihoods produced by the channel decoder and maintains a computational complexity that is only linear in the channel memory. By partitioning the space of likelihood information from the decoder, based on either hard or soft clustering, and using locally-linear adaptive equalizers within each clustered region, the performance gap between the linear MMSE equalizer and low-complexity, LMS-based linear turbo equalizers can be dramatically narrowed.

LG · Jul 27, 2022

Safe and Robust Experience Sharing for Deterministic Policy Gradient Algorithms

Baturay Saglam, Dogan C. Cicek, Furkan B. Mutlu et al.

Learning in high dimensional continuous tasks is challenging, mainly when the experience replay memory is very limited. We introduce a simple yet effective experience sharing mechanism for deterministic policies in continuous action domains for future off-policy deep reinforcement learning applications in which the allocated memory for the experience replay buffer is limited. To overcome the extrapolation error induced by learning from other agents' experiences, we equip our algorithm with a novel off-policy correction technique that requires no action probability estimates. We test the effectiveness of our method in challenging OpenAI Gym continuous control tasks and conclude that it can achieve a safe experience sharing across multiple agents and exhibits a robust performance when the replay memory is strictly limited.

DS · Jan 19, 2017

Efficient Implementation Of Newton-Raphson Methods For Sequential Data Prediction

Burak C. Civek, Suleyman S. Kozat

We investigate the problem of sequential linear data prediction for real life big data applications. The second order algorithms, i.e., Newton-Raphson Methods, asymptotically achieve the performance of the "best" possible linear data predictor much faster compared to the first order algorithms, e.g., Online Gradient Descent. However, implementation of these methods is not usually feasible in big data applications because of the extremely high computational needs. Regular implementation of the Newton-Raphson Methods requires a computational complexity in the order of $O(M^2)$ for an $M$ dimensional feature vector, while the first order algorithms need only $O(M)$. To this end, in order to eliminate this gap, we introduce a highly efficient implementation reducing the computational complexity of the Newton-Raphson Methods from quadratic to linear scale. The presented algorithm provides the well-known merits of the second order methods while offering the computational complexity of $O(M)$. We utilize the shifted nature of the consecutive feature vectors and do not rely on any statistical assumptions. Therefore, both regular and fast implementations achieve the same performance in the sense of mean square error. We demonstrate the computational efficiency of our algorithm on real life sequential big datasets. We also illustrate that the presented algorithm is numerically stable.
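To make the complexity gap concrete, the sketch below contrasts a first-order update, which touches each of the $M$ weights once, with a regular second-order update written as recursive least squares, which updates an $M \times M$ matrix per sample. This shows only the regular $O(M^2)$ form that the abstract sets out to accelerate; the paper's fast $O(M)$ implementation is not reproduced here. Function names, the learning rate, and the initial matrix scale are assumptions.

```python
def ogd_step(w, x, y, lr=0.1):
    """First-order (online gradient descent) update: O(M) work per sample."""
    err = y - sum(wi * xi for wi, xi in zip(w, x))
    return [wi + lr * err * xi for wi, xi in zip(w, x)]

def rls_step(w, P, x, y):
    """Second-order update in recursive least squares form: O(M^2) per sample.

    P tracks the inverse of the regularized sample correlation matrix and is
    assumed symmetric.
    """
    M = len(w)
    Px = [sum(P[i][j] * x[j] for j in range(M)) for i in range(M)]  # O(M^2)
    denom = 1.0 + sum(x[i] * Px[i] for i in range(M))
    gain = [v / denom for v in Px]
    err = y - sum(wi * xi for wi, xi in zip(w, x))
    w_new = [w[i] + gain[i] * err for i in range(M)]
    # Sherman-Morrison rank-one update of P, another O(M^2) operation.
    P_new = [[P[i][j] - gain[i] * Px[j] for j in range(M)] for i in range(M)]
    return w_new, P_new

# Toy data generated by y = 2*x0 - x1 (illustrative, not from the paper).
samples = [([1.0, 0.0], 2.0), ([0.0, 1.0], -1.0),
           ([1.0, 1.0], 1.0), ([2.0, 1.0], 3.0)]
w = [0.0, 0.0]
P = [[1000.0, 0.0], [0.0, 1000.0]]  # large initial P means weak regularization
for x, y in samples:
    w, P = rls_step(w, P, x, y)

w_first_order = ogd_step([0.0, 0.0], [1.0, 0.0], 2.0)
```

After only four samples the second-order weights are already close to the generating coefficients, which is the fast convergence the abstract refers to; the matrix-vector product and the rank-one update are where the quadratic cost arises.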

LG · Oct 26, 2023

Hierarchical Ensemble-Based Feature Selection for Time Series Forecasting

Aysin Tumay, Mustafa E. Aydin, Ali T. Koc et al.

We introduce a novel ensemble approach for feature selection based on hierarchical stacking for non-stationarity and/or a limited number of samples with a large number of features. Our approach exploits the co-dependency between features using a hierarchical structure. Initially, a machine learning model is trained using a subset of features, and then the output of the model is updated using other algorithms in a hierarchical manner with the remaining features to minimize the target loss. This hierarchical structure allows for flexible depth and feature selection. By exploiting feature co-dependency hierarchically, our proposed approach overcomes the limitations of traditional feature selection methods and feature importance scores. The effectiveness of the approach is demonstrated on synthetic and well-known real-life datasets, providing significant scalable and stable performance improvements compared to the traditional methods and the state-of-the-art approaches. We also provide the source code of our approach to facilitate further research and replicability of our results.

LG · Aug 7, 2022

Optimal Tracking in Prediction with Expert Advice

Hakan Gokcesu, Suleyman S. Kozat

We study the prediction with expert advice setting, where the aim is to produce a decision by combining the decisions generated by a set of experts, e.g., independently running algorithms. We achieve the min-max optimal dynamic regret under the prediction with expert advice setting, i.e., we can compete against time-varying (not necessarily fixed) combinations of expert decisions in an optimal manner. Our end-algorithm is truly online with no prior information, such as the time horizon or loss ranges, which are commonly used by different algorithms in the literature. Both our regret guarantees and the min-max lower bounds are derived with the general consideration that the expert losses can have time-varying properties and are possibly unbounded. Our algorithm can be adapted for restrictive scenarios regarding both loss feedback and decision making. Our guarantees are universal, i.e., our end-algorithm can provide regret guarantee against any competitor sequence in a min-max optimal manner with logarithmic complexity. Note that, to our knowledge, for the prediction with expert advice problem, our algorithms are the first to produce such universally optimal, adaptive and truly online guarantees with no prior knowledge.
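For readers unfamiliar with the setting, a minimal sketch of the classical exponential weights (Hedge) forecaster follows. It is a standard baseline, not the algorithm of this paper: it uses a fixed learning rate `eta`, assumes losses bounded in [0, 1], and targets static rather than dynamic regret, which are exactly the kinds of restrictions the abstract says the proposed method removes. All names are illustrative.

```python
import math

def hedge(expert_losses, eta=0.5):
    """Run exponential weights over rounds of per-expert losses.

    expert_losses: list of rounds, each a list of losses in [0, 1].
    Returns the learner's cumulative expected loss and its final weights.
    """
    n = len(expert_losses[0])
    log_w = [0.0] * n  # log-weights, kept in log space for stability

    def normalize(lw):
        m = max(lw)
        w = [math.exp(v - m) for v in lw]
        s = sum(w)
        return [v / s for v in w]

    total = 0.0
    for losses in expert_losses:
        p = normalize(log_w)
        total += sum(pi * li for pi, li in zip(p, losses))  # expected loss
        log_w = [v - eta * l for v, l in zip(log_w, losses)]
    return total, normalize(log_w)

# Expert 0 is always right (loss 0), expert 1 is always wrong (loss 1).
rounds = [[0.0, 1.0]] * 20
total_loss, final_weights = hedge(rounds)
```

In this toy run the weight mass moves almost entirely to the better expert, and the learner's cumulative loss stays bounded even though one expert keeps incurring loss every round.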

SY · Mar 19, 2012

A Novel Robust Approach to Least Squares Problems with Bounded Data Uncertainties

Nargiz Kalantarova, Mehmet A. Donmez, Suleyman S. Kozat

In this correspondence, we introduce a minimax regret criterion to the least squares problems with bounded data uncertainties and solve it using semi-definite programming. We investigate a robust minimax least squares approach that minimizes a worst case difference regret. The regret is defined as the difference between a squared data error and the smallest attainable squared data error of a least squares estimator. We then propose a robust regularized least squares approach to the regularized least squares problem under data uncertainties by using a similar framework. We show that both unstructured and structured robust least squares problems and the robust regularized least squares problem can be put in certain semi-definite programming forms. Through several simulations, we demonstrate the merits of the proposed algorithms with respect to the well-known alternatives in the literature.

ML · Sep 19, 2023

Hybrid State Space-based Learning for Sequential Data Prediction with Joint Optimization

Mustafa E. Aydın, Arda Fazla, Suleyman S. Kozat

We investigate nonlinear prediction/regression in an online setting and introduce a hybrid model that effectively mitigates, via a joint mechanism through a state space formulation, the need for domain-specific feature engineering issues of conventional nonlinear prediction models and achieves an efficient mix of nonlinear and linear components. In particular, we use recursive structures to extract features from raw sequential data and a traditional linear time series model to deal with the intricacies of the sequential data, e.g., seasonality, trends. The state-of-the-art ensemble or hybrid models typically train the base models in a disjoint manner, which is not only time consuming but also sub-optimal due to the separation of modeling or independent training. In contrast, for the first time in the literature, we jointly optimize an enhanced recurrent neural network (LSTM) for automatic feature extraction from raw data and an ARMA-family time series model (SARIMAX) for effectively addressing peculiarities associated with time series data. We achieve this by introducing novel state space representations for the base models, which are then combined to provide a full state space representation of the hybrid or the ensemble. Hence, we are able to jointly optimize both models in a single pass via particle filtering, for which we also provide the update equations. The introduced architecture is generic so that one can use other recurrent architectures, e.g., GRUs, traditional time series-specific models, e.g., ETS or other optimization methods, e.g., EKF, UKF. Due to such novel combination and joint optimization, we demonstrate significant improvements in widely publicized real life competition datasets. We also openly share our code for further research and replicability of our results.

PM · Jul 17, 2012

Optimal Investment Under Transaction Costs

Sait Tunc, Mehmet A. Donmez, Suleyman S. Kozat

We investigate how and when to diversify capital over assets, i.e., the portfolio selection problem, from a signal processing perspective. To this end, we first construct portfolios that achieve the optimal expected growth in i.i.d. discrete-time two-asset markets under proportional transaction costs. We then extend our analysis to cover markets having more than two stocks. The market is modeled by a sequence of price relative vectors with arbitrary discrete distributions, which can also be used to approximate a wide class of continuous distributions. To achieve the optimal growth, we use threshold portfolios, where we introduce a recursive update to calculate the expected wealth. We then demonstrate that under the threshold rebalancing framework, the achievable set of portfolios forms an irreducible Markov chain under mild technical conditions. We evaluate the corresponding stationary distribution of this Markov chain, which provides a natural and efficient method to calculate the cumulative expected wealth. Subsequently, the corresponding parameters are optimized yielding the growth optimal portfolio under proportional transaction costs in i.i.d. discrete-time two-asset markets. As a widely known financial problem, we next solve optimal portfolio selection in discrete-time markets constructed by sampling continuous-time Brownian markets. For the case that the underlying discrete distributions of the price relative vectors are unknown, we provide a maximum likelihood estimator that is also incorporated in the optimization framework in our simulations.

SY · Mar 19, 2012

A New Analysis of an Adaptive Convex Mixture: A Deterministic Approach

Mehmet A. Donmez, Sait Tunc, Suleyman S. Kozat

We introduce a new analysis of an adaptive mixture method that combines outputs of two constituent filters running in parallel to model an unknown desired signal. This adaptive mixture is shown to achieve the mean square error (MSE) performance of the best constituent filter, and in some cases outperforms both, in the steady-state. However, the MSE analysis of this mixture in the steady-state and during the transient regions uses approximations and relies on statistical models on the underlying signals and systems. Hence, such an analysis may not be useful or valid for signals generated by various real life systems that show high degrees of nonstationarity, limit cycles and, in many cases, that are even chaotic. To this end, we perform the transient and the steady-state analysis of this adaptive mixture in a "strong" deterministic sense without any approximations in the derivations or statistical assumptions on the underlying signals such that our results are guaranteed to hold. In particular, we relate the time-accumulated squared estimation error of this adaptive mixture at any time to the time-accumulated squared estimation error of the optimal convex mixture of the constituent filters directly tuned to the underlying signal in an individual sequence manner.
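One common way to realize such an adaptive convex mixture, sketched here as an assumption rather than as the exact scheme analyzed in the paper, is to parameterize the mixing weight through a sigmoid so that it stays in (0, 1), and to adapt the underlying parameter by a stochastic gradient step on the squared error. The step size `mu` and the names below are illustrative.

```python
import math

def mix_step(a, y1, y2, d, mu=1.0):
    """One update of a two-filter convex mixture.

    a:  unconstrained parameter; the mixing weight is lam = sigmoid(a).
    y1, y2: outputs of the two constituent filters at this time step.
    d:  desired signal at this time step.
    Returns the mixture output and the updated parameter.
    """
    lam = 1.0 / (1.0 + math.exp(-a))
    y = lam * y1 + (1.0 - lam) * y2           # convex combination
    e = d - y                                  # estimation error
    # Gradient descent on e**2 / 2 with respect to a.
    a_new = a + mu * e * (y1 - y2) * lam * (1.0 - lam)
    return y, a_new

# Toy scenario: filter 1 matches the desired signal, filter 2 does not.
a = 0.0
for _ in range(200):
    _, a = mix_step(a, y1=1.0, y2=0.0, d=1.0)
lam_final = 1.0 / (1.0 + math.exp(-a))
```

In this toy scenario the weight drifts toward the better filter, which is the qualitative behavior the abstract describes; the paper's contribution is a deterministic bound on the accumulated squared error, which this sketch does not attempt to verify.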

LG · Jan 23, 2024 · Code

Binary Feature Mask Optimization for Feature Selection

Mehmet E. Lorasdagi, Mehmet Y. Turali, Suleyman S. Kozat

We investigate the feature selection problem for generic machine learning models. We introduce a novel framework that selects features considering the outcomes of the model. Our framework introduces a novel feature masking approach to eliminate the features during the selection process, instead of completely removing them from the dataset. This allows us to use the same machine learning model during feature selection, unlike other feature selection methods where we need to train the machine learning model again as the dataset has different dimensions on each iteration. We obtain the mask operator using the predictions of the machine learning model, which offers a comprehensive view on the subsets of the features essential for the predictive performance of the model. A variety of approaches exist in the feature selection literature. However, to our knowledge, no study has introduced a training-free framework for a generic machine learning model to select features while considering the importance of the feature subsets as a whole, instead of focusing on the individual features. We demonstrate significant performance improvements on the real-life datasets under different settings using LightGBM and Multi-Layer Perceptron as our machine learning models. The high performance of our General Binary Mask Optimization algorithm stems from its feature masking approach to select features and its flexibility in the number of selected features. The algorithm selects features based on the validation performance of the machine learning model. Hence, the number of selected features is not predetermined and adjusts dynamically to the dataset. Additionally, we openly share our implementation to encourage further research in this area.
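The masking idea can be illustrated with a small sketch: the model is trained once, and candidate features are zeroed out by a binary mask at prediction time instead of being dropped from the dataset, so no retraining is needed. The greedy search, the stand-in linear model, and every name below are assumptions for illustration and are not claimed to match the paper's General Binary Mask Optimization procedure.

```python
def predict(weights, x, mask):
    """Stand-in for a fixed, already trained model: a masked linear predictor."""
    return sum(w * xi * m for w, xi, m in zip(weights, x, mask))

def val_loss(weights, X, y, mask):
    """Mean squared error on the validation set under a given mask."""
    return sum((predict(weights, x, mask) - t) ** 2 for x, t in zip(X, y)) / len(y)

def greedy_mask(weights, X_val, y_val, tol=1e-9):
    """Mask features one at a time, keeping a mask if validation loss does not rise."""
    mask = [1] * len(weights)
    best = val_loss(weights, X_val, y_val, mask)
    for j in range(len(mask)):
        trial = mask[:]
        trial[j] = 0                      # mask feature j; the model is untouched
        loss = val_loss(weights, X_val, y_val, trial)
        if loss <= best + tol:
            mask, best = trial, loss
    return mask, best

# Hypothetical trained weights: feature 1 is unused, feature 3 is spurious.
weights = [2.0, 0.0, -1.0, 0.5]
X_val = [[1, 5, 0, 1], [0, 3, 1, -1], [1, -2, 1, 2], [2, 0, 1, 1]]
y_val = [2, -1, 1, 3]                     # generated by y = 2*x0 - x2
mask, loss = greedy_mask(weights, X_val, y_val)
```

Because the decision to keep or mask a feature depends only on validation loss, the number of selected features is not fixed in advance, which mirrors the flexibility the abstract highlights.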

LG · Jan 20, 2024 · Code

AFS-BM: Enhancing Model Performance through Adaptive Feature Selection with Binary Masking

Mehmet Y. Turali, Mehmet E. Lorasdagi, Ali T. Koc et al.

We study the problem of feature selection in the general machine learning (ML) context, which is one of the most critical subjects in the field. Although many feature selection methods exist, they face challenges such as scalability, managing high-dimensional data, dealing with correlated features, adapting to variable feature importance, and integrating domain knowledge. To this end, we introduce "Adaptive Feature Selection with Binary Masking" (AFS-BM), which remedies these problems. AFS-BM achieves this by joint optimization for simultaneous feature selection and model training. In particular, we use joint optimization and binary masking to continuously adapt the set of features and model parameters during the training process. This approach leads to significant improvements in model accuracy and a reduction in computational requirements. We provide an extensive set of experiments where we compare AFS-BM with the established feature selection methods using well-known datasets from real-life competitions. Our results show that AFS-BM significantly improves accuracy and has significantly lower computational complexity. This is due to AFS-BM's ability to dynamically adjust to the changing importance of features during the training process, which is an important contribution to the field. We openly share our code for the replicability of our results and to facilitate further research.

LG · Sep 5, 2020 · Code

PySAD: A Streaming Anomaly Detection Framework in Python

Selim F. Yilmaz, Suleyman S. Kozat

Streaming anomaly detection requires algorithms that operate under strict constraints: bounded memory, single-pass processing, and constant-time complexity. We present PySAD, a comprehensive Python framework addressing these challenges through a unified architecture. The framework implements 17+ streaming algorithms (LODA, Half-Space Trees, xStream) with specialized components including projectors, probability calibrators, and postprocessors. Unlike existing batch-focused frameworks, PySAD enables efficient real-time processing with bounded memory while maintaining compatibility with PyOD and scikit-learn. Supporting all learning paradigms for univariate and multivariate streams, PySAD provides the most comprehensive streaming anomaly detection toolkit in Python. The source code is publicly available at github.com/selimfirat/pysad.
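The constraints named above (bounded memory, a single pass, constant time per instance) can be illustrated with a tiny detector built on Welford's running mean and variance. This is a generic standard-library sketch, not one of PySAD's algorithms; the class and method names are chosen for illustration and are not claimed to match PySAD's interface.

```python
import math

class RunningZScore:
    """Univariate streaming detector storing three numbers, whatever the stream length."""

    def __init__(self):
        self.n = 0        # number of instances seen so far
        self.mean = 0.0   # running mean
        self.m2 = 0.0     # running sum of squared deviations (Welford)

    def score(self, x):
        """Anomaly score: absolute z-score under the statistics seen so far."""
        if self.n < 2:
            return 0.0
        var = self.m2 / (self.n - 1)
        if var == 0.0:
            return 0.0
        return abs(x - self.mean) / math.sqrt(var)

    def update(self, x):
        """Constant-time update of the running statistics."""
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

# Score each instance before learning from it, in a single pass.
stream = [10, 11, 9, 10, 12, 10, 9, 11, 50]
detector = RunningZScore()
scores = []
for x in stream:
    scores.append(detector.score(x))
    detector.update(x)
```

Scoring before updating keeps the current instance from influencing its own score, a common convention in streaming evaluation.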

LG · Jun 13, 2024

CUER: Corrected Uniform Experience Replay for Off-Policy Continuous Deep Reinforcement Learning Algorithms

Arda Sarp Yenicesu, Furkan B. Mutlu, Suleyman S. Kozat et al.

The utilization of the experience replay mechanism enables agents to effectively leverage their experiences on several occasions. In previous studies, the sampling probability of the transitions was modified based on their relative significance. The process of reassigning sample probabilities for every transition in the replay buffer after each iteration is considered extremely inefficient. Hence, in order to enhance computing efficiency, experience replay prioritization algorithms reassess the importance of a transition as it is sampled. However, the relative importance of the transitions undergoes dynamic adjustments when the agent's policy and value function are iteratively updated. Furthermore, experience replay is a mechanism that retains the transitions generated by the agent's past policies, which could potentially diverge significantly from the agent's most recent policy. An increased deviation from the agent's most recent policy results in a greater frequency of off-policy updates, which has a negative impact on the agent's performance. In this paper, we develop a novel algorithm, Corrected Uniform Experience Replay (CUER), which stochastically samples the stored experience while considering the fairness among all other experiences without ignoring the dynamic nature of the transition importance by making sampled state distribution more on-policy. CUER provides promising improvements for off-policy continuous control algorithms in terms of sample efficiency, final performance, and stability of the policy during the training.

LG · Nov 12, 2021

AWD3: Dynamic Reduction of the Estimation Bias

Dogan C. Cicek, Enes Duran, Baturay Saglam et al.

Value-based deep Reinforcement Learning (RL) algorithms suffer from the estimation bias primarily caused by function approximation and temporal difference (TD) learning. This problem induces faulty state-action value estimates and therefore harms the performance and robustness of the learning algorithms. Although several techniques were proposed to tackle it, learning algorithms still suffer from this bias. Here, we introduce a technique that eliminates the estimation bias in off-policy continuous control algorithms using the experience replay mechanism. We adaptively learn the weighting hyper-parameter beta in the Weighted Twin Delayed Deep Deterministic Policy Gradient algorithm. Our method is named Adaptive-WD3 (AWD3). We show through continuous control environments of OpenAI gym that our algorithm matches or outperforms the state-of-the-art off-policy policy gradient learning algorithms.

LG · Nov 2, 2021

Off-Policy Correction for Deep Deterministic Policy Gradient Algorithms via Batch Prioritized Experience Replay

Dogan C. Cicek, Enes Duran, Baturay Saglam et al.

The experience replay mechanism allows agents to use the experiences multiple times. In prior works, the sampling probability of the transitions was adjusted according to their importance. Reassigning sampling probabilities for every transition in the replay buffer after each iteration is highly inefficient. Therefore, experience replay prioritization algorithms recalculate the significance of a transition when the corresponding transition is sampled to gain computational efficiency. However, the importance level of the transitions changes dynamically as the policy and the value function of the agent are updated. In addition, experience replay stores the transitions generated by the previous policies of the agent, which may significantly deviate from the most recent policy of the agent. Higher deviation from the most recent policy of the agent leads to more off-policy updates, which is detrimental for the agent. In this paper, we develop a novel algorithm, Batch Prioritizing Experience Replay via KL Divergence (KLPER), which prioritizes a batch of transitions rather than directly prioritizing each transition. Moreover, to reduce the off-policyness of the updates, our algorithm selects one batch among a certain number of batches and forces the agent to learn through the batch that is most likely generated by the most recent policy of the agent. We combine our algorithm with Deep Deterministic Policy Gradient and Twin Delayed Deep Deterministic Policy Gradient and evaluate it on various continuous control tasks. KLPER provides promising improvements for deep deterministic continuous control algorithms in terms of sample efficiency, final performance, and stability of the policy during the training.

LGSep 22, 2021

Estimation Error Correction in Deep Reinforcement Learning for Deterministic Actor-Critic Methods

Baturay Saglam, Enes Duran, Dogan C. Cicek et al.

In value-based deep reinforcement learning methods, approximation of value functions induces overestimation bias and leads to suboptimal policies. We show that in deep actor-critic methods that aim to overcome the overestimation bias, if the reinforcement signals received by the agent have a high variance, a significant underestimation bias arises. To minimize the underestimation, we introduce a parameter-free, novel deep Q-learning variant. Our Q-value update rule combines the notions behind Clipped Double Q-learning and Maxmin Q-learning by computing the critic objective through the nested combination of maximum and minimum operators to bound the approximate value estimates. We evaluate our modification on the suite of several OpenAI Gym continuous control tasks, improving the state-of-the-art in every environment tested.

LGAug 26, 2020

Multi-Label Sentiment Analysis on 100 Languages with Dynamic Weighting for Label Imbalance

Selim F. Yilmaz, E. Batuhan Kaynak, Aykut Koç et al.

We investigate cross-lingual sentiment analysis, which has attracted significant attention due to its applications in various areas including market research, politics and social sciences. In particular, we introduce a sentiment analysis framework in a multi-label setting, as it follows Plutchik's wheel of emotions. We introduce a novel dynamic weighting method that balances the contribution from each class during training, unlike previous static weighting methods that assign non-changing weights based on their class frequency. Moreover, we adapt the focal loss, which favors harder instances, from the single-label object recognition literature to our multi-label setting. Furthermore, we derive a method to choose optimal class-specific thresholds that maximize the macro-f1 score in linear time complexity. Through an extensive set of experiments, we show that our method obtains the state-of-the-art performance in 7 of 9 metrics in 3 different languages using a single model compared to the common baselines and the best-performing methods in the SemEval competition. We publicly share our code for our model, which can perform sentiment analysis in 100 languages, to facilitate further research.

MLJun 25, 2020

Spatio-temporal Sequence Prediction with Point Processes and Self-organizing Decision Trees

Oguzhan Karaahmetoglu, Suleyman S. Kozat

We study the spatio-temporal prediction problem and introduce a novel point-process-based prediction algorithm. Spatio-temporal prediction is extensively studied in Machine Learning literature due to its critical real-life applications such as crime, earthquake, and social event prediction. Despite these thorough studies, specific problems inherent to the application domain are not yet fully explored. Here, we address the non-stationary spatio-temporal prediction problem on both densely and sparsely distributed sequences. We introduce a probabilistic approach that partitions the spatial domain into subregions and models the event arrivals in each region with interacting point-processes. Our algorithm can jointly learn the spatial partitioning and the interaction between these regions through a gradient-based optimization procedure. Finally, we demonstrate the performance of our algorithm on both simulated data and two real-life datasets. We compare our approach with baseline and state-of-the-art deep learning-based approaches, where we achieve significant performance improvements. Moreover, we also show the effect of using different parameters on the overall performance through empirical results and explain the procedure for choosing the parameters.

LGMay 22, 2020

A Tree Architecture of LSTM Networks for Sequential Regression with Missing Data

S. Onur Sahin, Suleyman S. Kozat

We investigate regression for variable length sequential data containing missing samples and introduce a novel tree architecture based on the Long Short-Term Memory (LSTM) networks. In our architecture, we employ a variable number of LSTM networks, which use only the existing inputs in the sequence, in a tree-like architecture without any statistical assumptions or imputations on the missing data, unlike all the previous approaches. In particular, we incorporate the missingness information by selecting a subset of these LSTM networks based on the "presence-pattern" of a certain number of previous inputs. From the mixture of experts perspective, we train different LSTM networks as our experts for various missingness patterns and then combine their outputs to generate the final prediction. We also provide the computational complexity analysis of the proposed architecture, which is of the same order as the complexity of the conventional LSTM architectures in the sequence length. Our method can be readily extended to similar structures such as GRUs and RNNs, as remarked in the paper. In the experiments, we achieve significant performance improvements with respect to the state-of-the-art methods on well-known financial and real life datasets.

LGMay 16, 2020

Achieving Online Regression Performance of LSTMs with Simple RNNs

N. Mert Vural, Fatih Ilhan, Selim F. Yilmaz et al.

Recurrent Neural Networks (RNNs) are widely used for online regression due to their ability to generalize nonlinear temporal dependencies. As an RNN model, Long-Short-Term-Memory Networks (LSTMs) are commonly preferred in practice, as these networks are capable of learning long-term dependencies while avoiding the vanishing gradient problem. However, due to their large number of parameters, training LSTMs requires considerably longer training time compared to simple RNNs (SRNNs). In this paper, we achieve the online regression performance of LSTMs with SRNNs efficiently. To this end, we introduce a first-order training algorithm with a linear time complexity in the number of parameters. We show that when SRNNs are trained with our algorithm, they provide very similar regression performance with the LSTMs in two to three times shorter training time. We provide strong theoretical analysis to support our experimental results by providing regret bounds on the convergence rate of our algorithm. Through an extensive set of experiments, we verify our theoretical work and demonstrate significant performance improvements of our algorithm with respect to LSTMs and the other state-of-the-art learning models.

LGMay 12, 2020

Unsupervised Anomaly Detection via Deep Metric Learning with End-to-End Optimization

Selim F. Yilmaz, Suleyman S. Kozat

We investigate unsupervised anomaly detection for high-dimensional data and introduce a deep metric learning (DML) based framework. In particular, we learn a distance metric through a deep neural network. Through this metric, we project the data into the metric space that better separates the anomalies from the normal data and reduces the effect of the curse of dimensionality for high-dimensional data. We present a novel data distillation method through self-supervision to remedy the conventional practice of assuming all data as normal. We also employ the hard mining technique from the DML literature. We show these components improve the performance of our model and significantly reduce the running time. Through an extensive set of experiments on the 14 real-world datasets, our method demonstrates significant performance gains compared to the state-of-the-art unsupervised anomaly detection methods, e.g., an absolute improvement between 4.44% and 11.74% on the average over the 14 datasets. Furthermore, we share the source code of our method on Github to facilitate further research.

LGMar 7, 2020

RNN-based Online Learning: An Efficient First-Order Optimization Algorithm with a Convergence Guarantee

N. Mert Vural, Selim F. Yilmaz, Fatih Ilhan et al.

We investigate online nonlinear regression with continually running recurrent neural networks (RNNs), i.e., RNN-based online learning. For RNN-based online learning, we introduce an efficient first-order training algorithm that is theoretically guaranteed to converge to the optimum network parameters. Our algorithm is truly online such that it does not make any assumption on the learning environment to guarantee convergence. Through numerical simulations, we verify our theoretical results and illustrate significant performance improvements achieved by our algorithm with respect to the state-of-the-art RNN training methods.

LGNov 25, 2019

Stability of the Decoupled Extended Kalman Filter Learning Algorithm in LSTM-Based Online Learning

Nuri Mert Vural, Fatih Ilhan, Suleyman S. Kozat

We investigate the convergence and stability properties of the decoupled extended Kalman filter learning algorithm (DEKF) within the long-short term memory network (LSTM) based online learning framework. For this purpose, we model DEKF as a perturbed extended Kalman filter and derive sufficient conditions for its stability during LSTM training. We show that if the perturbations -- introduced due to decoupling -- stay bounded, DEKF learns LSTM parameters with similar convergence and stability properties of the global extended Kalman filter learning algorithm. We verify our results with several numerical simulations and compare DEKF with other LSTM training methods. In our simulations, we also observe that the well-known hyper-parameter selection approaches used for DEKF in the literature satisfy our conditions.

LGNov 25, 2019

Minimax Optimal Algorithms for Adversarial Bandit Problem with Multiple Plays

N. Mert Vural, Hakan Gokcesu, Kaan Gokcesu et al.

We investigate the adversarial bandit problem with multiple plays under semi-bandit feedback. We introduce a highly efficient algorithm that asymptotically achieves the performance of the best switching $m$-arm strategy with minimax optimal regret bounds. To construct our algorithm, we introduce a new expert advice algorithm for the multiple-play setting. By using our expert advice algorithm, we additionally improve the best-known high-probability bound for the multi-play setting by $O(\sqrt{m})$. Our results are guaranteed to hold in an individual sequence manner since we have no statistical assumption on the bandit arm gains. Through an extensive set of experiments involving synthetic and real data, we demonstrate significant performance gains achieved by the proposed algorithm with respect to the state-of-the-art algorithms.

LGOct 22, 2019

An Efficient and Effective Second-Order Training Algorithm for LSTM-based Adaptive Learning

N. Mert Vural, Salih Ergüt, Suleyman S. Kozat

We study adaptive (or online) nonlinear regression with Long-Short-Term-Memory (LSTM) based networks, i.e., LSTM-based adaptive learning. In this context, we introduce an efficient Extended Kalman filter (EKF) based second-order training algorithm. Our algorithm is truly online, i.e., it does not assume any underlying data generating process and future information, except that the target sequence is bounded. Through an extensive set of experiments, we demonstrate significant performance gains achieved by our algorithm with respect to the state-of-the-art methods. Here, we mainly show that our algorithm consistently provides 10 to 45% improvement in the accuracy compared to the widely-used adaptive methods Adam, RMSprop, and DEKF, and comparable performance to EKF with a 10 to 15 times reduction in the run-time.

OCJun 30, 2019

Universal Online Convex Optimization with Minimax Optimal Second-Order Dynamic Regret

Hakan Gokcesu, Suleyman S. Kozat

We introduce an online convex optimization algorithm which utilizes projected subgradient descent with optimal adaptive learning rates. Our method provides second-order minimax-optimal dynamic regret guarantee (i.e. dependent on the sum of squared subgradient norms) for a sequence of general convex functions, which may not have strong convexity, smoothness, exp-concavity or even Lipschitz-continuity. The regret guarantee is against any comparator decision sequence with bounded path variation (i.e. sum of the distances between successive decisions). We generate the lower bound of the worst-case second-order dynamic regret by incorporating actual subgradient norms. We show that this lower bound matches with our regret guarantee within a constant factor, which makes our algorithm minimax optimal. We also derive the extension for learning in each decision coordinate individually. We demonstrate how to best preserve our regret guarantee in a truly online manner, when the bound on path variation of the comparator sequence grows in time or the feedback regarding such bound arrives partially as time goes on. We further build on our algorithm to eliminate the need of any knowledge on the comparator path variation, and provide minimax optimal second-order regret guarantees with no a priori information. Our approach can compete against all comparator sequences simultaneously (universally) in a minimax optimal manner, i.e. each regret guarantee depends on the respective comparator path variation. We discuss modifications to our approach which address complexity reductions for time, computation and memory. We further improve our results by making the regret guarantees also dependent on comparator sets' diameters in addition to the respective path variations.
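
As a rough illustration of projected subgradient descent with an adaptive learning rate, the sketch below uses a step size that shrinks with the accumulated squared subgradient norms. This is a common textbook choice and only an assumption here; it is not the optimal rate derived in the paper. The feasible set is taken to be a Euclidean ball, and `project_to_ball`, `adaptive_projected_subgradient`, and their parameters are hypothetical names.

```python
import numpy as np

def project_to_ball(x, radius):
    """Euclidean projection onto the ball of the given radius."""
    norm = np.linalg.norm(x)
    return x if norm <= radius else x * (radius / norm)

def adaptive_projected_subgradient(subgrad, x0, radius, num_steps):
    """Projected subgradient descent with a norm-adaptive step size.

    Assumed step size: eta_t = D / sqrt(sum_{s<=t} ||g_s||^2), with D the
    diameter of the feasible ball. No Lipschitz constant is needed.
    """
    x = project_to_ball(np.asarray(x0, dtype=float), radius)
    diameter = 2.0 * radius
    sq_norm_sum = 0.0
    for _ in range(num_steps):
        g = subgrad(x)
        sq_norm_sum += float(g @ g)
        if sq_norm_sum == 0.0:
            continue  # all subgradients so far are zero; nothing to do
        eta = diameter / np.sqrt(sq_norm_sum)
        x = project_to_ball(x - eta * g, radius)
    return x
```

For example, passing `lambda x: np.sign(x - c)` as `subgrad` minimizes the non-smooth loss `||x - c||_1` over the ball.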

LGApr 19, 2019

Minimax Optimal Online Stochastic Learning for Sequences of Convex Functions under Sub-Gradient Observation Failures

Hakan Gokcesu, Suleyman S. Kozat

We study online convex optimization under stochastic sub-gradient observation faults, where we introduce adaptive algorithms with minimax optimal regret guarantees. We specifically study scenarios where our sub-gradient observations can be noisy or even completely missing in a stochastic manner. To this end, we propose algorithms based on sub-gradient descent method, which achieve tight minimax optimal regret bounds. When necessary, these algorithms utilize properties of the underlying stochastic settings to optimize their learning rates (step sizes). These optimizations are the main factor in providing the minimax optimal performance guarantees, especially when observations are stochastically missing. However, in real world scenarios, these properties of the underlying stochastic settings may not be revealed to the optimizer. For such a scenario, we propose a blind algorithm that estimates these properties empirically in a generally applicable manner. Through extensive experiments, we show that this empirical approach is a natural combination of regular stochastic gradient descent and the minimax optimal algorithms (which work best for randomized and adversarial function sequences, respectively).

LGJan 18, 2017

Highly Efficient Hierarchical Online Nonlinear Regression Using Second Order Methods

Burak C. Civek, Ibrahim Delibalta, Suleyman S. Kozat

We introduce highly efficient online nonlinear regression algorithms that are suitable for real life applications. We process the data in a truly online manner such that no storage is needed, i.e., the data is discarded after being used. For nonlinear modeling we use a hierarchical piecewise linear approach based on the notion of decision trees where the space of the regressor vectors is adaptively partitioned based on the performance. For the first time in the literature, we learn both the piecewise linear partitioning of the regressor space and the linear models in each region using highly effective second order methods, i.e., Newton-Raphson Methods. Hence, we avoid the well-known overfitting issues by using piecewise linear models. Moreover, since both the region boundaries and the linear models in each region are trained using second order methods, we achieve substantial performance gains compared to the state of the art. We demonstrate our gains over the well-known benchmark data sets and provide performance results in an individual sequence manner guaranteed to hold without any statistical assumptions. Hence, the introduced algorithms address computational complexity issues widely encountered in real life applications while providing superior guaranteed performance in a strong deterministic sense.

LGDec 5, 2016

An Asymptotically Optimal Contextual Bandit Algorithm Using Hierarchical Structures

Mohammadreza Mohaghegh Neyshabouri, Kaan Gokcesu, Huseyin Ozkan et al.

We propose online algorithms for sequential learning in the contextual multi-armed bandit setting. Our approach is to partition the context space and then optimally combine all of the possible mappings between the partition regions and the set of bandit arms in a data driven manner. We show that in our approach, the best mapping is able to approximate the best arm selection policy to any desired degree under mild Lipschitz conditions. Therefore, we design our algorithms based on the optimal adaptive combination and asymptotically achieve the performance of the best mapping as well as the best arm selection policy. This optimality is also guaranteed to hold even in adversarial environments since we do not rely on any statistical assumptions regarding the contexts or the loss of the bandit arms. Moreover, we design efficient implementations for our algorithms in various hierarchical partitioning structures such as lexicographical or arbitrary position splitting and binary trees (and several other partitioning examples). For instance, in the case of binary tree partitioning, the computational complexity is only log-linear in the number of regions in the finest partition. In conclusion, we provide significant performance improvements by introducing upper bounds (w.r.t. the best arm selection policy) that are mathematically proven to vanish in the average loss per round sense at a faster rate compared to the state-of-the-art. Our experimental work extensively covers various scenarios ranging from bandit settings to multi-class classification with real and synthetic data. In these experiments, we show that our algorithms are highly superior over the state-of-the-art techniques while maintaining the introduced mathematical guarantees and a computationally decent scalability.

SYOct 3, 2016

Team-Optimal Distributed MMSE Estimation in General and Tree Networks

Muhammed O. Sayin, Suleyman S. Kozat, Tamer Başar

We construct team-optimal estimation algorithms over distributed networks for state estimation in the finite-horizon mean-square error (MSE) sense. Here, we have a distributed collection of agents with processing and cooperation capabilities. These agents observe noisy samples of a desired state through a linear model and seek to learn this state by interacting with each other. Although this problem has attracted significant attention and been studied extensively in fields including machine learning and signal processing, all the well-known strategies do not achieve team-optimal learning performance in the finite-horizon MSE sense. To this end, we formulate the finite-horizon distributed minimum MSE (MMSE) when there is no restriction on the size of the disclosed information, i.e., oracle performance, over an arbitrary network topology. Subsequently, we show that exchange of local estimates is sufficient to achieve the oracle performance only over certain network topologies. By inspecting these network structures, we propose recursive algorithms achieving the oracle performance through the disclosure of local estimates. For practical implementations we also provide approaches to reduce the complexity of the algorithms through the time-windowing of the observations. Finally, in the numerical examples, we demonstrate the superior performance of the introduced algorithms in the finite-horizon MSE sense due to optimal estimation.

NAAug 31, 2015

Stochastic Subgradient Algorithms for Strongly Convex Optimization over Distributed Networks

N. Denizcan Vanli, Muhammed O. Sayin, Suleyman S. Kozat

We study diffusion and consensus based optimization of a sum of unknown convex objective functions over distributed networks. The only access to these functions is through stochastic gradient oracles, each of which is only available at a different node, and a limited number of gradient oracle calls is allowed at each node. In this framework, we introduce a convex optimization algorithm based on the stochastic gradient descent (SGD) updates. Particularly, we use a carefully designed time-dependent weighted averaging of the SGD iterates, which yields a convergence rate of $O\left(\frac{N\sqrt{N}}{T}\right)$ after $T$ gradient updates for each node on a network of $N$ nodes. We then show that after $T$ gradient oracle calls, the average SGD iterate achieves a mean square deviation (MSD) of $O\left(\frac{\sqrt{N}}{T}\right)$. This rate of convergence is optimal as it matches the performance lower bound up to constant terms. Similar to the SGD algorithm, the computational complexity of the proposed algorithm also scales linearly with the dimensionality of the data. Furthermore, the communication load of the proposed method is the same as the communication load of the SGD algorithm. Thus, the proposed algorithm is highly efficient in terms of complexity and communication load. We illustrate the merits of the algorithm with respect to the state-of-the-art methods over benchmark real life data sets and widely studied network topologies.

LGSep 30, 2014

Data Imputation through the Identification of Local Anomalies

Huseyin Ozkan, Ozgun S. Pelvan, Suleyman S. Kozat

We introduce a comprehensive and statistical framework in a model free setting for a complete treatment of localized data corruptions due to severe noise sources, e.g., an occluder in the case of a visual recording. Within this framework, we propose i) a novel algorithm to efficiently separate, i.e., detect and localize, possible corruptions from a given suspicious data instance and ii) a Maximum A Posteriori (MAP) estimator to impute the corrupted data. As a generalization to Euclidean distance, we also propose a novel distance measure, which is based on the ranked deviations among the data attributes and empirically shown to be superior in separating the corruptions. Our algorithm first splits the suspicious instance into parts through a binary partitioning tree in the space of data attributes and iteratively tests those parts to detect local anomalies using the nominal statistics extracted from an uncorrupted (clean) reference data set. Once each part is labeled as anomalous vs normal, the corresponding binary patterns over this tree that characterize corruptions are identified and the affected attributes are imputed. Under a certain conditional independency structure assumed for the binary patterns, we analytically show that the false alarm rate of the introduced algorithm in detecting the corruptions is independent of the data and can be directly set without any parameter tuning. The proposed framework is tested over several well-known machine learning data sets with synthetically generated corruptions; and experimentally shown to produce remarkable improvements in terms of classification purposes with strong corruption separation capabilities. Our experiments also indicate that the proposed algorithms outperform the typical approaches and are robust to varying training phase conditions.

LGJan 23, 2014

Predicting Nearly As Well As the Optimal Twice Differentiable Regressor

N. Denizcan Vanli, Muhammed O. Sayin, Suleyman S. Kozat

We study nonlinear regression of real valued data in an individual sequence manner, where we provide results that are guaranteed to hold without any statistical assumptions. We address the convergence and undertraining issues of conventional nonlinear regression methods and introduce an algorithm that elegantly mitigates these issues via an incremental hierarchical structure (i.e., via an incremental decision tree). Particularly, we present a piecewise linear (or nonlinear) regression algorithm that partitions the regressor space in a data driven manner and learns a linear model at each region. Unlike the conventional approaches, our algorithm gradually increases the number of disjoint partitions on the regressor space in a sequential manner according to the observed data. Through this data driven approach, our algorithm sequentially and asymptotically achieves the performance of the optimal twice differentiable regression function for any data sequence with an unknown and arbitrary length. The computational complexity of the introduced algorithm is only logarithmic in the data length under certain regularity conditions. We provide the explicit description of the algorithm and demonstrate the significant gains for the well-known benchmark real data sets and chaotic signals.

LGNov 26, 2013

A Novel Family of Adaptive Filtering Algorithms Based on The Logarithmic Cost

Muhammed O. Sayin, N. Denizcan Vanli, Suleyman S. Kozat

We introduce a novel family of adaptive filtering algorithms based on a relative logarithmic cost. The new family intrinsically combines the higher and lower order measures of the error into a single continuous update based on the error amount. We introduce important members of this family of algorithms such as the least mean logarithmic square (LMLS) and least logarithmic absolute difference (LLAD) algorithms that improve the convergence performance of the conventional algorithms. However, our approach and analysis are generic such that they cover other well-known cost functions as described in the paper. The LMLS algorithm achieves comparable convergence performance with the least mean fourth (LMF) algorithm and extends the stability bound on the step size. The LLAD and least mean square (LMS) algorithms demonstrate similar convergence performance in impulse-free noise environments while the LLAD algorithm is robust against impulsive interferences and outperforms the sign algorithm (SA). We analyze the transient, steady state and tracking performance of the introduced algorithms and demonstrate the match of the theoretical analyses and simulation results. We show the extended stability bound of the LMLS algorithm and analyze the robustness of the LLAD algorithm against impulsive interferences. Finally, we demonstrate the performance of our algorithms in different scenarios through numerical examples.
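
To make the "single continuous update" concrete, here is a minimal sketch of LMLS-style and LLAD-style weight updates. The abstract does not state the cost, so the sketch assumes the form J(e) = F(e) - (1/alpha) ln(1 + alpha F(e)), with F(e) = e^2 for LMLS and F(e) = |e| for LLAD; the gradient of that assumed cost gives the updates below (constant factors absorbed into `mu`). The function names and the parameter `alpha` are hypothetical.

```python
import numpy as np

def lmls_step(w, x, d, mu, alpha=1.0):
    """LMLS-style update: acts like LMF (e^3) for small errors and like
    LMS (e) for large errors, under the assumed logarithmic cost."""
    e = d - w @ x
    return w + mu * x * (alpha * e ** 3) / (1.0 + alpha * e ** 2), e

def llad_step(w, x, d, mu, alpha=1.0):
    """LLAD-style update: acts like LMS (e) for small errors and like the
    sign algorithm (sign(e)) for large errors, which bounds the step."""
    e = d - w @ x
    return w + mu * x * (alpha * e) / (1.0 + alpha * abs(e)), e
```

Under these assumptions, an impulsive sample with a very large error moves the LLAD weights by at most `mu * ||x||`, which is one way to read the robustness claim in the abstract.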

LGNov 25, 2013

A Unified Approach to Universal Prediction: Generalized Upper and Lower Bounds

N. Denizcan Vanli, Suleyman S. Kozat

We study sequential prediction of real-valued, arbitrary and unknown sequences under the squared error loss as well as the best parametric predictor out of a large, continuous class of predictors. Inspired by recent results from computational learning theory, we refrain from any statistical assumptions and define the performance with respect to the class of general parametric predictors. In particular, we present generic lower and upper bounds on this relative performance by transforming the prediction task into a parameter learning problem. We first introduce the lower bounds on this relative performance in the mixture of experts framework, where we show that for any sequential algorithm, there always exists a sequence for which the performance of the sequential algorithm is lower bounded by zero. We then introduce a sequential learning algorithm to predict such arbitrary and unknown sequences, and calculate upper bounds on its total squared prediction error for every bounded sequence. We further show that in some scenarios we achieve matching lower and upper bounds demonstrating that our algorithms are optimal in a strong minimax sense such that their performances cannot be improved further. As an interesting result we also prove that for the worst case scenario, the performance of randomized algorithms can be achieved by sequential algorithms so that randomized algorithms do not improve the performance.

LGNov 25, 2013

A Comprehensive Approach to Universal Piecewise Nonlinear Regression Based on Trees

N. Denizcan Vanli, Suleyman S. Kozat

In this paper, we investigate adaptive nonlinear regression and introduce tree based piecewise linear regression algorithms that are highly efficient and provide significantly improved performance with guaranteed upper bounds in an individual sequence manner. We use a tree notion in order to partition the space of regressors in a nested structure. The introduced algorithms adapt not only their regression functions but also the complete tree structure while achieving the performance of the "best" linear mixture of a doubly exponential number of partitions, with a computational complexity only polynomial in the number of nodes of the tree. While constructing these algorithms, we also avoid using any artificial "weighting" of models (with highly data dependent parameters) and, instead, directly minimize the final regression error, which is the ultimate performance goal. The introduced methods are generic such that they can readily incorporate different tree construction methods such as random trees in their framework and can use different regressor or partitioning functions as demonstrated in the paper.

LGSep 28, 2012

A Deterministic Analysis of an Online Convex Mixture of Expert Algorithms

Mehmet A. Donmez, Sait Tunc, Suleyman S. Kozat

We analyze an online learning algorithm that adaptively combines outputs of two constituent algorithms (or the experts) running in parallel to model an unknown desired signal. This online learning algorithm is shown to achieve (and in some cases outperform) the mean-square error (MSE) performance of the best constituent algorithm in the mixture in the steady-state. However, the MSE analysis of this algorithm in the literature uses approximations and relies on statistical models on the underlying signals and systems. Hence, such an analysis may not be useful or valid for signals generated by various real life systems that show high degrees of nonstationarity, limit cycles and, in many cases, that are even chaotic. In this paper, we produce results in an individual sequence manner. In particular, we relate the time-accumulated squared estimation error of this online algorithm at any time over any interval to the time accumulated squared estimation error of the optimal convex mixture of the constituent algorithms directly tuned to the underlying signal in a deterministic sense without any statistical assumptions. In this sense, our analysis provides the transient, steady-state and tracking behavior of this algorithm in a strong sense without any approximations in the derivations or statistical assumptions on the underlying signals such that our results are guaranteed to hold. We illustrate the introduced results through examples.
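
A minimal sketch of an adaptive convex mixture of two experts is given below. The abstract does not specify the parameterization, so the sketch assumes the common choice of a sigmoid-mapped mixing weight trained by stochastic gradient descent on the squared error; `mixture_step`, `rho`, and `mu` are hypothetical names.

```python
import math

def mixture_step(rho, y1, y2, d, mu):
    """One update of a convex combination of two expert outputs.

    Assumption: the mixing weight is lam = sigmoid(rho), so it always stays
    in (0, 1), and rho follows a gradient step on the squared error.
    """
    lam = 1.0 / (1.0 + math.exp(-rho))
    y = lam * y1 + (1.0 - lam) * y2       # combined prediction
    e = d - y                             # estimation error
    # d(e^2)/d(rho) = -2 e (y1 - y2) lam (1 - lam); the 2 is folded into mu.
    rho_new = rho + mu * e * (y1 - y2) * lam * (1.0 - lam)
    return rho_new, y, e
```

When one expert is consistently better, `rho` drifts so that `lam` approaches the corresponding end of the interval, which matches the steady-state behavior described in the abstract.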

LGMar 20, 2012

Adaptive Mixture Methods Based on Bregman Divergences

Mehmet A. Donmez, Huseyin A. Inan, Suleyman S. Kozat

We investigate adaptive mixture methods that linearly combine outputs of $m$ constituent filters running in parallel to model a desired signal. We use "Bregman divergences" and obtain certain multiplicative updates to train the linear combination weights under an affine constraint or without any constraints. We use unnormalized relative entropy and relative entropy to define two different Bregman divergences that produce an unnormalized exponentiated gradient update and a normalized exponentiated gradient update on the mixture weights, respectively. We then carry out the mean and the mean-square transient analysis of these adaptive algorithms when they are used to combine outputs of $m$ constituent filters. We illustrate the accuracy of our results and demonstrate the effectiveness of these updates for sparse mixture systems.
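
The two multiplicative updates mentioned here can be sketched as follows. These are the standard textbook forms of the unnormalized and normalized exponentiated gradient updates for the squared error, restricted to positive weights; the paper's exact variants (for instance, how an affine constraint with possibly negative weights is handled) are not given in the abstract, so treat this as an assumption. `egu_step` and `eg_step` are hypothetical names, and `x` holds the outputs of the $m$ constituent filters.

```python
import numpy as np

def egu_step(w, x, d, mu):
    """Unnormalized exponentiated gradient: multiplicative update that
    keeps the weights positive but does not constrain their sum."""
    e = d - w @ x
    return w * np.exp(mu * e * x), e

def eg_step(w, x, d, mu):
    """Normalized exponentiated gradient: the same multiplicative update
    followed by renormalization so that the weights sum to one."""
    e = d - w @ x
    v = w * np.exp(mu * e * x)
    return v / v.sum(), e
```

Because the update is multiplicative, weights of irrelevant filters decay geometrically, which is one intuition for why such updates suit sparse mixtures.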

LGMar 20, 2012

A Novel Training Algorithm for HMMs with Partial and Noisy Access to the States

Huseyin Ozkan, Arda Akman, Suleyman S. Kozat

This paper proposes a new estimation algorithm for the parameters of an HMM so as to best account for the observed data. In this model, in addition to the observation sequence, we have partial and noisy access to the hidden state sequence as side information. This access can be seen as "partial labeling" of the hidden states. Furthermore, we model possible mislabeling in the side information in a joint framework and derive the corresponding EM updates accordingly. In our simulations, we observe that using this side information, we considerably improve the state recognition performance, up to 70%, with respect to the "achievable margin" defined by the baseline algorithms. Moreover, our algorithm is shown to be robust to the training conditions.