Dominik Wojtczak

h-index15

19papers

298citations

Novelty36%

AI Score27

Ranked #156,791 of 194,257 authors (top 81%)#34,362 in LG (top 86%)

19 Papers

5.3LGAug 14, 2023

Omega-Regular Reward Machines

Ernst Moritz Hahn, Mateo Perez, Sven Schewe et al.

Reinforcement learning (RL) is a powerful approach for training agents to perform tasks, but designing an appropriate reward mechanism is critical to its success. However, in many cases, the complexity of the learning objectives goes beyond the capabilities of the Markovian assumption, necessitating a more sophisticated reward mechanism. Reward machines and omega-regular languages are two formalisms used to express non-Markovian rewards for quantitative and qualitative objectives, respectively. This paper introduces omega-regular reward machines, which integrate reward machines with omega-regular languages to enable an expressive and effective reward mechanism for RL. We present a model-free RL algorithm to compute epsilon-optimal strategies against omega-egular reward machines and evaluate the effectiveness of the proposed algorithm through experiments.

5.8LGJul 28, 2022

Dive into Machine Learning Algorithms for Influenza Virus Host Prediction with Hemagglutinin Sequences

Yanhua Xu, Dominik Wojtczak

Influenza viruses mutate rapidly and can pose a threat to public health, especially to those in vulnerable groups. Throughout history, influenza A viruses have caused pandemics between different species. It is important to identify the origin of a virus in order to prevent the spread of an outbreak. Recently, there has been increasing interest in using machine learning algorithms to provide fast and accurate predictions for viral sequences. In this study, real testing data sets and a variety of evaluation metrics were used to evaluate machine learning algorithms at different taxonomic levels. As hemagglutinin is the major protein in the immune response, only hemagglutinin sequences were used and represented by position-specific scoring matrix and word embedding. The results suggest that the 5-grams-transformer neural network is the most effective algorithm for predicting viral sequence origins, with approximately 99.54% AUCPR, 98.01% F1 score and 96.60% MCC at a higher classification level, and approximately 94.74% AUCPR, 87.41% F1 score and 80.79% MCC at a lower classification level.

1.8LGJun 23, 2022

Recursive Reinforcement Learning

Ernst Moritz Hahn, Mateo Perez, Sven Schewe et al.

Recursion is the fundamental paradigm to finitely describe potentially infinite objects. As state-of-the-art reinforcement learning (RL) algorithms cannot directly reason about recursion, they must rely on the practitioner's ingenuity in designing a suitable "flat" representation of the environment. The resulting manual feature constructions and approximations are cumbersome and error-prone; their lack of transparency hampers scalability. To overcome these challenges, we develop RL algorithms capable of computing optimal policies in environments described as a collection of Markov decision processes (MDPs) that can recursively invoke one another. Each constituent MDP is characterized by several entry and exit points that correspond to input and output values of these invocations. These recursive MDPs (or RMDPs) are expressively equivalent to probabilistic pushdown systems (with call-stack playing the role of the pushdown stack), and can model probabilistic programs with recursive procedural calls. We introduce Recursive Q-learning -- a model-free RL algorithm for RMDPs -- and prove that it converges for finite, single-exit and deterministic multi-exit RMDPs under mild assumptions.

1.2FLMay 6, 2022

Alternating Good-for-MDP Automata

Ernst Moritz Hahn, Mateo Perez, Sven Schewe et al.

When omega-regular objectives were first proposed in model-free reinforcement learning (RL) for controlling MDPs, deterministic Rabin automata were used in an attempt to provide a direct translation from their transitions to scalar values. While these translations failed, it has turned out that it is possible to repair them by using good-for-MDPs (GFM) Büchi automata instead. These are nondeterministic Büchi automata with a restricted type of nondeterminism, albeit not as restricted as in good-for-games automata. Indeed, deterministic Rabin automata have a pretty straightforward translation to such GFM automata, which is bi-linear in the number of states and pairs. Interestingly, the same cannot be said for deterministic Streett automata: a translation to nondeterministic Rabin or Büchi automata comes at an exponential cost, even without requiring the target automaton to be good-for-MDPs. Do we have to pay more than that to obtain a good-for-MDP automaton? The surprising answer is that we have to pay significantly less when we instead expand the good-for-MDP property to alternating automata: like the nondeterministic GFM automata obtained from deterministic Rabin automata, the alternating good-for-MDP automata we produce from deterministic Streett automata are bi-linear in the the size of the deterministic automaton and its index, and can therefore be exponentially more succinct than minimal nondeterministic Büchi automata.

2.0LGJun 8, 2023

MC-NN: An End-to-End Multi-Channel Neural Network Approach for Predicting Influenza A Virus Hosts and Antigenic Types

Yanhua Xu, Dominik Wojtczak

Influenza poses a significant threat to public health, particularly among the elderly, young children, and people with underlying dis-eases. The manifestation of severe conditions, such as pneumonia, highlights the importance of preventing the spread of influenza. An accurate and cost-effective prediction of the host and antigenic sub-types of influenza A viruses is essential to addressing this issue, particularly in resource-constrained regions. In this study, we propose a multi-channel neural network model to predict the host and antigenic subtypes of influenza A viruses from hemagglutinin and neuraminidase protein sequences. Our model was trained on a comprehensive data set of complete protein sequences and evaluated on various test data sets of complete and incomplete sequences. The results demonstrate the potential and practicality of using multi-channel neural networks in predicting the host and antigenic subtypes of influenza A viruses from both full and partial protein sequences.

1.2QMJun 8, 2022

Multi-channel neural networks for predicting influenza A virus hosts and antigenic types

Yanhua Xu, Dominik Wojtczak

Influenza occurs every season and occasionally causes pandemics. Despite its low mortality rate, influenza is a major public health concern, as it can be complicated by severe diseases like pneumonia. A fast, accurate and low-cost method to predict the origin host and subtype of influenza viruses could help reduce virus transmission and benefit resource-poor areas. In this work, we propose multi-channel neural networks to predict antigenic types and hosts of influenza A viruses with hemagglutinin and neuraminidase protein sequences. An integrated data set containing complete protein sequences were used to produce a pre-trained model, and two other data sets were used for testing the model's performance. One test set contained complete protein sequences, and another test set contained incomplete protein sequences. The results suggest that multi-channel neural networks are applicable and promising for predicting influenza A virus hosts and antigenic subtypes with complete and partial protein sequences.

1.2FLFeb 18, 2013

Optimal Scheduling for Linear-Rate Multi-Mode Systems

Dominik Wojtczak

Linear-Rate Multi-Mode Systems is a model that can be seen both as a subclass of switched linear systems with imposed global safety constraints and as hybrid automata with no guards on transitions. We study the existence and design of a controller for this model that keeps the state of the system within a given safe set for the whole time. A sufficient and necessary condition is given for such a controller to exist as well as an algorithm that finds one in polynomial time. We further generalise the model by adding costs on modes and present an algorithm that constructs a safe controller which minimises the peak cost, the average-cost or any cost expressed as a weighted sum of these two. Finally, we present numerical simulation results based on our implementation of these algorithms.

9.7CRSep 25, 2024

A Survey for Deep Reinforcement Learning Based Network Intrusion Detection

Wanrong Yang, Alberto Acuto, Yihang Zhou et al.

Cyber-attacks are becoming increasingly sophisticated and frequent, highlighting the importance of network intrusion detection systems. This paper explores the potential and challenges of using deep reinforcement learning (DRL) in network intrusion detection. It begins by introducing key DRL concepts and frameworks, such as deep Q-networks and actor-critic algorithms, and reviews recent research utilizing DRL for intrusion detection. The study evaluates challenges related to model training efficiency, detection of minority and unknown class attacks, feature selection, and handling unbalanced datasets. The performance of DRL models is comprehensively analyzed, showing that while DRL holds promise, many recent technologies remain underexplored. Some DRL models achieve state-of-the-art results on public datasets, occasionally outperforming traditional deep learning methods. The paper concludes with recommendations for enhancing DRL deployment and testing in real-world network scenarios, with a focus on Internet of Things intrusion detection. It discusses recent DRL architectures and suggests future policy functions for DRL-based intrusion detection. Finally, the paper proposes integrating DRL with generative methods to further improve performance, addressing current gaps and supporting more robust and adaptive network intrusion detection systems.

1.2LODec 14, 2023

Omega-Regular Decision Processes

Ernst Moritz Hahn, Mateo Perez, Sven Schewe et al.

Regular decision processes (RDPs) are a subclass of non-Markovian decision processes where the transition and reward functions are guarded by some regular property of the past (a lookback). While RDPs enable intuitive and succinct representation of non-Markovian decision processes, their expressive power coincides with finite-state Markov decision processes (MDPs). We introduce omega-regular decision processes (ODPs) where the non-Markovian aspect of the transition and reward functions are extended to an omega-regular lookahead over the system evolution. Semantically, these lookaheads can be considered as promises made by the decision maker or the learning agent about her future behavior. In particular, we assume that, if the promised lookaheads are not met, then the payoff to the decision maker is $\bot$ (least desirable payoff), overriding any rewards collected by the decision maker. We enable optimization and learning for ODPs under the discounted-reward objective by reducing them to lexicographic optimization and learning over finite MDPs. We present experimental results demonstrating the effectiveness of the proposed reduction.

0.3CLJan 4, 2022

Predicting Influenza A Viral Host Using PSSM and Word Embeddings

Yanhua Xu, Dominik Wojtczak

The rapid mutation of the influenza virus threatens public health. Reassortment among viruses with different hosts can lead to a fatal pandemic. However, it is difficult to detect the original host of the virus during or after an outbreak as influenza viruses can circulate between different species. Therefore, early and rapid detection of the viral host would help reduce the further spread of the virus. We use various machine learning models with features derived from the position-specific scoring matrix (PSSM) and features learned from word embedding and word encoding to infer the origin host of viruses. The results show that the performance of the PSSM-based model reaches the MCC around 95%, and the F1 around 96%. The MCC obtained using the model with word embedding is around 96%, and the F1 is around 97%.

9.2LGJun 16, 2021

Mungojerrie: Reinforcement Learning of Linear-Time Objectives

Ernst Moritz Hahn, Mateo Perez, Sven Schewe et al.

Reinforcement learning synthesizes controllers without prior knowledge of the system. At each timestep, a reward is given. The controllers optimize the discounted sum of these rewards. Applying this class of algorithms requires designing a reward scheme, which is typically done manually. The designer must ensure that their intent is accurately captured. This may not be trivial, and is prone to error. An alternative to this manual programming, akin to programming directly in assembly, is to specify the objective in a formal language and have it "compiled" to a reward scheme. Mungojerrie (https://plv.colorado.edu/mungojerrie/) is a tool for testing reward schemes for $ω$-regular objectives on finite models. The tool contains reinforcement learning algorithms and a probabilistic model checker. Mungojerrie supports models specified in PRISM and $ω$-automata specified in HOA.

3.1LGJun 12, 2021

Model-free Reinforcement Learning for Branching Markov Decision Processes

Ernst Moritz Hahn, Mateo Perez, Sven Schewe et al.

We study reinforcement learning for the optimal control of Branching Markov Decision Processes (BMDPs), a natural extension of (multitype) Branching Markov Chains (BMCs). The state of a (discrete-time) BMCs is a collection of entities of various types that, while spawning other entities, generate a payoff. In comparison with BMCs, where the evolution of a each entity of the same type follows the same probabilistic pattern, BMDPs allow an external controller to pick from a range of options. This permits us to study the best/worst behaviour of the system. We generalise model-free reinforcement learning techniques to compute an optimal control strategy of an unknown BMDP in the limit. We present results of an implementation that demonstrate the practicality of the approach.

2.3GTMar 23, 2021

Facility Reallocation on the Line

Bart de Keijzer, Dominik Wojtczak

We consider a multi-stage facility reallocation problems on the real line, where a facility is being moved between time stages based on the locations reported by $n$ agents. The aim of the reallocation algorithm is to minimise the social cost, i.e., the sum over the total distance between the facility and all agents at all stages, plus the cost incurred for moving the facility. We study this problem both in the offline setting and online setting. In the offline case the algorithm has full knowledge of the agent locations in all future stages, and in the online setting the algorithm does not know these future locations and must decide the location of the facility on a stage-per-stage basis. We derive the optimal algorithm in both cases. For the online setting we show that its competitive ratio is $(n+2)/(n+1)$. As neither of these algorithms turns out to yield a strategy-proof mechanism, we propose another strategy-proof mechanism which has a competitive ratio of $(n+3)/(n+1)$ for odd $n$ and $(n+4)/n$ for even $n$, which we conjecture to be the best possible. We also consider a generalisation with multiple facilities and weighted agents, for which we show that the optimum can be computed in polynomial time for a fixed number of facilities.

5.9LOJan 16, 2020

Reward Shaping for Reinforcement Learning with Omega-Regular Objectives

E. M. Hahn, M. Perez, S. Schewe et al.

Recently, successful approaches have been made to exploit good-for-MDPs automata (Büchi automata with a restricted form of nondeterminism) for model free reinforcement learning, a class of automata that subsumes good for games automata and the most widespread class of limit deterministic automata. The foundation of using these Büchi automata is that the Büchi condition can, for good-for-MDP automata, be translated to reachability. The drawback of this translation is that the rewards are, on average, reaped very late, which requires long episodes during the learning process. We devise a new reward shaping approach that overcomes this issue. We show that the resulting model is equivalent to a discounted payoff objective with a biased discount that simplifies and improves on prior work in this direction.

3.6AIJul 22, 2019

Open Problems in a Logic of Gossips

Krzysztof R. Apt, Dominik Wojtczak

Gossip protocols are programs used in a setting in which each agent holds a secret and the aim is to reach a situation in which all agents know all secrets. Such protocols rely on a point-to-point or group communication. Distributed epistemic gossip protocols use epistemic formulas in the component programs for the agents. The advantage of the use of epistemic logic is that the resulting protocols are very concise and amenable for a simple verification. Recently, we introduced a natural modal logic that allows one to express distributed epistemic gossip protocols and to reason about their correctness. We proved that the resulting protocols are implementable and that all aspects of their correctness, including termination, are decidable. To establish these results we showed that both the definition of semantics and of truth of the underlying logic are decidable. We also showed that the analogous results hold for an extension of this logic with the 'common knowledge' operator. However, several, often deceptively simple, questions about this logic and the corresponding gossip protocols remain open. The purpose of this paper is to list and elucidate these questions and provide for them an appropriate background information in the form of partial of related results.

26.4LOSep 26, 2018

Omega-Regular Objectives in Model-Free Reinforcement Learning

Ernst Moritz Hahn, Mateo Perez, Sven Schewe et al.

We provide the first solution for model-free reinforcement learning of ω-regular objectives for Markov decision processes (MDPs). We present a constructive reduction from the almost-sure satisfaction of ω-regular objectives to an almost- sure reachability problem and extend this technique to learning how to control an unknown model so that the chance of satisfying the objective is maximized. A key feature of our technique is the compilation of ω-regular properties into limit- deterministic Buechi automata instead of the traditional Rabin automata; this choice sidesteps difficulties that have marred previous proposals. Our approach allows us to apply model-free, off-the-shelf reinforcement learning algorithms to compute optimal strategies from the observations of the MDP. We present an experimental evaluation of our technique on benchmark learning problems.

5.1DMFeb 26, 2018

On Strong NP-Completeness of Rational Problems

Dominik Wojtczak

The computational complexity of the partition, 0-1 subset sum, unbounded subset sum, 0-1 knapsack and unbounded knapsack problems and their multiple variants were studied in numerous papers in the past where all the weights and profits were assumed to be integers. We re-examine here the computational complexity of all these problems in the setting where the weights and profits are allowed to be any rational numbers. We show that all of these problems in this setting become strongly NP-complete and, as a result, no pseudo-polynomial algorithm can exist for solving them unless P=NP. Despite this result we show that they all still admit a fully polynomial-time approximation scheme.

3.3LOJul 27, 2017

Common Knowledge in a Logic of Gossips

Krzysztof R. Apt, Dominik Wojtczak

Gossip protocols aim at arriving, by means of point-to-point or group communications, at a situation in which all the agents know each other secrets. Recently a number of authors studied distributed epistemic gossip protocols. These protocols use as guards formulas from a simple epistemic logic, which makes their analysis and verification substantially easier. We study here common knowledge in the context of such a logic. First, we analyze when it can be reduced to iterated knowledge. Then we show that the semantics and truth for formulas without nested common knowledge operator are decidable. This implies that implementability, partial correctness and termination of distributed epistemic gossip protocols that use non-nested common knowledge operator is decidable, as well. Given that common knowledge is equivalent to an infinite conjunction of nested knowledge, these results are non-trivial generalizations of the corresponding decidability results for the original epistemic logic, established in (Apt & Wojtczak, 2016). K. R. Apt & D. Wojtczak (2016): On Decidability of a Logic of Gossips. In Proc. of JELIA 2016, pp. 18-33, doi:10.1007/ 978-3-319-48758-8_2.

1.2LOJun 29, 2017

Optimal Control for Multi-Mode Systems with Discrete Costs

Mahmoud A. A. Mousa, Sven Schewe, Dominik Wojtczak

This paper studies optimal time-bounded control in multi-mode systems with discrete costs. Multi-mode systems are an important subclass of linear hybrid systems, in which there are no guards on transitions and all invariants are global. Each state has a continuous cost attached to it, which is linear in the sojourn time, while a discrete cost is attached to each transition taken. We show that an optimal control for this model can be computed in NEXPTIME and approximated in PSPACE. We also show that the one-dimensional case is simpler: although the problem is NP-complete (and in LOGSPACE for an infinite time horizon), we develop an FPTAS for finding an approximate solution.