Łukasz Lepak

LG
h-index2
4papers
1citation
Novelty53%
AI Score32

4 Papers

LGMar 28, 2023
On-line reinforcement learning for optimization of real-life energy trading strategy

Łukasz Lepak, Paweł Wawrzyński

An increasing share of energy is produced from renewable sources by many small producers. The efficiency of those sources is volatile and, to some extent, random, exacerbating the problem of energy market balancing. In many countries, this balancing is done on the day-ahead (DA) energy markets. This paper considers automated trading on the DA energy market by a medium-sized prosumer. We model this activity as a Markov Decision Process and formalize a framework in which an applicable in real-life strategy can be optimized with off-line data. We design a trading strategy that is fed with the available environmental information that can impact future prices, including weather forecasts. We use state-of-the-art reinforcement learning (RL) algorithms to optimize this strategy. For comparison, we also synthesize simple parametric trading strategies and optimize them with an evolutionary algorithm. Results show that our RL-based strategy generates the highest market profits.

LGAug 19, 2025
MACTAS: Self-Attention-Based Module for Inter-Agent Communication in Multi-Agent Reinforcement Learning

Maciej Wojtala, Bogusz Stefańczyk, Dominik Bogucki et al.

Communication is essential for the collective execution of complex tasks by human agents, motivating interest in communication mechanisms for multi-agent reinforcement learning (MARL). However, existing communication protocols in MARL are often complex and non-differentiable. In this work, we introduce a self-attention-based communication module that exchanges information between the agents in MARL. Our proposed approach is fully differentiable, allowing agents to learn to generate messages in a reward-driven manner. The module can be seamlessly integrated with any action-value function decomposition method and can be viewed as an extension of such decompositions. Notably, it includes a fixed number of trainable parameters, independent of the number of agents. Experimental results on the SMAC and SMACv2 benchmarks demonstrate the effectiveness of our approach, which achieves state-of-the-art performance on a number of maps.

LGMay 28, 2021
Reinforcement Learning for on-line Sequence Transformation

Grzegorz Rypeść, Łukasz Lepak, Paweł Wawrzyński

A number of problems in the processing of sound and natural language, as well as in other areas, can be reduced to simultaneously reading an input sequence and writing an output sequence of generally different length. There are well developed methods that produce the output sequence based on the entirely known input. However, efficient methods that enable such transformations on-line do not exist. In this paper we introduce an architecture that learns with reinforcement to make decisions about whether to read a token or write another token. This architecture is able to transform potentially infinite sequences on-line. In an experimental study we compare it with state-of-the-art methods for neural machine translation. While it produces slightly worse translations than Transformer, it outperforms the autoencoder with attention, even though our architecture translates texts on-line thereby solving a more difficult problem than both reference methods.

NEMay 28, 2021
Least Redundant Gated Recurrent Neural Network

Łukasz Neumann, Łukasz Lepak, Paweł Wawrzyński

Recurrent neural networks are important tools for sequential data processing. However, they are notorious for problems regarding their training. Challenges include capturing complex relations between consecutive states and stability and efficiency of training. In this paper, we introduce a recurrent neural architecture called Deep Memory Update (DMU). It is based on updating the previous memory state with a deep transformation of the lagged state and the network input. The architecture is able to learn to transform its internal state using any nonlinear function. Its training is stable and fast due to relating its learning rate to the size of the module. Even though DMU is based on standard components, experimental results presented here confirm that it can compete with and often outperform state-of-the-art architectures such as Long Short-Term Memory, Gated Recurrent Units, and Recurrent Highway Networks.