CLNov 23, 2021
SimpleTRON: Simple Transformer with O(N) ComplexityUladzislau Yorsh, Alexander Kovalenko, Vojtěch Vančura et al.
In this paper, we propose that the dot product pairwise matching attention layer, which is widely used in Transformer-based models, is redundant for the model performance. Attention, in its original formulation, has to be seen rather as a human-level tool to explore and/or visualize relevancy scores in sequential data. However, the way how it is constructed leads to significant computational complexity. Instead, we present SimpleTRON: Simple Transformer with O(N) Complexity, a simple and fast alternative without any approximation that, unlike other approximation models, does not have any architecture-related overhead and therefore can be seen as a purely linear Transformer-like model. This architecture, to the best of our knowledge, outperforms existing sub-quadratic attention approximation models on several tasks from the Long-Range Arena benchmark. Moreover, we show, that SimpleTRON can benefit from weight transfer from pretrained large language models, as its parameters can be fully transferable.
AIAug 3, 2021
Classification of Discrete Dynamical Systems Based on TransientsBarbora Hudcová, Tomáš Mikolov
In order to develop systems capable of artificial evolution, we need to identify which systems can produce complex behavior. We present a novel classification method applicable to any class of deterministic discrete space and time dynamical systems. The method is based on classifying the asymptotic behavior of the average computation time in a given system before entering a loop. We were able to identify a critical region of behavior that corresponds to a phase transition from ordered behavior to chaos across various classes of dynamical systems. To show that our approach can be applied to many different computational systems, we demonstrate the results of classifying cellular automata, Turing machines, and random Boolean networks. Further, we use this method to classify 2D cellular automata to automatically find those with interesting, complex dynamics. We believe that our work can be used to design systems in which complex structures emerge. Also, it can be used to compare various versions of existing attempts to model open-ended evolution (Ray (1991), Ofria et al. (2004), Channon (2006)).
NEAug 1, 2021
Computational Hierarchy of Elementary Cellular AutomataBarbora Hudcová, Tomáš Mikolov
The complexity of cellular automata is traditionally measured by their computational capacity. However, it is difficult to choose a challenging set of computational tasks suitable for the parallel nature of such systems. We study the ability of automata to emulate one another, and we use this notion to define such a set of naturally emerging tasks. We present the results for elementary cellular automata, although the core ideas can be extended to other computational systems. We compute a graph showing which elementary cellular automata can be emulated by which and show that certain chaotic automata are the only ones that cannot emulate any automata non-trivially. Finally, we use the emulation notion to suggest a novel definition of chaos that we believe is suitable for discrete computational systems. We believe our work can help design parallel computational systems that are Turing-complete and also computationally efficient.