16.5SIMar 13
Integration of Deep Reinforcement Learning and Agent-based Simulation to Explore Strategies Counteracting Information DisorderLuigi Lomasto, Andrea Camoia, Alfonso Guarino et al.
In recent years, the spread of fake news has triggered a growing interest in Information Disorders (ID) on social media, a phenomenon that has become a focal point of research across fields ranging from complexity theory and computer science to cognitive sciences. Overall, such a body of research can be traced back to two main approaches. On the one hand, there are works focused on exploiting data mining to analyze the content of news and related metadata data-driven approach; on the other hand, works are aiming at making sense of the phenomenon at hand and their evolution using explicit simulation models model-driven approach). In this paper, we integrate these approaches to explore strategies for counteracting IDs. Heading in this direction, we put together: i. an Agent-Based model to simulate in a scientifically sound way both complex fake news dynamics and the effects produced by containment strategies therein; ii. Deep Reinforcement Learning to learn the strategies that can better mitigate the spread of misinformation. The outcomes of our work unfold on different levels. From a substantive point of view, the results of preliminary experiments started providing interesting cues about the conditions under which given policies can mitigate the spread of misinformation. From a technical and methodological point of view, we scratched the surface of promising and worthy research topics like the integration of social simulation and artificial intelligence and the enhancement of social science simulation environments.
12.8DSMar 18
The Inverse Lyndon Array: Definition, Properties, and Linear-Time ConstructionPietro Negri, Manuel Sica, Rocco Zaccagnino et al.
The Lyndon array stores, at each position of a word, the length of the longest maximal Lyndon subword starting at that position, and plays an important role in combinatorics on words, for example in the construction of fundamental data structures such as the suffix array. In this paper, we introduce the Inverse Lyndon Array, the analogous structure for inverse Lyndon words, namely words that are lexicographically greater than all their proper suffixes. Unlike standard Lyndon words, inverse Lyndon words may have non-trivial borders, which introduces a genuine theoretical difficulty. We show that the inverse Lyndon array can be characterized in terms of the next greater suffix array together with a border-correction term, and prove that this correction coincides with a longest common extension (LCE) value. Building on this characterization, we adapt the nearest-suffix framework underlying Ellert's linear-time construction of the Lyndon array to the inverse setting, obtaining an O(n)-time algorithm for general ordered alphabets. Finally, we discuss implications for suffix comparison and report experiments on random, structured, and real datasets showing that the inverse construction exhibits the same practical linear-time behavior as the standard one.
GNFeb 28, 2022
Numeric Lyndon-based feature embedding of sequencing reads for machine learning approachesPaola Bonizzoni, Matteo Costantini, Clelia De Felice et al.
Feature embedding methods have been proposed in literature to represent sequences as numeric vectors to be used in some bioinformatics investigations, such as family classification and protein structure prediction. Recent theoretical results showed that the well-known Lyndon factorization preserves common factors in overlapping strings. Surprisingly, the fingerprint of a sequencing read, which is the sequence of lengths of consecutive factors in variants of the Lyndon factorization of the read, is effective in preserving sequence similarities, suggesting it as basis for the definition of novels representations of sequencing reads. We propose a novel feature embedding method for Next-Generation Sequencing (NGS) data using the notion of fingerprint. We provide a theoretical and experimental framework to estimate the behaviour of fingerprints and of the $k$-mers extracted from it, called $k$-fingers, as possible feature embeddings for sequencing reads. As a case study to assess the effectiveness of such embeddings, we use fingerprints to represent RNA-Seq reads and to assign them to the most likely gene from which they were originated as fragments of transcripts of the gene. We provide an implementation of the proposed method in the tool lyn2vec, which produces Lyndon-based feature embeddings of sequencing reads.