LGApr 28, 2022
Who will stay? Using Deep Learning to predict engagement of citizen scientistsAlexander Semenov, Yixin Zhang, Marisa Ponti
Citizen science and machine learning should be considered for monitoring the coastal and ocean environment due to the scale of threats posed by climate change and the limited resources to fill knowledge gaps. Using data from the annotation activity of citizen scientists in a Swedish marine project, we constructed Deep Neural Network models to predict forthcoming engagement. We tested the models to identify patterns in annotation engagement. Based on the results, it is possible to predict whether an annotator will remain active in future sessions. Depending on the goals of individual citizen science projects, it may also be necessary to identify either those volunteers who will leave or those who will continue annotating. This can be predicted by varying the threshold for the prediction. The engagement metrics used to construct the models are based on time and activity and can be used to infer latent characteristics of volunteers and predict their task interest based on their activity patterns. They can estimate if volunteers can accomplish a given number of tasks in a certain amount of time, identify early on who is likely to become a top contributor or identify who is likely to quit and provide them with targeted interventions. The novelty of our predictive models lies in the use of Deep Neural Networks and the sequence of volunteer annotations. A limitation of our models is that they do not use embeddings constructed from user profiles as input data, as many recommender systems do. We expect that including user profiles would improve prediction performance.
AIOct 4, 2022
Estimating the hardness of SAT encodings for Logical Equivalence Checking of Boolean circuitsAlexander Semenov, Konstantin Chukharev, Egor Tarasov et al.
In this paper we investigate how to estimate the hardness of Boolean satisfiability (SAT) encodings for the Logical Equivalence Checking problem (LEC). Meaningful estimates of hardness are important in cases when a conventional SAT solver cannot solve a SAT instance in a reasonable time. We show that the hardness of SAT encodings for LEC instances can be estimated \textit{w.r.t.} some SAT partitioning. We also demonstrate the dependence of the accuracy of the resulting estimates on the probabilistic characteristics of a specially defined random variable associated with the considered partitioning. The paper proposes several methods for constructing partitionings, which, when used in practice, allow one to estimate the hardness of SAT encodings for LEC with good accuracy. In the experimental part we propose a class of scalable LEC tests that give extremely complex instances with a relatively small input size $n$ of the considered circuits. For example, for $n = 40$, none of the state-of-the-art SAT solvers can cope with the considered tests in a reasonable time. However, these tests can be solved in parallel using the proposed partitioning methods.
AIFeb 19
Efficient Parallel Algorithm for Decomposing Hard CircuitSAT InstancesVictor Kondratiev, Irina Gribanova, Alexander Semenov
We propose a novel parallel algorithm for decomposing hard CircuitSAT instances. The technique employs specialized constraints to partition an original SAT instance into a family of weakened formulas. Our approach is implemented as a parameterized parallel algorithm, where adjusting the parameters allows efficient identification of high-quality decompositions, guided by hardness estimations computed in parallel. We demonstrate the algorithm's practical efficacy on challenging CircuitSAT instances, including those encoding Logical Equivalence Checking of Boolean circuits and preimage attacks on cryptographic hash functions.
AIDec 16, 2023
Decomposing Hard SAT Instances with Metaheuristic OptimizationDaniil Chivilikhin, Artem Pavlenko, Alexander Semenov
In the article, within the framework of the Boolean Satisfiability problem (SAT), the problem of estimating the hardness of specific Boolean formulas w.r.t. a specific complete SAT solving algorithm is considered. Based on the well-known Strong Backdoor Set (SBS) concept, we introduce the notion of decomposition hardness (d-hardness). If $B$ is an arbitrary subset of the set of variables occurring in a SAT formula $C$, and $A$ is an arbitrary complete SAT solver , then the d-hardness expresses an estimate of the hardness of $C$ w.r.t. $A$ and $B$. We show that the d-hardness of $C$ w.r.t. a particular $B$ can be expressed in terms of the expected value of a special random variable associated with $A$, $B$, and $C$. For its computational evaluation, algorithms based on the Monte Carlo method can be used. The problem of finding $B$ with the minimum value of d-hardness is formulated as an optimization problem for a pseudo-Boolean function whose values are calculated as a result of a probabilistic experiment. To minimize this function, we use evolutionary algorithms. In the experimental part, we demonstrate the applicability of the concept of d-hardness and the methods of its estimation to solving hard unsatisfiable SAT instances.
DMDec 6, 2021
Multidimensional Assignment Problem for multipartite entity resolutionAlla Kammerdiner, Alexander Semenov, Eduardo Pasiliao
Multipartite entity resolution aims at integrating records from multiple datasets into one entity. We derive a mathematical formulation for a general class of record linkage problems in multipartite entity resolution across many datasets as a combinatorial optimization problem known as the multidimensional assignment problem. As a motivation for our approach, we illustrate the advantage of multipartite entity resolution over sequential bipartite matching. Because the optimization problem is NP-hard, we apply two heuristic procedures, a Greedy algorithm and very large scale neighborhood search, to solve the assignment problem and find the most likely matching of records from multiple datasets into a single entity. We evaluate and compare the performance of these algorithms and their modifications on synthetically generated data. We perform computational experiments to compare performance of recent heuristic, the very large-scale neighborhood search, with a Greedy algorithm, another heuristic for the MAP, as well as with two versions of genetic algorithm, a general metaheuristic. Importantly, we perform experiments to compare two alternative methods of re-starting the search for the former heuristic, specifically a random-sampling multi-start and a deterministic design-based multi-start. We find evidence that design-based multi-start can be more efficient as the size of databases grow large. In addition, we show that very large scale search, especially its multi-start version, outperforms simple Greedy heuristic. Hybridization of Greedy search with very large scale neighborhood search improves the performance. Using multi-start with as few as three additional runs of very large scale search offers some improvement in the performance of the very large scale search procedure. Last, we propose an approach to evaluating complexity of the very large-scale neighborhood search.
CRJan 9, 2019
An Integer Programming Formulation of the Key Management Problem in Wireless Sensor NetworksGuanglin Xu, Alexander Semenov, Maciej Rysz
With the advent of modern communications systems, much attention has been put on developing methods for securely transferring information between constituents of wireless sensor networks. To this effect, we introduce a mathematical programming formulation for the key management problem, which broadly serves as a mechanism for encrypting communications. In particular, an integer programming model of the q-Composite scheme is proposed and utilized to distribute keys among nodes of a network whose topology is known. Numerical experiments demonstrating the effectiveness of the proposed model are conducted using using a well-known optimization solver package. An illustrative example depicting an optimal encryption for a small-scale network is also presented.
LOMay 17, 2018
Translation of Algorithmic Descriptions of Discrete Functions to SAT with Applications to Cryptanalysis ProblemsAlexander Semenov, Ilya Otpuschennikov, Irina Gribanova et al.
In the present paper, we propose a technology for translating algorithmic descriptions of discrete functions to SAT. The proposed technology is aimed at applications in algebraic cryptanalysis. We describe how cryptanalysis problems are reduced to SAT in such a way that it should be perceived as natural by the cryptographic community. In~the theoretical part of the paper we justify the main principles of general reduction to SAT for discrete functions from a class containing the majority of functions employed in cryptography. Then, we describe the Transalg software tool developed based on these principles with SAT-based cryptanalysis specifics in mind. We demonstrate the results of applications of Transalg to construction of a number of attacks on various cryptographic functions. Some of the corresponding attacks are state of the art. We compare the functional capabilities of the proposed tool with that of other domain-specific software tools which can be used to reduce cryptanalysis problems to SAT, and also with the CBMC system widely employed in symbolic verification. The paper also presents vast experimental data, obtained using the SAT solvers that took first places at the SAT competitions in the recent several years.
AIMar 13, 2018
On Cryptographic Attacks Using Backdoors for SATAlexander Semenov, Oleg Zaikin, Ilya Otpuschennikov et al.
Propositional satisfiability (SAT) is at the nucleus of state-of-the-art approaches to a variety of computationally hard problems, one of which is cryptanalysis. Moreover, a number of practical applications of SAT can only be tackled efficiently by identifying and exploiting a subset of formula's variables called backdoor set (or simply backdoors). This paper proposes a new class of backdoor sets for SAT used in the context of cryptographic attacks, namely guess-and-determine attacks. The idea is to identify the best set of backdoor variables subject to a statistically estimated hardness of the guess-and-determine attack using a SAT solver. Experimental results on weakened variants of the renowned encryption algorithms exhibit advantage of the proposed approach compared to the state of the art in terms of the estimated hardness of the resulting guess-and-determine attacks.
AIFeb 20, 2018
Using Automatic Generation of Relaxation Constraints to Improve the Preimage Attack on 39-step MD4Irina Gribanova, Alexander Semenov
In this paper we construct preimage attack on the truncated variant of the MD4 hash function. Specifically, we study the MD4-39 function defined by the first 39 steps of the MD4 algorithm. We suggest a new attack on MD4-39, which develops the ideas proposed by H. Dobbertin in 1998. Namely, the special relaxation constraints are introduced in order to simplify the equations corresponding to the problem of finding a preimage for an arbitrary MD4-39 hash value. The equations supplemented with the relaxation constraints are then reduced to the Boolean Satisfiability Problem (SAT) and solved using the state-of-the-art SAT solvers. We show that the effectiveness of a set of relaxation constraints can be evaluated using the black-box function of a special kind. Thus, we suggest automatic method of relaxation constraints generation by applying the black-box optimization to this function. The proposed method made it possible to find new relaxation constraints that contribute to a SAT-based preimage attack on MD4-39 which significantly outperforms the competition.
SIFeb 27, 2017
Multimodal Clustering for Community DetectionDmitry I. Ignatov, Alexander Semenov, Daria Komissarova et al.
Multimodal clustering is an unsupervised technique for mining interesting patterns in $n$-adic binary relations or $n$-mode networks. Among different types of such generalized patterns one can find biclusters and formal concepts (maximal bicliques) for 2-mode case, triclusters and triconcepts for 3-mode case, closed $n$-sets for $n$-mode case, etc. Object-attribute biclustering (OA-biclustering) for mining large binary datatables (formal contexts or 2-mode networks) arose by the end of the last decade due to intractability of computation problems related to formal concepts; this type of patterns was proposed as a meaningful and scalable approximation of formal concepts. In this paper, our aim is to present recent advance in OA-biclustering and its extensions to mining multi-mode communities in SNA setting. We also discuss connection between clustering coefficients known in SNA community for 1-mode and 2-mode networks and OA-bicluster density, the main quality measure of an OA-bicluster. Our experiments with 2-, 3-, and 4-mode large real-world networks show that this type of patterns is suitable for community detection in multi-mode cases within reasonable time even though the number of corresponding $n$-cliques is still unknown due to computation difficulties. An interpretation of OA-biclusters for 1-mode networks is provided as well.
AIJul 4, 2016
Encoding Cryptographic Functions to SAT Using Transalg SystemIlya Otpuschennikov, Alexander Semenov, Irina Gribanova et al.
In this paper we propose the technology for constructing propositional encodings of discrete functions. It is aimed at solving inversion problems of considered functions using state-of-the-art SAT solvers. We implemented this technology in the form of the software system called Transalg, and used it to construct SAT encodings for a number of cryptanalysis problems. By applying SAT solvers to these encodings we managed to invert several cryptographic functions. In particular, we used the SAT encodings produced by Transalg to construct the family of two-block MD5 collisions in which the first 10 bytes are zeros. Also we used Transalg encoding for the widely known A5/1 keystream generator to solve several dozen of its cryptanalysis instances in a distributed computing environment. In the paper we compare in detail the functionality of Transalg with that of similar software systems.
AIJul 3, 2015
Using Monte Carlo method for searching partitionings of hard variants of Boolean satisfiability problemAlexander Semenov, Oleg Zaikin
In this paper we propose the approach for constructing partitionings of hard variants of the Boolean satisfiability problem (SAT). Such partitionings can be used for solving corresponding SAT instances in parallel. For the same SAT instance one can construct different partitionings, each of them is a set of simplified versions of the original SAT instance. The effectiveness of an arbitrary partitioning is determined by the total time of solving of all SAT instances from it. We suggest the approach, based on the Monte Carlo method, for estimating time of processing of an arbitrary partitioning. With each partitioning we associate a point in the special finite search space. The estimation of effectiveness of the particular partitioning is the value of predictive function in the corresponding point of this space. The problem of search for an effective partitioning can be formulated as a problem of optimization of the predictive function. We use metaheuristic algorithms (simulated annealing and tabu search) to move from point to point in the search space. In our computational experiments we found partitionings for SAT instances encoding problems of inversion of some cryptographic functions. Several of these SAT instances with realistic predicted solving time were successfully solved on a computing cluster and in the volunteer computing project SAT@home. The solving time agrees well with estimations obtained by the proposed method.
DCNov 20, 2014
Using Volunteer Computing for Mounting SAT-based Cryptographic AttacksAlexander Semenov, Oleg Zaikin, Ilya Otpuschennikov
In this paper we describe the volunteer computing project SAT@home, developed and maintained by us. This project is aimed at solving hard instances of the Boolean satisfiability problem (SAT). We believe that this project can be a useful tool for computational study of inversion problems of some cryptographic functions. In particular we describe a series of experiments performed in SAT@home on the cryptanalysis of the widely known keystream generator A5/1. In all experiments we analyzed one known burst (114 bits) of keystream produced by A5/1. Before the cryptanalysis itself there is a stage on which the partitioning of the original problem to a family of subproblems is carried out. Each of subproblems should be easy enough so that it could be solved in relatively small amount of time by volunteer's PC. We construct such partitioning using the special technique based on the Monte Carlo method and discrete optimization algorithms for special predictive functions. Besides this in the paper we describe the technique for reducing inversion problems of cryptographic functions to SAT.
SIOct 29, 2014
Using synchronous Boolean networks to model several phenomena of collective behaviorStepan Kochemazov, Alexander Semenov
In this paper, we propose an approach for modeling and analysis of a number of phenomena of collective behavior. By collectives we mean multi-agent systems that transition from one state to another at discrete moments of time. The behavior of a member of a collective (agent) is called conforming if the opinion of this agent at current time moment conforms to the opinion of some other agents at the previous time moment. We presume that at each moment of time every agent makes a decision by choosing from the set {0,1} (where 1-decision corresponds to action and 0-decision corresponds to inaction). In our approach we model collective behavior with synchronous Boolean networks. We presume that in a network there can be agents that act at every moment of time. Such agents are called instigators. Also there can be agents that never act. Such agents are called loyalists. Agents that are neither instigators nor loyalists are called simple agents. We study two combinatorial problems. The first problem is to find a disposition of instigators that in several time moments transforms a network from a state where a majority of simple agents are inactive to a state with a majority of active agents. The second problem is to find a disposition of loyalists that returns the network to a state with a majority of inactive agents. Similar problems are studied for networks in which simple agents demonstrate the contrary to conforming behavior that we call anticonforming. We obtained several theoretical results regarding the behavior of collectives of agents with conforming or anticonforming behavior. In computational experiments we solved the described problems for randomly generated networks with several hundred vertices. We reduced corresponding combinatorial problems to the Boolean satisfiability problem (SAT) and used modern SAT solvers to solve the instances obtained.
AIMay 7, 2014
Transalg: a Tool for Translating Procedural Descriptions of Discrete Functions to SATIlya Otpuschennikov, Alexander Semenov, Stepan Kochemazov
In this paper we present the Transalg system, designed to produce SAT encodings for discrete functions, written as programs in a specific language. Translation of such programs to SAT is based on propositional encoding methods for formal computing models and on the concept of symbolic execution. We used the Transalg system to make SAT encodings for a number of cryptographic functions.
AIAug 4, 2013
On estimating total time to solve SAT in distributed computing environments: Application to the SAT@home projectAlexander Semenov, Oleg Zaikin
This paper proposes a method to estimate the total time required to solve SAT in distributed environments via partitioning approach. It is based on the observation that for some simple forms of problem partitioning one can use the Monte Carlo approach to estimate the time required to solve an original problem. The method proposed is based on an algorithm for searching for partitioning with an optimal solving time estimation. We applied this method to estimate the time required to perform logical cryptanalysis of the widely known stream ciphers A5/1 and Bivium. The paper also describes a volunteer computing project SAT@home aimed at solving hard combinatorial problems reduced to SAT. In this project during several months there were solved 10 problems of logical cryptanalysis of the A5/1 cipher thatcould not be solved using known rainbow tables.