LG, Feb 20, 2023
Efficient Generator of Mathematical Expressions for Symbolic Regression
Sebastian Mežnar, Sašo Džeroski, Ljupčo Todorovski
We propose an approach to symbolic regression based on a novel variational autoencoder for generating hierarchical structures, HVAE. It combines simple atomic units with shared weights to recursively encode and decode the individual nodes in the hierarchy. Encoding is performed bottom-up and decoding top-down. We empirically show that HVAE can be trained efficiently with small corpora of mathematical expressions and can accurately encode expressions into a smooth low-dimensional latent space. The latter can be efficiently explored with various optimization methods to address the task of symbolic regression. Indeed, random search through the latent space of HVAE performs better than random search through expressions generated by manually crafted probabilistic grammars for mathematical expressions. Finally, the EDHiE system for symbolic regression, which applies an evolutionary algorithm to the latent space of HVAE, reconstructs equations from a standard symbolic regression benchmark better than a state-of-the-art system based on a similar combination of deep learning and evolutionary algorithms.
LG, Sep 8, 2024
ICML Topological Deep Learning Challenge 2024: Beyond the Graph Domain
Guillermo Bernárdez, Lev Telyatnikov, Marco Montagna et al.
This paper describes the 2nd edition of the ICML Topological Deep Learning Challenge, which was hosted within the ICML 2024 ELLIS Workshop on Geometry-grounded Representation Learning and Generative Modeling (GRaM). The challenge focused on the problem of representing data in different discrete topological domains in order to bridge the gap between Topological Deep Learning (TDL) and other types of structured datasets (e.g. point clouds, graphs). Specifically, participants were asked to design and implement topological liftings, i.e. mappings between different data structures and topological domains, such as hypergraphs or simplicial, cell, and combinatorial complexes. The challenge received 52 submissions satisfying all the requirements. This paper introduces the main scope of the challenge and summarizes the main results and findings.
AI, Aug 21, 2024
Quantifying Behavioral Dissimilarity Between Mathematical Expressions
Sebastian Mežnar, Sašo Džeroski, Ljupčo Todorovski
Quantifying the similarity between mathematical expressions is a fundamental problem in computational mathematics, symbolic reasoning, and scientific discovery. While behavioral notions of similarity have previously been explored in the context of software and program analysis, existing measures for mathematical expressions rely primarily on syntactic form, assessing similarity through symbolic structure rather than actual behavior. Yet syntactically distinct expressions can exhibit nearly identical outputs, while structurally similar ones may behave very differently, especially when the expressions contain free parameters that define families of functions. To address these limitations, we introduce Behavior-aware Expression Dissimilarity (BED), a principled framework for quantifying behavioral distance between mathematical expressions with free parameters. BED represents expressions as joint probability distributions over their input-output pairs and applies the Wasserstein distance to measure behavioral dissimilarity. A computationally efficient stochastic approximation is proposed and shown to be consistent, robust, and capable of inducing a smoother, more meaningful structure over the space of expressions than syntax-based measures. The approach provides a foundation for behavior-based comparison, clustering, and learning of mathematical expressions, with potential direct applications in equation discovery, symbolic regression, and neuro-symbolic modeling.
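The idea of comparing expressions by behavior rather than syntax can be illustrated with a minimal sketch. It is not the BED estimator from the paper: it simplifies the joint input-output distribution to a per-input comparison of output distributions, and the function names (`wasserstein_1d`, `behavioral_distance`), the single free parameter `c`, and its uniform sampling range are all assumptions made for illustration.

```python
import random


def wasserstein_1d(a, b):
    """Wasserstein-1 distance between two equal-size 1D empirical samples.

    For equal-size samples with uniform weights, the optimal transport plan
    matches sorted values, so the distance is the mean absolute difference.
    """
    if len(a) != len(b):
        raise ValueError("samples must have the same size")
    return sum(abs(x - y) for x, y in zip(sorted(a), sorted(b))) / len(a)


def behavioral_distance(f, g, xs, n_params=200, seed=0):
    """Average per-input Wasserstein distance between outputs of f and g.

    f and g take (x, c), where c is a free parameter. The parameter range
    [1, 5] is a hypothetical choice, not taken from the paper.
    """
    rng = random.Random(seed)
    params_f = [rng.uniform(1.0, 5.0) for _ in range(n_params)]
    params_g = [rng.uniform(1.0, 5.0) for _ in range(n_params)]
    total = 0.0
    for x in xs:
        out_f = [f(x, c) for c in params_f]
        out_g = [g(x, c) for c in params_g]
        total += wasserstein_1d(out_f, out_g)
    return total / len(xs)


if __name__ == "__main__":
    xs = [0.5, 1.0, 1.5, 2.0]
    # Syntactically different, behaviorally identical families.
    print(behavioral_distance(lambda x, c: c * x, lambda x, c: x * c, xs))
    # Syntactically similar, behaviorally shifted families.
    print(behavioral_distance(lambda x, c: c * x, lambda x, c: c * x + 10.0, xs))
```

Because the parameters of the two expressions are sampled independently, the first distance is small but not exactly zero; it reflects sampling noise, while the second is dominated by the constant offset.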
LG, Nov 23, 2021
Link Analysis meets Ontologies: Are Embeddings the Answer?
Sebastian Mežnar, Matej Bevec, Nada Lavrač et al.
The increasing amounts of semantic resources offer valuable storage of human knowledge; however, the probability of wrong entries increases with the increased size. The development of approaches that identify potentially spurious parts of a given knowledge base is thus becoming an increasingly important area of interest. In this work, we present a systematic evaluation of whether structure-only link analysis methods can already offer a scalable means of detecting possible anomalies, as well as potentially interesting novel relation candidates. Evaluating thirteen methods on eight different semantic resources, including Gene Ontology, Food Ontology, Marine Ontology and similar, we demonstrated that structure-only link analysis could offer scalable anomaly detection for a subset of the data sets. Further, we demonstrated that by considering symbolic node embedding, explanations of the predictions (links) could be obtained, making this branch of methods potentially more valuable than the black-box-only ones. To our knowledge, this is currently one of the most extensive systematic studies of the applicability of different types of link analysis methods across semantic resources from different domains.
SI, Mar 31, 2021
Transfer Learning for Node Regression Applied to Spreading Prediction
Sebastian Mežnar, Nada Lavrač, Blaž Škrlj
Understanding how information propagates in real-life complex networks yields a better understanding of dynamic processes such as misinformation or epidemic spreading. The recently introduced branch of machine learning methods for learning node representations offers many novel applications, one of them being the task of spreading prediction addressed in this paper. We explore the utility of the state-of-the-art node representation learners when used to assess the effects of spreading from a given node, estimated via extensive simulations. Further, as many real-life networks are topologically similar, we systematically investigate whether the learned models generalize to previously unseen networks, showing that in some cases very good model transfer can be obtained. This work is one of the first to explore transferability of the learned representations for the task of node regression; we show there exist pairs of networks with similar structure between which the trained models can be transferred (zero-shot), and demonstrate their competitive performance. To our knowledge, this is one of the first attempts to evaluate the utility of zero-shot transfer for the task of node regression.
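Estimating the effect of spreading from a given node via simulation can be sketched as follows. This is an illustration under stated assumptions, not the paper's setup: it uses a simple independent-cascade process, and the names `simulate_spread` and `estimate_spread`, the infection probability, and the toy star graph are all hypothetical.

```python
import random


def simulate_spread(adj, source, p_infect, rng):
    """Run one independent-cascade simulation and return the number of nodes reached.

    Each newly infected node gets one chance to infect each uninfected
    neighbor, succeeding with probability p_infect.
    """
    infected = {source}
    frontier = [source]
    while frontier:
        next_frontier = []
        for u in frontier:
            for v in adj[u]:
                if v not in infected and rng.random() < p_infect:
                    infected.add(v)
                    next_frontier.append(v)
        frontier = next_frontier
    return len(infected)


def estimate_spread(adj, source, p_infect=0.3, n_runs=500, seed=0):
    """Monte Carlo estimate of the expected outbreak size started at source."""
    rng = random.Random(seed)
    total = sum(simulate_spread(adj, source, p_infect, rng) for _ in range(n_runs))
    return total / n_runs


if __name__ == "__main__":
    # Toy star graph: node 0 is the hub, nodes 1..4 are leaves.
    star = {0: [1, 2, 3, 4], 1: [0], 2: [0], 3: [0], 4: [0]}
    for node in star:
        print(node, estimate_spread(star, node))
```

The per-node estimates produced this way are the kind of regression target a node representation learner could be trained to predict, which is what makes transfer between structurally similar networks testable.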
LG, Dec 16, 2020
Predicting Generalization in Deep Learning via Metric Learning -- PGDL Shared task
Sebastian Mežnar, Blaž Škrlj
The competition "Predicting Generalization in Deep Learning (PGDL)" aims to provide a platform for rigorous study of generalization of deep learning models and offer insight into the progress of understanding and explaining these models. This report presents the solution submitted by the user smeznar, which achieved eighth place in the competition. In the proposed approach, we create simple metrics and find their best combination with automatic testing on the provided dataset, exploring how combinations of various properties of the input neural network architectures can be used for the prediction of their generalization.
LG, Sep 8, 2020
SNoRe: Scalable Unsupervised Learning of Symbolic Node Representations
Sebastian Mežnar, Nada Lavrač, Blaž Škrlj
Learning from complex real-life networks is a lively research area, with recent advances in learning information-rich, low-dimensional network node representations. However, state-of-the-art methods are not necessarily interpretable and are therefore not fully applicable to sensitive settings in biomedical or user profiling tasks, where explicit bias detection is highly relevant. The proposed SNoRe (Symbolic Node Representations) algorithm is capable of learning symbolic, human-understandable representations of individual network nodes, based on the similarity of neighborhood hashes which serve as features. SNoRe's interpretable features are suitable for direct explanation of individual predictions, which we demonstrate by coupling it with the widely used instance explanation tool SHAP to obtain nomograms representing the relevance of individual features for a given classification. To our knowledge, this is one of the first such attempts in a structural node embedding setting. In the experimental evaluation on eleven real-life datasets, SNoRe proved to be competitive with strong baselines, such as variational graph autoencoders, node2vec and LINE. The vectorized implementation of SNoRe scales to large networks, making it suitable for contemporary network learning and analysis tasks.