Andreas Wichert

h-index18

12papers

124citations

Novelty51%

AI Score25

Ranked #163,514 of 194,257 authors (top 84%)#9,946 in AI (top 79%)

12 Papers

6.6NEJul 11, 2022

Classification and Generation of real-world data with an Associative Memory Model

Rodrigo Simas, Luis Sa-Couto, Andreas Wichert

Drawing from memory the face of a friend you have not seen in years is a difficult task. However, if you happen to cross paths, you would easily recognize each other. The biological memory is equipped with an impressive compression algorithm that can store the essential, and then infer the details to match perception. The Willshaw Memory is a simple abstract model for cortical computations which implements mechanisms of biological memories. Using our recently proposed sparse coding prescription for visual patterns, this model can store and retrieve an impressive amount of real-world data in a fault-tolerant manner. In this paper, we extend the capabilities of the basic Associative Memory Model by using a Multiple-Modality framework. In this setting, the memory stores several modalities (e.g., visual, or textual) of each pattern simultaneously. After training, the memory can be used to infer missing modalities when just a subset is perceived. Using a simple encoder-memory-decoder architecture, and a newly proposed iterative retrieval algorithm for the Willshaw Model, we perform experiments on the MNIST dataset. By storing both the images and labels as modalities, a single Memory can be used not only to retrieve and complete patterns but also to classify and generate new ones. We further discuss how this model could be used for other learning tasks, thus serving as a biologically-inspired framework for learning.

4.6LGNov 18, 2022

Understanding the double descent curve in Machine Learning

Luis Sa-Couto, Jose Miguel Ramos, Miguel Almeida et al.

The theory of bias-variance used to serve as a guide for model selection when applying Machine Learning algorithms. However, modern practice has shown success with over-parameterized models that were expected to overfit but did not. This led to the proposal of the double descent curve of performance by Belkin et al. Although it seems to describe a real, representative phenomenon, the field is lacking a fundamental theoretical understanding of what is happening, what are the consequences for model selection and when is double descent expected to occur. In this paper we develop a principled understanding of the phenomenon, and sketch answers to these important questions. Furthermore, we report real experimental results that are correctly predicted by our proposed hypothesis.

4.8NEJul 20, 2022

Can a Hebbian-like learning rule be avoiding the curse of dimensionality in sparse distributed data?

Maria Osório, Luís Sa-Couto, Andreas Wichert

It is generally assumed that the brain uses something akin to sparse distributed representations. These representations, however, are high-dimensional and consequently they affect classification performance of traditional Machine Learning models due to "the curse of dimensionality". In tasks for which there is a vast amount of labeled data, Deep Networks seem to solve this issue with many layers and a non-Hebbian backpropagation algorithm. The brain, however, seems to be able to solve the problem with few layers. In this work, we hypothesize that this happens by using Hebbian learning. Actually, the Hebbian-like learning rule of Restricted Boltzmann Machines learns the input patterns asymmetrically. It exclusively learns the correlation between non-zero values and ignores the zeros, which represent the vast majority of the input dimensionality. By ignoring the zeros "the curse of dimensionality" problem can be avoided. To test our hypothesis, we generated several sparse datasets and compared the performance of a Restricted Boltzmann Machine classifier with some Backprop-trained networks. The experiments using these codes confirm our initial intuition as the Restricted Boltzmann Machine shows a good generalization performance, while the Neural Networks trained with the backpropagation algorithm overfit the training data.

1.8LGNov 25, 2022

The smooth output assumption, and why deep networks are better than wide ones

Luis Sa-Couto, Jose Miguel Ramos, Andreas Wichert

When several models have similar training scores, classical model selection heuristics follow Occam's razor and advise choosing the ones with least capacity. Yet, modern practice with large neural networks has often led to situations where two networks with exactly the same number of parameters score similar on the training set, but the deeper one generalizes better to unseen examples. With this in mind, it is well accepted that deep networks are superior to shallow wide ones. However, theoretically there is no difference between the two. In fact, they are both universal approximators. In this work we propose a new unsupervised measure that predicts how well a model will generalize. We call it the output sharpness, and it is based on the fact that, in reality, boundaries between concepts are generally unsharp. We test this new measure on several neural network settings, and architectures, and show how generally strong the correlation is between our metric, and test set performance. Having established this measure, we give a mathematical probabilistic argument that predicts network depth to be correlated with our proposed measure. After verifying this in real data, we are able to formulate the key argument of the work: output sharpness hampers generalization; deep networks have an in built bias against it; therefore, deep networks beat wide ones. All in all the work not only provides a helpful predictor of overfitting that can be used in practice for model selection (or even regularization), but also provides a much needed theoretical grounding for the success of modern deep neural networks.

1.8LGOct 26, 2022

Multi-level Data Representation For Training Deep Helmholtz Machines

Jose Miguel Ramos, Luis Sa-Couto, Andreas Wichert

A vast majority of the current research in the field of Machine Learning is done using algorithms with strong arguments pointing to their biological implausibility such as Backpropagation, deviating the field's focus from understanding its original organic inspiration to a compulsive search for optimal performance. Yet, there have been a few proposed models that respect most of the biological constraints present in the human brain and are valid candidates for mimicking some of its properties and mechanisms. In this paper, we will focus on guiding the learning of a biologically plausible generative model called the Helmholtz Machine in complex search spaces using a heuristic based on the Human Image Perception mechanism. We hypothesize that this model's learning algorithm is not fit for Deep Networks due to its Hebbian-like local update rule, rendering it incapable of taking full advantage of the compositional properties that multi-layer networks provide. We propose to overcome this problem, by providing the network's hidden layers with visual queues at different resolutions using a Multi-level Data representation. The results on several image datasets showed the model was able to not only obtain better overall quality but also a wider diversity in the generated images, corroborating our intuition that using our proposed heuristic allows the model to take more advantage of the network's depth growth. More importantly, they show the unexplored possibilities underlying brain-inspired models and techniques.

2.6CVApr 30, 2021

Using brain inspired principles to unsupervisedly learn good representations for visual pattern recognition

Luis Sa-Couto, Andreas Wichert

Although deep learning has solved difficult problems in visual pattern recognition, it is mostly successful in tasks where there are lots of labeled training data available. Furthermore, the global back-propagation based training rule and the amount of employed layers represents a departure from biological inspiration. The brain is able to perform most of these tasks in a very general way from limited to no labeled data. For these reasons it is still a key research question to look into computational principles in the brain that can help guide models to unsupervisedly learn good representations which can then be used to perform tasks like classification. In this work we explore some of these principles to generate such representations for the MNIST data set. We compare the obtained results with similar recent works and verify extremely competitive results.

3.3LGFeb 21, 2020

An Investigation of Interpretability Techniques for Deep Learning in Predictive Process Analytics

Catarina Moreira, Renuka Sindhgatta, Chun Ouyang et al.

This paper explores interpretability techniques for two of the most successful learning algorithms in medical decision-making literature: deep neural networks and random forests. We applied these algorithms in a real-world medical dataset containing information about patients with cancer, where we learn models that try to predict the type of cancer of the patient, given their set of medical activity records. We explored different algorithms based on neural network architectures using long short term deep neural networks, and random forests. Since there is a growing need to provide decision-makers understandings about the logic of predictions of black boxes, we also explored different techniques that provide interpretations for these classifiers. In one of the techniques, we intercepted some hidden layers of these neural networks and used autoencoders in order to learn what is the representation of the input in the hidden layers. In another, we investigated an interpretable model locally around the random forest's prediction. Results show learning an interpretable model locally around the model's prediction leads to a higher understanding of why the algorithm is making some decision. Use of local and linear model helps identify the features used in prediction of a specific instance or data point. We see certain distinct features used for predictions that provide useful insights about the type of cancer, along with features that do not generalize well. In addition, the structured deep learning approach using autoencoders provided meaningful prediction insights, which resulted in the identification of nonlinear clusters correspondent to the patients' different types of cancer.

3.1AIJul 16, 2018

Introducing Quantum-Like Influence Diagrams for Violations of the Sure Thing Principle

Catarina Moreira, Andreas Wichert

It is the focus of this work to extend and study the previously proposed quantum-like Bayesian networks to deal with decision-making scenarios by incorporating the notion of maximum expected utility in influence diagrams. The general idea is to take advantage of the quantum interference terms produced in the quantum-like Bayesian Network to influence the probabilities used to compute the expected utility of some action. This way, we are not proposing a new type of expected utility hypothesis. On the contrary, we are keeping it under its classical definition. We are only incorporating it as an extension of a probabilistic graphical model in a compact graphical representation called an influence diagram in which the utility function depends on the probabilistic influences of the quantum-like Bayesian network. Our findings suggest that the proposed quantum-like influence digram can indeed take advantage of the quantum interference effects of quantum-like Bayesian Networks to maximise the utility of a cooperative behaviour in detriment of a fully rational defect behaviour under the prisoner's dilemma game.

1.7AIOct 2, 2017

The Dutch's Real World Financial Institute: Introducing Quantum-Like Bayesian Networks as an Alternative Model to deal with Uncertainty

Catarina Moreira, Emmanuel Haven, Sandro Sozzo et al.

In this work, we analyse and model a real life financial loan application belonging to a sample bank in the Netherlands. The log is robust in terms of data, containing a total of 262 200 event logs, belonging to 13 087 different credit applications. The dataset is heterogeneous and consists of a mixture of computer generated automatic processes and manual human tasks. The goal is to work out a decision model, which represents the underlying tasks that make up the loan application service, and to assess potential areas of improvement of the institution's internal processes. To this end we study the impact of incomplete event logs for the extraction and analysis of business processes. It is quite common that event logs are incomplete with several amounts of missing information (for instance, workers forget to register their tasks). Absence of data is translated into a drastic decrease of precision and compromises the decision models, leading to biased and unrepresentative results. We investigate how classical probabilistic models are affected by incomplete event logs and we explore quantum-like probabilistic inferences as an alternative mathematical model to classical probability. This work represents a first step towards systematic investigation of the impact of quantum interference in a real life large scale decision scenario. The results obtained in this study indicate that, under high levels of uncertainty, the quantum-like models generate quantum interference terms, which allow an additional non-linear parameterisation of the data. Experimental results attest the efficiency of the quantum-like Bayesian networks, since the application of interference terms is able to reduce the error percentage of inferences performed over quantum-like models when compared to inferences produced by classical models.

5.1AIAug 26, 2015

The Relation Between Acausality and Interference in Quantum-Like Bayesian Networks

Catarina Moreira, Andreas Wichert

We analyse a quantum-like Bayesian Network that puts together cause/effect relationships and semantic similarities between events. These semantic similarities constitute acausal connections according to the Synchronicity principle and provide new relationships to quantum like probabilistic graphical models. As a consequence, beliefs (or any other event) can be represented in vector spaces, in which quantum parameters are determined by the similarities that these vectors share between them. Events attached by a semantic meaning do not need to have an explanation in terms of cause and effect.

14.9AISep 30, 2014

Interference Effects in Quantum Belief Networks

Catarina Moreira, Andreas Wichert

Probabilistic graphical models such as Bayesian Networks are one of the most powerful structures known by the Computer Science community for deriving probabilistic inferences. However, modern cognitive psychology has revealed that human decisions could not follow the rules of classical probability theory, because humans cannot process large amounts of data in order to make judgements. Consequently, the inferences performed are based on limited data coupled with several heuristics, leading to violations of the law of total probability. This means that probabilistic graphical models based on classical probability theory are too limited to fully simulate and explain various aspects of human decision making. Quantum probability theory was developed in order to accommodate the paradoxical findings that the classical theory could not explain. Recent findings in cognitive psychology revealed that quantum probability can fully describe human decisions in an elegant framework. Their findings suggest that, before taking a decision, human thoughts are seen as superposed waves that can interfere with each other, influencing the final decision. In this work, we propose a new Bayesian Network based on the psychological findings of cognitive scientists. We made experiments with two very well known Bayesian Networks from the literature. The results obtained revealed that the quantum like Bayesian Network can affect drastically the probabilistic inferences, specially when the levels of uncertainty of the network are very high (no pieces of evidence observed). When the levels of uncertainty are very low, then the proposed quantum like network collapses to its classical counterpart.

5.6AIJun 12, 2013

Finding Academic Experts on a MultiSensor Approach using Shannon's Entropy

Catarina Moreira, Andreas Wichert

Expert finding is an information retrieval task concerned with the search for the most knowledgeable people, in some topic, with basis on documents describing peoples activities. The task involves taking a user query as input and returning a list of people sorted by their level of expertise regarding the user query. This paper introduces a novel approach for combining multiple estimators of expertise based on a multisensor data fusion framework together with the Dempster-Shafer theory of evidence and Shannon's entropy. More specifically, we defined three sensors which detect heterogeneous information derived from the textual contents, from the graph structure of the citation patterns for the community of experts, and from profile information about the academic experts. Given the evidences collected, each sensor may define different candidates as experts and consequently do not agree in a final ranking decision. To deal with these conflicts, we applied the Dempster-Shafer theory of evidence combined with Shannon's Entropy formula to fuse this information and come up with a more accurate and reliable final ranking list. Experiments made over two datasets of academic publications from the Computer Science domain attest for the adequacy of the proposed approach over the traditional state of the art approaches. We also made experiments against representative supervised state of the art algorithms. Results revealed that the proposed method achieved a similar performance when compared to these supervised techniques, confirming the capabilities of the proposed framework.