CV · Jul 6, 2022 · Code
Identifying and Mitigating Flaws of Deep Perceptual Similarity Metrics
Oskar Sjögren, Gustav Grund Pihlgren, Fredrik Sandin et al.
Measuring the similarity of images is a fundamental problem in computer vision for which no universal solution exists. While simple metrics such as the pixel-wise L2-norm have been shown to have significant flaws, they remain popular. One group of recent state-of-the-art metrics that mitigates some of those flaws is the Deep Perceptual Similarity (DPS) metrics, where similarity is evaluated as the distance between the deep features of neural networks. However, DPS metrics themselves have been less thoroughly examined for their benefits and, especially, their flaws. This work investigates the most common DPS metric, where deep features are compared by spatial position, along with metrics comparing the averaged and sorted deep features. The metrics are analyzed in depth, using images designed specifically to challenge them, to understand their strengths and weaknesses. This work contributes new insights into the flaws of DPS and suggests improvements to the metrics. An implementation of this work is available online: https://github.com/guspih/deep_perceptual_similarity_analysis/
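The three ways of comparing deep features named in the abstract (by spatial position, averaged, and sorted) can be sketched roughly as follows. This is an illustration only: the feature maps are random stand-in arrays rather than activations from a pretrained network, and the function names, the squared-error distance, and the lack of any normalization are assumptions, not the paper's implementation.

```python
import numpy as np

def spatial_distance(fa, fb):
    """Compare features position by position. fa, fb: arrays of shape (C, H, W)."""
    return float(((fa - fb) ** 2).mean())

def mean_distance(fa, fb):
    """Compare the spatial average of each channel, ignoring where activations occur."""
    return float(((fa.mean(axis=(1, 2)) - fb.mean(axis=(1, 2))) ** 2).mean())

def sort_distance(fa, fb):
    """Compare each channel's activations after sorting them, ignoring position."""
    sa = np.sort(fa.reshape(fa.shape[0], -1), axis=1)
    sb = np.sort(fb.reshape(fb.shape[0], -1), axis=1)
    return float(((sa - sb) ** 2).mean())

# Stand-in "deep features": random values instead of real CNN activations.
rng = np.random.default_rng(0)
features_a = rng.normal(size=(8, 6, 6))
features_b = np.flip(features_a, axis=2)  # same activations, mirrored positions

d_spatial = spatial_distance(features_a, features_b)
d_mean = mean_distance(features_a, features_b)
d_sort = sort_distance(features_a, features_b)
```

In this toy case the spatial comparison reports a clear difference, while the averaged and sorted comparisons see the two feature maps as (numerically) identical, because mirroring moves activations without changing their values.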
CV · Apr 5, 2023 · Code
Deep Perceptual Similarity is Adaptable to Ambiguous Contexts
Gustav Grund Pihlgren, Fredrik Sandin, Marcus Liwicki
The concept of image similarity is ambiguous, and images can be similar in one context and not in another. This ambiguity motivates the creation of metrics for specific contexts. This work explores the ability of deep perceptual similarity (DPS) metrics to adapt to a given context. DPS metrics use the deep features of neural networks for comparing images. These metrics have been successful on datasets that leverage the average human perception in limited settings. But the question remains if they could be adapted to specific similarity contexts. No single metric can suit all similarity contexts, and previous rule-based metrics are labor-intensive to rewrite for new contexts. On the other hand, DPS metrics use neural networks that might be retrained for each context. However, retraining networks takes resources and might ruin performance on previous tasks. This work examines the adaptability of DPS metrics by training ImageNet pretrained CNNs to measure similarity according to given contexts. Contexts are created by randomly ranking six image distortions. Distortions later in the ranking are considered more disruptive to similarity when applied to an image for that context. This also gives insight into whether the pretrained features capture different similarity contexts. The adapted metrics are evaluated on a perceptual similarity dataset to evaluate if adapting to a ranking affects their prior performance. The findings show that DPS metrics can be adapted with high performance. While the adapted metrics have difficulties with the same contexts as baselines, performance is improved in 99% of cases. Finally, it is shown that the adaption is not significantly detrimental to prior performance on perceptual similarity. The implementation of this work is available online: https://github.com/LTU-Machine-Learning/Analysis-of-Deep-Perceptual-Loss-Networks
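The way contexts are constructed in the abstract, as a random ranking of six distortions in which later entries count as more disruptive, can be illustrated with a short sketch. The distortion names below are hypothetical placeholders (the abstract does not list the actual distortions), and `make_context` and `more_disruptive` are illustrative helpers, not functions from the linked repository.

```python
import random

# Hypothetical placeholder names; the abstract does not name the six distortions.
DISTORTIONS = [f"distortion_{i}" for i in range(6)]

def make_context(seed):
    """Create a similarity context as a random ranking of the distortions."""
    rng = random.Random(seed)
    ranking = DISTORTIONS[:]
    rng.shuffle(ranking)
    return ranking

def more_disruptive(context, a, b):
    """Return whichever distortion appears later in the ranking."""
    return a if context.index(a) > context.index(b) else b

context = make_context(seed=0)
worst = more_disruptive(context, context[0], context[5])
```

A metric adapted to `context` would then be expected to score an image distorted by `worst` as less similar to the original than the same image distorted by an earlier-ranked distortion.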
CV · Feb 8, 2023
A Systematic Performance Analysis of Deep Perceptual Loss Networks: Breaking Transfer Learning Conventions
Gustav Grund Pihlgren, Konstantina Nikolaidou, Prakash Chandra Chhipa et al.
In recent years, deep perceptual loss has been widely and successfully used to train machine learning models for many computer vision tasks, including image synthesis, segmentation, and autoencoding. Deep perceptual loss is a type of loss function for images that computes the error between two images as the distance between deep features extracted from a neural network. Most applications of the loss use pretrained networks called loss networks for deep feature extraction. However, despite increasingly widespread use, the effects of loss network implementation on the trained models have not been studied. This work rectifies this through a systematic evaluation of the effect of different pretrained loss networks on four different application areas. Specifically, the work evaluates 14 different pretrained architectures with four different feature extraction layers. The evaluation reveals that VGG networks without batch normalization have the best performance and that the choice of feature extraction layer is at least as important as the choice of architecture. The analysis also reveals that deep perceptual loss does not adhere to the transfer learning conventions that better ImageNet accuracy implies better downstream performance and that feature extraction from the later layers provides better performance.
LG · Aug 10, 2023
ReLU and Addition-based Gated RNN
Rickard Brännvall, Henrik Forsgren, Fredrik Sandin et al.
We replace the multiplication and sigmoid function of the conventional recurrent gate with addition and ReLU activation. This mechanism is designed to maintain long-term memory for sequence processing but at a reduced computational cost, thereby opening up for more efficient execution or larger models on restricted hardware. Recurrent Neural Networks (RNNs) with gating mechanisms such as LSTM and GRU have been widely successful in learning from sequential data due to their ability to capture long-term dependencies. Conventionally, the update based on current inputs and the previous state history is each multiplied with dynamic weights and combined to compute the next state. However, multiplication can be computationally expensive, especially for certain hardware architectures or alternative arithmetic systems such as homomorphic encryption. It is demonstrated that the novel gating mechanism can capture long-term dependencies for a standard synthetic sequence learning task while significantly reducing computational costs such that execution time is reduced by half on CPU and by one-third under encryption. Experimental results on handwritten text recognition tasks furthermore show that the proposed architecture can be trained to achieve comparable accuracy to conventional GRU and LSTM baselines. The gating mechanism introduced in this paper may enable privacy-preserving AI applications operating under homomorphic encryption by avoiding the multiplication of encrypted variables. It can also support quantization in (unencrypted) plaintext applications, with the potential for substantial performance gains since the addition-based formulation can avoid the expansion to double precision often required for multiplication.
CV · Mar 16, 2020 · Code
Pretraining Image Encoders without Reconstruction via Feature Prediction Loss
Gustav Grund Pihlgren, Fredrik Sandin, Marcus Liwicki
This work investigates three methods for calculating loss for autoencoder-based pretraining of image encoders: the commonly used reconstruction loss, the more recently introduced deep perceptual similarity loss, and a feature prediction loss proposed here; the latter turns out to be the most efficient choice. Standard autoencoder pretraining for deep learning tasks is done by comparing the input image and the reconstructed image. Recent work shows that predictions based on embeddings generated by image autoencoders can be improved by training with perceptual loss, i.e., by adding a loss network after the decoding step. So far, autoencoders trained with loss networks have implemented an explicit comparison of the original and reconstructed images using the loss network. However, given such a loss network, we show that there is no need for the time-consuming task of decoding the entire image. Instead, we propose to decode the features of the loss network, hence the name "feature prediction loss". To evaluate this method, we perform experiments on three standard publicly available datasets (LunarLander-v2, STL-10, and SVHN) and compare six different procedures for training image encoders (pixel-wise, perceptual similarity, and feature prediction losses; combined with two variations of image and feature encoding/decoding). The embedding-based prediction results show that encoders trained with feature prediction loss are as good as or better than those trained with the other two losses. Additionally, the encoder is significantly faster to train using feature prediction loss in comparison to the other losses. The method implementation used in this work is available online: https://github.com/guspih/Perceptual-Autoencoders
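The difference between the three losses can be sketched in a few lines. Everything here is a toy stand-in chosen for illustration: the "loss network", encoder, and decoders are untrained random linear maps in NumPy, and all names and sizes are assumptions rather than the architecture from the linked repository. The point is only where the comparison happens: in pixel space, in feature space after decoding an image, or in feature space without decoding an image at all.

```python
import numpy as np

rng = np.random.default_rng(0)
D, F, Z = 64, 32, 8  # flattened image size, feature size, embedding size (arbitrary)

# Frozen stand-in for a pretrained loss network: one linear layer followed by ReLU.
W_loss = rng.normal(size=(F, D)) / np.sqrt(D)

def loss_net(x):
    return np.maximum(W_loss @ x, 0.0)

# Untrained stand-ins for the encoder, the image decoder, and the feature decoder.
W_enc = rng.normal(size=(Z, D)) / np.sqrt(D)
W_img_dec = rng.normal(size=(D, Z)) / np.sqrt(Z)
W_feat_dec = rng.normal(size=(F, Z)) / np.sqrt(Z)

def mse(a, b):
    return float(np.mean((a - b) ** 2))

def pixel_loss(x):
    """Reconstruction loss: compare the image with its reconstruction."""
    return mse(x, W_img_dec @ (W_enc @ x))

def perceptual_loss(x):
    """Perceptual loss: compare loss-network features of image and reconstruction."""
    return mse(loss_net(x), loss_net(W_img_dec @ (W_enc @ x)))

def feature_prediction_loss(x):
    """Feature prediction loss: decode the loss-network features directly."""
    return mse(loss_net(x), W_feat_dec @ (W_enc @ x))

x = rng.normal(size=D)
losses = {
    "pixel": pixel_loss(x),
    "perceptual": perceptual_loss(x),
    "feature_prediction": feature_prediction_loss(x),
}
```

Note that `feature_prediction_loss` never produces a full-size image and never runs the loss network on a reconstruction, which is consistent with the abstract's claim that it avoids the time-consuming decoding step.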
CV · Jan 10, 2020 · Code
Improving Image Autoencoder Embeddings with Perceptual Loss
Gustav Grund Pihlgren, Fredrik Sandin, Marcus Liwicki
Autoencoders are commonly trained using element-wise loss. However, element-wise loss disregards high-level structures in the image, which can lead to embeddings that disregard them as well. A recent improvement to autoencoders that helps alleviate this problem is the use of perceptual loss. This work investigates perceptual loss from the perspective of encoder embeddings themselves. Autoencoders are trained to embed images from three different computer vision datasets using perceptual loss based on a pretrained model as well as pixel-wise loss. A host of different predictors are trained to perform object positioning and classification on the datasets given the embedded images as input. The two kinds of losses are evaluated by comparing how the predictors performed with embeddings from the differently trained autoencoders. The results show that, in the image domain, the embeddings generated by autoencoders trained with perceptual loss enable more accurate predictions than those trained with element-wise loss. Furthermore, the results show that, on the task of object positioning of a small-scale feature, perceptual loss can improve the results by a factor of 10. The experimental setup is available online: https://github.com/guspih/Perceptual-Autoencoders
LG · Jun 10, 2025
Agent-based Condition Monitoring Assistance with Multimodal Industrial Database Retrieval Augmented Generation
Karl Löwenmark, Daniel Strömbergsson, Chang Liu et al.
Condition monitoring (CM) plays a crucial role in ensuring reliability and efficiency in the process industry. Although computerised maintenance systems effectively detect and classify faults, tasks like fault severity estimation and maintenance decisions still largely depend on human expert analysis. The analysis and decision making automatically performed by current systems typically exhibit considerable uncertainty and high false alarm rates, leading to increased workload and reduced efficiency. This work integrates large language model (LLM)-based reasoning agents with CM workflows to address analyst and industry needs, namely reducing false alarms, enhancing fault severity estimation, improving decision support, and offering explainable interfaces. We propose MindRAG, a modular framework combining multimodal retrieval-augmented generation (RAG) with novel vector store structures designed specifically for CM data. The framework leverages existing annotations and maintenance work orders as surrogates for labels in a supervised learning protocol, addressing the common challenge of training predictive models on unlabelled and noisy real-world datasets. The primary contributions include: (1) an approach for structuring industry CM data into a semi-structured multimodal vector store compatible with LLM-driven workflows; (2) developing multimodal RAG techniques tailored for CM data; (3) developing practical reasoning agents capable of addressing real-world CM queries; and (4) presenting an experimental framework for integrating and evaluating such agents in realistic industrial scenarios. Preliminary results, evaluated with the help of an experienced analyst, indicate that MindRAG provides meaningful decision support for more efficient management of alarms, thereby improving the interpretability of CM systems.
HEP-EX · Feb 18, 2025
Neuromorphic Readout for Hadron Calorimeters
Enrico Lupi, Abhishek, Max Aehle et al.
We simulate hadrons impinging on a homogeneous lead-tungstate (PbWO4) calorimeter to investigate how the resulting light yield and its temporal structure, as detected by an array of light-sensitive sensors, can be processed by a neuromorphic computing system. Our model encodes temporal photon distributions as spike trains and employs a fully connected spiking neural network to estimate the total deposited energy, as well as the position and spatial distribution of the light emissions within the sensitive material. The extracted primitives offer valuable topological information about the shower development in the material, achieved without requiring a segmentation of the active medium. A potential nanophotonic implementation using III-V semiconductor nanowires is discussed. It can be both fast and energy efficient.
HEP-EX · Feb 10, 2025
Unsupervised Particle Tracking with Neuromorphic Computing
Emanuele Coradin, Fabio Cufino, Muhammad Awais et al.
We study the application of a neural network architecture for identifying charged particle trajectories via unsupervised learning of delays and synaptic weights using a spike-time-dependent plasticity rule. In the considered model, the neurons receive time-encoded information on the position of particle hits in a tracking detector for a particle collider, modeled according to the geometry of the Compact Muon Solenoid Phase II detector. We show how a spiking neural network is capable of successfully identifying in a completely unsupervised way the signal left by charged particles in the presence of conspicuous noise from accidental or combinatorial hits. These results open the way to applications of neuromorphic computing to particle tracking, motivating further studies into its potential for real-time, low-power particle tracking in future high-energy physics experiments.
AI · Dec 11, 2021
Technical Language Supervision for Intelligent Fault Diagnosis in Process Industry
Karl Löwenmark, Cees Taal, Stephan Schnabel et al.
In the process industry, condition monitoring systems with automated fault diagnosis methods assist human experts and thereby improve maintenance efficiency, process sustainability, and workplace safety. Improving the automated fault diagnosis methods using data and machine learning-based models is a central aspect of intelligent fault diagnosis (IFD). A major challenge in IFD is to develop realistic datasets with accurate labels needed to train and validate models, and to transfer models trained with labeled lab data to heterogeneous process industry environments. However, fault descriptions and work-orders written by domain experts are increasingly digitised in modern condition monitoring systems, for example in the context of rotating equipment monitoring. Thus, domain-specific knowledge about fault characteristics and severities exists as technical language annotations in industrial datasets. Furthermore, recent advances in natural language processing enable weakly supervised model optimisation using natural language annotations, most notably in the form of natural language supervision (NLS). This creates a timely opportunity to develop technical language supervision (TLS) solutions for IFD systems grounded in industrial data, for example as a complement to pre-training with lab data to address problems like overfitting and inaccurate out-of-sample generalisation. We survey the literature and identify a considerable improvement in the maturity of NLS over the last two years, facilitating applications beyond natural language; a rapid development of weak supervision methods; and transfer learning as a current trend in IFD which can benefit from these developments. Finally, we describe a general framework for TLS and implement a TLS case study based on SentenceBERT and contrastive-learning-based zero-shot inference on annotated industry data.
NE · Jun 10, 2021
Spatiotemporal Pattern Recognition in Single Mixed-Signal VLSI Neurons with Heterogeneous Dynamic Synapses
Mattias Nilsson, Foteini Liwicki, Fredrik Sandin
Mixed-signal neuromorphic processors with brain-like organization and device physics offer an ultra-low-power alternative to the unsustainable developments of conventional deep learning and computing. However, realizing the potential of such neuromorphic hardware requires efficient use of its heterogeneous, analog neurosynaptic circuitry with neurocomputational methods for sparse, spike-timing-based encoding and processing. Here, we investigate the use of balanced excitatory-inhibitory disynaptic lateral connections as a resource-efficient mechanism for implementing a thalamocortically inspired Spatiotemporal Correlator (STC) neural network without using dedicated delay mechanisms. We present hardware-in-the-loop experiments with a DYNAP-SE neuromorphic processor, in which receptive fields of heterogeneous coincidence-detection neurons in an STC network with four lateral afferent connections per column were mapped by random input-sampling. Furthermore, we demonstrate how such a neuron was tuned to detect a particular spatiotemporal feature by discrete address-reprogramming of the analog synaptic circuits. The energy dissipation of the disynaptic connections is one order of magnitude lower per lateral connection (0.65 nJ vs 9.6 nJ per spike) than in the former delay-based hardware implementation of the STC.
NE · Feb 12, 2020
Synaptic Integration of Spatiotemporal Features with a Dynamic Neuromorphic Processor
Mattias Nilsson, Foteini Liwicki, Fredrik Sandin
Spiking neurons can perform spatiotemporal feature detection by nonlinear synaptic and dendritic integration of presynaptic spike patterns. Multicompartment models of nonlinear dendrites and related neuromorphic circuit designs enable faithful imitation of such dynamic integration processes, but these approaches are also associated with a relatively high computing cost or circuit size. Here, we investigate synaptic integration of spatiotemporal spike patterns with multiple dynamic synapses on point-neurons in the DYNAP-SE neuromorphic processor, which offers a complementary resource-efficient, albeit less flexible, approach to feature detection. We investigate how previously proposed excitatory-inhibitory pairs of dynamic synapses can be combined to integrate multiple inputs, and we generalize that concept to a case in which one inhibitory synapse is combined with multiple excitatory synapses. We characterize the resulting delayed excitatory postsynaptic potentials (EPSPs) by measuring and analyzing the membrane potentials of the neuromorphic neuronal circuits. We find that biologically relevant EPSP delays, with variability of order 10 milliseconds per neuron, can be realized in the proposed manner by selecting different synapse combinations, thanks to device mismatch. Based on these results, we demonstrate that a single point-neuron with dynamic synapses in the DYNAP-SE can respond selectively to presynaptic spikes with a particular spatiotemporal structure, which enables, for instance, visual feature tuning of single neurons.
NE · Jun 28, 2019
Synaptic Delays for Temporal Feature Detection in Dynamic Neuromorphic Processors
Fredrik Sandin, Mattias Nilsson
Spiking neural networks implemented in dynamic neuromorphic processors are well suited for spatiotemporal feature detection and learning, for example in ultra-low-power embedded intelligence and deep edge applications. Such pattern recognition networks naturally involve a combination of dynamic delay mechanisms and coincidence detection. Inspired by an auditory feature detection circuit in crickets, featuring a delayed excitation by postinhibitory rebound, we investigate disynaptic delay elements formed by inhibitory-excitatory pairs of dynamic synapses. We configure such disynaptic delay elements in the DYNAP-SE neuromorphic processor and characterize the distribution of delayed excitations resulting from device mismatch. Furthermore, we present a network that mimics the auditory feature detection circuit of crickets and demonstrate how varying synapse weights, input noise and processor temperature affect the circuit. Interestingly, we find that the disynaptic delay elements can be configured such that the timing and magnitude of the delayed postsynaptic excitation depend mainly on the efficacy of the inhibitory and excitatory synapses, respectively. Delay elements of this kind can be implemented in other reconfigurable dynamic neuromorphic processors and enable synapse-level temporal feature tuning with large fan-in and flexible delays of order 10-100 ms.
LG · Mar 26, 2019
Interoperability and machine-to-machine translation model with mappings to machine learning tasks
Jacob Nilsson, Fredrik Sandin, Jerker Delsing
Modern large-scale automation systems integrate thousands to hundreds of thousands of physical sensors and actuators. Demands for more flexible reconfiguration of production systems and optimization across different information models, standards and legacy systems challenge current system interoperability concepts. Automatic semantic translation across information models and standards is an increasingly important problem that needs to be addressed to fulfill these demands in a cost-efficient manner under constraints of human capacity and resources in relation to timing requirements and system complexity. Here we define a translator-based operational interoperability model for interacting cyber-physical systems in mathematical terms, which includes system identification and ontology-based translation as special cases. We present alternative mathematical definitions of the translator learning task and mappings to similar machine learning tasks and solutions based on recent developments in machine learning. Possibilities to learn translators between artefacts without a common physical context, for example in simulations of digital twins and across layers of the automation pyramid are briefly discussed.
SP · Feb 4, 2019
Dictionary learning approach to monitoring of wind turbine drivetrain bearings
Sergio Martin-del-Campo, Fredrik Sandin, Daniel Strömbergsson
Condition monitoring is central to the efficient operation of wind farms due to the challenging operating conditions, rapid technology development and large number of aging wind turbines. In particular, predictive maintenance planning requires the early detection of faults with few false positives. Achieving this type of detection is a challenging problem due to the complex and weak signatures of some faults, particularly the faults that occur in some of the drivetrain bearings. Here, we investigate recently proposed condition monitoring methods based on unsupervised dictionary learning using vibration data recorded over 46 months under typical industrial operations. Thus, we contribute novel test results and real-world data that are made publicly available. The results of former studies addressing condition monitoring tasks using dictionary learning indicate that unsupervised feature learning is useful for diagnosis and anomaly detection purposes. However, these studies are based on small sets of labeled data from test rigs operating under controlled conditions that focus on classification tasks, which are useful for quantitative method comparisons but give little insight into how useful these approaches are in practice. In this study, dictionaries are learned from gearbox vibrations in six different turbines, and the dictionaries are subsequently propagated over a few years of monitoring data when faults are known to occur. We perform the experiment using two different sparse coding algorithms to investigate if the algorithm selected affects the features of abnormal conditions. We calculate the dictionary distance between the initial and propagated dictionaries and find the time periods of abnormal dictionary adaptation starting six months before a drivetrain bearing replacement and one year before the resulting gearbox replacement.
LG · Nov 28, 2016
Dictionary Learning with Equiprobable Matching Pursuit
Fredrik Sandin, Sergio Martin-del-Campo
Sparse signal representations based on linear combinations of learned atoms have been used to obtain state-of-the-art results in several practical signal processing applications. Approximation methods are needed to process high-dimensional signals in this way because the problem to calculate optimal atoms for sparse coding is NP-hard. Here we study greedy algorithms for unsupervised learning of dictionaries of shift-invariant atoms and propose a new method where each atom is selected with the same probability on average, which corresponds to the homeostatic regulation of a recurrent convolutional neural network. Equiprobable selection can be used with several greedy algorithms for dictionary learning to ensure that all atoms adapt during training and that no particular atom is more likely to take part in the linear combination on average. We demonstrate via simulation experiments that dictionary learning with equiprobable selection results in higher entropy of the sparse representation and lower reconstruction and denoising errors, both in the case of ordinary matching pursuit and orthogonal matching pursuit with shift-invariant dictionaries. Furthermore, we show that the computational costs of the matching pursuits are lower with equiprobable selection, leading to faster and more accurate dictionary learning algorithms.
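For readers unfamiliar with the greedy algorithms mentioned here, the following is a minimal sketch of ordinary matching pursuit with a fixed dictionary. It deliberately omits what the paper contributes: there is no dictionary learning, no shift invariance, and no equiprobable atom selection. The function name, the random dictionary, and the test signal are assumptions made for illustration.

```python
import numpy as np

def matching_pursuit(signal, dictionary, n_iter):
    """Greedy sparse approximation of `signal`.

    dictionary: array of shape (n_atoms, n_samples) with unit-norm rows.
    Returns the coefficient vector and the final residual.
    """
    residual = signal.astype(float).copy()
    coeffs = np.zeros(dictionary.shape[0])
    for _ in range(n_iter):
        correlations = dictionary @ residual
        k = int(np.argmax(np.abs(correlations)))  # atom that best matches the residual
        coeffs[k] += correlations[k]
        residual -= correlations[k] * dictionary[k]
    return coeffs, residual

# Toy data: a random unit-norm dictionary and a signal built from two of its atoms.
rng = np.random.default_rng(0)
atoms = rng.normal(size=(20, 50))
atoms /= np.linalg.norm(atoms, axis=1, keepdims=True)
signal = 2.0 * atoms[3] - 1.5 * atoms[11]

coeffs, residual = matching_pursuit(signal, atoms, n_iter=10)
```

Each iteration picks the atom most correlated with the current residual, so some atoms may be selected far more often than others; the equiprobable selection proposed in the paper is aimed at evening out that usage during dictionary learning.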
CV · Feb 12, 2015
Towards zero-configuration condition monitoring based on dictionary learning
Sergio Martin-del-Campo, Fredrik Sandin
Condition-based predictive maintenance can significantly improve overall equipment effectiveness provided that appropriate monitoring methods are used. Online condition monitoring systems are customized to each type of machine and need to be reconfigured when conditions change, which is costly and requires expert knowledge. Basic feature extraction methods limited to signal distribution functions and spectra are commonly used, making it difficult to automatically analyze and compare machine conditions. In this paper, we investigate the possibility to automate the condition monitoring process by continuously learning a dictionary of optimized shift-invariant feature vectors using a well-known sparse approximation method. We study how the feature vectors learned from a vibration signal evolve over time when a fault develops within a ball bearing of a rotating machine. We quantify the adaptation rate of learned features and find that this quantity changes significantly in the transitions between normal and faulty states of operation of the ball bearing.