IVMar 15, 2022Code
Magnification Prior: A Self-Supervised Method for Learning Representations on Breast Cancer Histopathological ImagesPrakash Chandra Chhipa, Richa Upadhyay, Gustav Grund Pihlgren et al.
This work presents a novel self-supervised pre-training method to learn efficient representations without labels on histopathology medical images utilizing magnification factors. Other state-of-theart works mainly focus on fully supervised learning approaches that rely heavily on human annotations. However, the scarcity of labeled and unlabeled data is a long-standing challenge in histopathology. Currently, representation learning without labels remains unexplored for the histopathology domain. The proposed method, Magnification Prior Contrastive Similarity (MPCS), enables self-supervised learning of representations without labels on small-scale breast cancer dataset BreakHis by exploiting magnification factor, inductive transfer, and reducing human prior. The proposed method matches fully supervised learning state-of-the-art performance in malignancy classification when only 20% of labels are used in fine-tuning and outperform previous works in fully supervised learning settings. It formulates a hypothesis and provides empirical evidence to support that reducing human-prior leads to efficient representation learning in self-supervision. The implementation of this work is available online on GitHub - https://github.com/prakashchhipa/Magnification-Prior-Self-Supervised-Method
CVJul 6, 2022Code
Identifying and Mitigating Flaws of Deep Perceptual Similarity MetricsOskar Sjögren, Gustav Grund Pihlgren, Fredrik Sandin et al.
Measuring the similarity of images is a fundamental problem to computer vision for which no universal solution exists. While simple metrics such as the pixel-wise L2-norm have been shown to have significant flaws, they remain popular. One group of recent state-of-the-art metrics that mitigates some of those flaws are Deep Perceptual Similarity (DPS) metrics, where the similarity is evaluated as the distance in the deep features of neural networks. However, DPS metrics themselves have been less thoroughly examined for their benefits and, especially, their flaws. This work investigates the most common DPS metric, where deep features are compared by spatial position, along with metrics comparing the averaged and sorted deep features. The metrics are analyzed in-depth to understand the strengths and weaknesses of the metrics by using images designed specifically to challenge them. This work contributes with new insights into the flaws of DPS, and further suggests improvements to the metrics. An implementation of this work is available online: https://github.com/guspih/deep_perceptual_similarity_analysis/
CVApr 5, 2023Code
Deep Perceptual Similarity is Adaptable to Ambiguous ContextsGustav Grund Pihlgren, Fredrik Sandin, Marcus Liwicki
The concept of image similarity is ambiguous, and images can be similar in one context and not in another. This ambiguity motivates the creation of metrics for specific contexts. This work explores the ability of deep perceptual similarity (DPS) metrics to adapt to a given context. DPS metrics use the deep features of neural networks for comparing images. These metrics have been successful on datasets that leverage the average human perception in limited settings. But the question remains if they could be adapted to specific similarity contexts. No single metric can suit all similarity contexts, and previous rule-based metrics are labor-intensive to rewrite for new contexts. On the other hand, DPS metrics use neural networks that might be retrained for each context. However, retraining networks takes resources and might ruin performance on previous tasks. This work examines the adaptability of DPS metrics by training ImageNet pretrained CNNs to measure similarity according to given contexts. Contexts are created by randomly ranking six image distortions. Distortions later in the ranking are considered more disruptive to similarity when applied to an image for that context. This also gives insight into whether the pretrained features capture different similarity contexts. The adapted metrics are evaluated on a perceptual similarity dataset to evaluate if adapting to a ranking affects their prior performance. The findings show that DPS metrics can be adapted with high performance. While the adapted metrics have difficulties with the same contexts as baselines, performance is improved in 99% of cases. Finally, it is shown that the adaption is not significantly detrimental to prior performance on perceptual similarity. The implementation of this work is available online: https://github.com/LTU-Machine-Learning/Analysis-of-Deep-Perceptual-Loss-Networks
CVFeb 8, 2023
A Systematic Performance Analysis of Deep Perceptual Loss Networks: Breaking Transfer Learning ConventionsGustav Grund Pihlgren, Konstantina Nikolaidou, Prakash Chandra Chhipa et al.
In recent years, deep perceptual loss has been widely and successfully used to train machine learning models for many computer vision tasks, including image synthesis, segmentation, and autoencoding. Deep perceptual loss is a type of loss function for images that computes the error between two images as the distance between deep features extracted from a neural network. Most applications of the loss use pretrained networks called loss networks for deep feature extraction. However, despite increasingly widespread use, the effects of loss network implementation on the trained models have not been studied. This work rectifies this through a systematic evaluation of the effect of different pretrained loss networks on four different application areas. Specifically, the work evaluates 14 different pretrained architectures with four different feature extraction layers. The evaluation reveals that VGG networks without batch normalization have the best performance and that the choice of feature extraction layer is at least as important as the choice of architecture. The analysis also reveals that deep perceptual loss does not adhere to the transfer learning conventions that better ImageNet accuracy implies better downstream performance and that feature extraction from the later layers provides better performance.
CVSep 6, 2024Code
Segmentation and Smoothing Affect Explanation Quality More Than the Choice of Perturbation-based XAI Method for Image ExplanationsGustav Grund Pihlgren, Kary Främling
Perturbation-based post-hoc image explanation methods are commonly used to explain image prediction models. These methods perturb parts of the input to measure how those parts affect the output. Since the methods only require the input and output, they can be applied to any model, making them a popular choice to explain black-box models. While many different methods exist and have been compared with one another, it remains poorly understood which parameters of the different methods are responsible for their varying performance. This work uses the Randomized Input Sampling for Explanations (RISE) method as a baseline to evaluate many combinations of mask sampling, segmentation techniques, smoothing, attribution calculation, and per-segment or per-pixel attribution, using a proxy metric. The results show that attribution calculation, which is frequently the focus of other works, has little impact on the results. Conversely, segmentation and per-pixel attribution, rarely examined parameters, have a significant impact. The implementation of and data gathered in this work are available online: https://github.com/guspih/post-hoc-image-perturbation and https://bit.ly/smooth-mask-perturbation.
CVMar 16, 2020Code
Pretraining Image Encoders without Reconstruction via Feature Prediction LossGustav Grund Pihlgren, Fredrik Sandin, Marcus Liwicki
This work investigates three methods for calculating loss for autoencoder-based pretraining of image encoders: The commonly used reconstruction loss, the more recently introduced deep perceptual similarity loss, and a feature prediction loss proposed here; the latter turning out to be the most efficient choice. Standard auto-encoder pretraining for deep learning tasks is done by comparing the input image and the reconstructed image. Recent work shows that predictions based on embeddings generated by image autoencoders can be improved by training with perceptual loss, i.e., by adding a loss network after the decoding step. So far the autoencoders trained with loss networks implemented an explicit comparison of the original and reconstructed images using the loss network. However, given such a loss network we show that there is no need for the time-consuming task of decoding the entire image. Instead, we propose to decode the features of the loss network, hence the name "feature prediction loss". To evaluate this method we perform experiments on three standard publicly available datasets (LunarLander-v2, STL-10, and SVHN) and compare six different procedures for training image encoders (pixel-wise, perceptual similarity, and feature prediction losses; combined with two variations of image and feature encoding/decoding). The embedding-based prediction results show that encoders trained with feature prediction loss is as good or better than those trained with the other two losses. Additionally, the encoder is significantly faster to train using feature prediction loss in comparison to the other losses. The method implementation used in this work is available online: https://github.com/guspih/Perceptual-Autoencoders
CVJan 10, 2020Code
Improving Image Autoencoder Embeddings with Perceptual LossGustav Grund Pihlgren, Fredrik Sandin, Marcus Liwicki
Autoencoders are commonly trained using element-wise loss. However, element-wise loss disregards high-level structures in the image which can lead to embeddings that disregard them as well. A recent improvement to autoencoders that helps alleviate this problem is the use of perceptual loss. This work investigates perceptual loss from the perspective of encoder embeddings themselves. Autoencoders are trained to embed images from three different computer vision datasets using perceptual loss based on a pretrained model as well as pixel-wise loss. A host of different predictors are trained to perform object positioning and classification on the datasets given the embedded images as input. The two kinds of losses are evaluated by comparing how the predictors performed with embeddings from the differently trained autoencoders. The results show that, in the image domain, the embeddings generated by autoencoders trained with perceptual loss enable more accurate predictions than those trained with element-wise loss. Furthermore, the results show that, on the task of object positioning of a small-scale feature, perceptual loss can improve the results by a factor 10. The experimental setup is available online: https://github.com/guspih/Perceptual-Autoencoders
CLOct 16, 2018Code
Subword Semantic Hashing for Intent Classification on Small DatasetsKumar Shridhar, Ayushman Dash, Amit Sahu et al.
In this paper, we introduce the use of Semantic Hashing as embedding for the task of Intent Classification and achieve state-of-the-art performance on three frequently used benchmarks. Intent Classification on a small dataset is a challenging task for data-hungry state-of-the-art Deep Learning based systems. Semantic Hashing is an attempt to overcome such a challenge and learn robust text classification. Current word embedding based are dependent on vocabularies. One of the major drawbacks of such methods is out-of-vocabulary terms, especially when having small training datasets and using a wider vocabulary. This is the case in Intent Classification for chatbots, where typically small datasets are extracted from internet communication. Two problems arise by the use of internet communication. First, such datasets miss a lot of terms in the vocabulary to use word embeddings efficiently. Second, users frequently make spelling errors. Typically, the models for intent classification are not trained with spelling errors and it is difficult to think about ways in which users will make mistakes. Models depending on a word vocabulary will always face such issues. An ideal classifier should handle spelling errors inherently. With Semantic Hashing, we overcome these challenges and achieve state-of-the-art results on three datasets: AskUbuntu, Chatbot, and Web Application. Our benchmarks are available online: https://github.com/kumar-shridhar/Know-Your-Intent
CVJun 15, 2021
VidHarm: A Clip Based Dataset for Harmful Content DetectionJohan Edstedt, Amanda Berg, Michael Felsberg et al.
Automatically identifying harmful content in video is an important task with a wide range of applications. However, there is a lack of professionally labeled open datasets available. In this work VidHarm, an open dataset of 3589 video clips from film trailers annotated by professionals, is presented. An analysis of the dataset is performed, revealing among other things the relation between clip and trailer level annotations. Audiovisual models are trained on the dataset and an in-depth study of modeling choices conducted. The results show that performance is greatly improved by combining the visual and audio modality, pre-training on large-scale video recognition datasets, and class balanced sampling. Lastly, biases of the trained models are investigated using discrimination probing. VidHarm is openly available, and further details are available at: https://vidharm.github.io