CVMar 7, 2022Code
Explaining Classifiers by Constructing Familiar ConceptsJohannes Schneider, Michail Vlachos
Interpreting a large number of neurons in deep learning is difficult. Our proposed `CLAssifier-DECoder' architecture (ClaDec) facilitates the understanding of the output of an arbitrary layer of neurons or subsets thereof. It uses a decoder that transforms the incomprehensible representation of the given neurons to a representation that is more similar to the domain a human is familiar with. In an image recognition problem, one can recognize what information (or concepts) a layer maintains by contrasting reconstructed images of ClaDec with those of a conventional auto-encoder(AE) serving as reference. An extension of ClaDec allows trading comprehensibility and fidelity. We evaluate our approach for image classification using convolutional neural networks. We show that reconstructed visualizations using encodings from a classifier capture more relevant classification information than conventional AEs. This holds although AEs contain more information on the original input. Our user study highlights that even non-experts can identify a diverse set of concepts contained in images that are relevant (or irrelevant) for the classifier. We also compare against saliency based methods that focus on pixel relevance rather than concepts. We show that ClaDec tends to highlight more relevant input areas to classification though outcomes depend on classifier architecture. Code is at \url{https://github.com/JohnTailor/ClaDec}
LGSep 6, 2019
Personalization of Deep LearningJohannes Schneider, Michail Vlachos
We discuss training techniques, objectives and metrics toward personalization of deep learning models. In machine learning, personalization addresses the goal of a trained model to target a particular individual by optimizing one or more performance metrics, while conforming to certain constraints. To personalize, we investigate three methods of ``curriculum learning`` and two approaches for data grouping, i.e., augmenting the data of an individual by adding similar data identified with an auto-encoder. We show that both ``curriculuum learning'' and ``personalized'' data augmentation lead to improved performance on data of an individual. Mostly, this comes at the cost of reduced performance on a more general, broader dataset.
IRNov 20, 2017
Linear-Complexity Relaxed Word Mover's Distance with GPU AccelerationKubilay Atasu, Thomas Parnell, Celestine Dünner et al.
The amount of unstructured text-based data is growing every day. Querying, clustering, and classifying this big data requires similarity computations across large sets of documents. Whereas low-complexity similarity metrics are available, attention has been shifting towards more complex methods that achieve a higher accuracy. In particular, the Word Mover's Distance (WMD) method proposed by Kusner et al. is a promising new approach, but its time complexity grows cubically with the number of unique words in the documents. The Relaxed Word Mover's Distance (RWMD) method, again proposed by Kusner et al., reduces the time complexity from qubic to quadratic and results in a limited loss in accuracy compared with WMD. Our work contributes a low-complexity implementation of the RWMD that reduces the average time complexity to linear when operating on large sets of documents. Our linear-complexity RWMD implementation, henceforth referred to as LC-RWMD, maps well onto GPUs and can be efficiently distributed across a cluster of GPUs. Our experiments on real-life datasets demonstrate 1) a performance improvement of two orders of magnitude with respect to our GPU-based distributed implementation of the quadratic RWMD, and 2) a performance improvement of three to four orders of magnitude with respect to our distributed WMD implementation that uses GPU-based RWMD for pruning.
IRApr 7, 2016
Scalable and interpretable product recommendations via overlapping co-clusteringReinhard Heckel, Michail Vlachos, Thomas Parnell et al.
We consider the problem of generating interpretable recommendations by identifying overlapping co-clusters of clients and products, based only on positive or implicit feedback. Our approach is applicable on very large datasets because it exhibits almost linear complexity in the input examples and the number of co-clusters. We show, both on real industrial data and on publicly available datasets, that the recommendation accuracy of our algorithm is competitive to that of state-of-art matrix factorization techniques. In addition, our technique has the advantage of offering recommendations that are textually and visually interpretable. Finally, we examine how to implement our technique efficiently on Graphical Processing Units (GPUs).
MLMay 22, 2014
Compressive Mining: Fast and Optimal Data Mining in the Compressed DomainMichail Vlachos, Nikolaos Freris, Anastasios Kyrillidis
Real-world data typically contain repeated and periodic patterns. This suggests that they can be effectively represented and compressed using only a few coefficients of an appropriate basis (e.g., Fourier, Wavelets, etc.). However, distance estimation when the data are represented using different sets of coefficients is still a largely unexplored area. This work studies the optimization problems related to obtaining the \emph{tightest} lower/upper bound on Euclidean distances when each data object is potentially compressed using a different set of orthonormal coefficients. Our technique leads to tighter distance estimates, which translates into more accurate search, learning and mining operations \textit{directly} in the compressed domain. We formulate the problem of estimating lower/upper distance bounds as an optimization problem. We establish the properties of optimal solutions, and leverage the theoretical analysis to develop a fast algorithm to obtain an \emph{exact} solution to the problem. The suggested solution provides the tightest estimation of the $L_2$-norm or the correlation. We show that typical data-analysis operations, such as k-NN search or k-Means clustering, can operate more accurately using the proposed compression and distance reconstruction technique. We compare it with many other prevalent compression and reconstruction techniques, including random projections and PCA-based techniques. We highlight a surprising result, namely that when the data are highly sparse in some basis, our technique may even outperform PCA-based compression. The contributions of this work are generic as our methodology is applicable to any sequential or high-dimensional data as well as to any orthogonal data transformation used for the underlying data compression scheme.
STMar 30, 2014
Approximate Matrix Multiplication with Application to Linear EmbeddingsAnastasios Kyrillidis, Michail Vlachos, Anastasios Zouzias
In this paper, we study the problem of approximately computing the product of two real matrices. In particular, we analyze a dimensionality-reduction-based approximation algorithm due to Sarlos [1], introducing the notion of nuclear rank as the ratio of the nuclear norm over the spectral norm. The presented bound has improved dependence with respect to the approximation error (as compared to previous approaches), whereas the subspace -- on which we project the input matrices -- has dimensions proportional to the maximum of their nuclear rank and it is independent of the input dimensions. In addition, we provide an application of this result to linear low-dimensional embeddings. Namely, we show that any Euclidean point-set with bounded nuclear rank is amenable to projection onto number of dimensions that is independent of the input dimensionality, while achieving additive error guarantees.
IRJan 22, 2014
On Randomly Projected Hierarchical Clustering with GuaranteesJohannes Schneider, Michail Vlachos
Hierarchical clustering (HC) algorithms are generally limited to small data instances due to their runtime costs. Here we mitigate this shortcoming and explore fast HC algorithms based on random projections for single (SLC) and average (ALC) linkage clustering as well as for the minimum spanning tree problem (MST). We present a thorough adaptive analysis of our algorithms that improve prior work from $O(N^2)$ by up to a factor of $N/(\log N)^2$ for a dataset of $N$ points in Euclidean space. The algorithms maintain, with arbitrary high probability, the outcome of hierarchical clustering as well as the worst-case running-time guarantees. We also present parameter-free instances of our algorithms.