CVApr 27, 2024Code
A Method of Moments Embedding Constraint and its Application to Semi-Supervised LearningMichael Majurski, Sumeet Menon, Parniyan Farvardin et al.
Discriminative deep learning models with a linear+softmax final layer have a problem: the latent space only predicts the conditional probabilities $p(Y|X)$ but not the full joint distribution $p(Y,X)$, which necessitates a generative approach. The conditional probability cannot detect outliers, causing outlier sensitivity in softmax networks. This exacerbates model over-confidence impacting many problems, such as hallucinations, confounding biases, and dependence on large datasets. To address this we introduce a novel embedding constraint based on the Method of Moments (MoM). We investigate the use of polynomial moments ranging from 1st through 4th order hyper-covariance matrices. Furthermore, we use this embedding constraint to train an Axis-Aligned Gaussian Mixture Model (AAGMM) final layer, which learns not only the conditional, but also the joint distribution of the latent space. We apply this method to the domain of semi-supervised image classification by extending FlexMatch with our technique. We find our MoM constraint with the AAGMM layer is able to match the reported FlexMatch accuracy, while also modeling the joint distribution, thereby reducing outlier sensitivity. We also present a preliminary outlier detection strategy based on Mahalanobis distance and discuss future improvements to this strategy. Code is available at: \url{https://github.com/mmajurski/ssl-gmm}
CVMar 26
End-to-end Feature Alignment: A Simple CNN with Intrinsic Class AttributionParniyan Farvardin, David Chapman
We present Feature-Align CNN (FA-CNN), a prototype CNN architecture with intrinsic class attribution through end-to-end feature alignment. Our intuition is that the use of unordered operations such as Linear and Conv2D layers cause unnecessary shuffling and mixing of semantic concepts, thereby making raw feature maps difficult to understand. We introduce two new order preserving layers, the dampened skip connection, and the global average pooling classifier head. These layers force the model to maintain an end-to-end feature alignment from the raw input pixels all the way to final class logits. This end-to-end alignment enhances the interpretability of the model by allowing the raw feature maps to intrinsically exhibit class attribution. We prove theoretically that FA-CNN penultimate feature maps are identical to Grad-CAM saliency maps. Moreover, we prove that these feature maps slowly morph layer-by-layer over network depth, showing the evolution of features through network depth toward penultimate class activations. FA-CNN performs well on benchmark image classification datasets. Moreover, we compare the averaged FA-CNN raw feature maps against Grad-CAM and permutation methods in a percent pixels removed interpretability task. We conclude this work with a discussion and future, including limitations and extensions toward hybrid models.
CVNov 7, 2024
Interpretable Estimation of CNN Deep Feature Density using Copula and the Generalized Characteristic FunctionDavid Chapman, Parniyan Farvardin
We present a novel empirical approach toward estimating the Probability Density Function (PDF) of the deep features of Convolutional Neural Networks (CNNs). Estimating the PDF of deep CNN features is an important task, because it will yield new insight into deep representations. Moreover, characterizing the statistical behavior has implications for the feasibility of promising downstream tasks such as density based anomaly detection. Expressive, yet interpretable estimation of the deep feature PDF is challenging due to the Curse of Dimensionality (CoD) as well as our limited ability to comprehend high-dimensional inter-dependencies. Our novel estimation technique combines copula analysis with the Method of Orthogonal Moments (MOM), in order to directly estimate the Generalized Characteristic Function (GCF) of the multivariate deep feature PDF. We find that the one-dimensional marginals of non-negative deep CNN features after major blocks are not well approximated by a Gaussian distribution, and that the features of deep layers are much better approximated by the Exponential, Gamma, and/or Weibull distributions. Furthermore, we observe that deep features become increasingly long-tailed with network depth, although surprisingly the rate of this increase is much slower than theoretical estimates. Finally, we observe that many deep features exhibit strong dependence (either correlation or anti-correlation) with other extremely strong detections, even if these features are independent within typical ranges. We elaborate on these findings in our discussion, where we hypothesize that the long-tail of large valued features corresponds to the strongest computer vision detections of semantic targets, which would imply that these large-valued features are not outliers but rather an important detection signal.