Stella Yu

10 papers

4,152 citations

Novelty: 59%

AI Score: 31

Ranked #142,956 of 201,326 authors (top 71%); #44,669 in CV (top 76%)

10 Papers

LG · Nov 30, 2021

CO-SNE: Dimensionality Reduction and Visualization for Hyperbolic Data

Yunhui Guo, Haoran Guo, Stella Yu

Hyperbolic space can naturally embed hierarchies that often exist in real-world data and semantics. While high-dimensional hyperbolic embeddings lead to better representations, most hyperbolic models utilize low-dimensional embeddings, due to non-trivial optimization and visualization of high-dimensional hyperbolic data. We propose CO-SNE, which extends the Euclidean space visualization tool, t-SNE, to hyperbolic space. Like t-SNE, it converts distances between data points to joint probabilities and tries to minimize the Kullback-Leibler divergence between the joint probabilities of high-dimensional data $X$ and low-dimensional embedding $Y$. However, unlike Euclidean space, hyperbolic space is inhomogeneous: A volume could contain a lot more points at a location far from the origin. CO-SNE thus uses hyperbolic normal distributions for $X$ and hyperbolic Cauchy distributions, instead of t-SNE's Student's t-distribution, for $Y$, and it additionally seeks to preserve $X$'s individual distances to the Origin in $Y$ (hence the C and O in the name). We apply CO-SNE to naturally hyperbolic data and to hyperbolic features learned with supervision. Our results demonstrate that CO-SNE deflates high-dimensional hyperbolic data into a low-dimensional space without losing their hyperbolic characteristics, significantly outperforming popular visualization tools such as PCA, t-SNE, UMAP, and HoroPCA, which is also designed for hyperbolic data.
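
The remark that a volume holds more points far from the origin can be illustrated with a small sketch. It assumes the Poincaré ball model of hyperbolic space, which the abstract does not name, and the function names are hypothetical. It is not the CO-SNE implementation; it only shows that the same Euclidean gap corresponds to a larger hyperbolic distance near the boundary.

```python
import numpy as np

def poincare_distance(x, y):
    """Geodesic distance between two points inside the unit Poincare ball.

    Assumption: the Poincare ball model is used purely for illustration.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    sq_gap = np.sum((x - y) ** 2)
    denom = (1.0 - np.sum(x ** 2)) * (1.0 - np.sum(y ** 2))
    return float(np.arccosh(1.0 + 2.0 * sq_gap / denom))

def distance_to_origin(x):
    """Hyperbolic distance from the origin of the ball to the point x."""
    x = np.asarray(x, dtype=float)
    return poincare_distance(np.zeros_like(x), x)

# Two pairs with the same Euclidean gap (0.1), at different radii.
near_center = poincare_distance([0.0, 0.0], [0.1, 0.0])
near_boundary = poincare_distance([0.8, 0.0], [0.9, 0.0])
print(near_center, near_boundary)
```

The pair near the boundary is farther apart in hyperbolic terms, which is one way to see why a method that ignores distances to the origin can distort hierarchical structure.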

IV · Aug 27, 2021

High Fidelity Deep Learning-based MRI Reconstruction with Instance-wise Discriminative Feature Matching Loss

Ke Wang, Jonathan I Tamir, Alfredo De Goyeneche et al.

Purpose: To improve reconstruction fidelity of fine structures and textures in deep learning (DL) based reconstructions. Methods: A novel patch-based Unsupervised Feature Loss (UFLoss) is proposed and incorporated into the training of DL-based reconstruction frameworks in order to preserve perceptual similarity and high-order statistics. The UFLoss provides instance-level discrimination by mapping similar instances to similar low-dimensional feature vectors and is trained without any human annotation. By adding an additional loss function on the low-dimensional feature space during training, the reconstruction frameworks from under-sampled or corrupted data can reproduce more realistic images that are closer to the original with finer textures, sharper edges, and improved overall image quality. The performance of the proposed UFLoss is demonstrated on unrolled networks for accelerated 2D and 3D knee MRI reconstruction with retrospective under-sampling. Quantitative metrics including NRMSE, SSIM, and our proposed UFLoss were used to evaluate the performance of the proposed method and compare it with others. Results: In-vivo experiments indicate that adding the UFLoss encourages sharper edges and more faithful contrasts compared to traditional and learning-based methods with pure l2 loss. More detailed textures can be seen in both 2D and 3D knee MR images. Quantitative results indicate that reconstruction with UFLoss can provide comparable NRMSE and a higher SSIM while achieving a much lower UFLoss value. Conclusion: We present UFLoss, a patch-based unsupervised learned feature loss, which allows the training of DL-based reconstruction to obtain more detailed texture, finer features, and sharper edges with higher overall image quality under DL-based reconstruction frameworks.

CV · Oct 22, 2020

Unsupervised deep learning for grading of age-related macular degeneration using retinal fundus images

Baladitya Yellapragada, Sascha Hornauer, Kiersten Snyder et al.

Many diseases are classified based on human-defined rubrics that are prone to bias. Supervised neural networks can automate the grading of retinal fundus images, but require labor-intensive annotations and are restricted to the specific trained task. Here, we employed an unsupervised network with Non-Parametric Instance Discrimination (NPID) to grade age-related macular degeneration (AMD) severity using fundus photographs from the Age-Related Eye Disease Study (AREDS). Our unsupervised algorithm demonstrated versatility across different AMD classification schemes without retraining, and achieved unbalanced accuracies comparable to supervised networks and human ophthalmologists in classifying advanced or referable AMD, or on the 4-step AMD severity scale. Exploring the network's behavior revealed disease-related fundus features that drove predictions and unveiled the susceptibility of more granular human-defined AMD severity schemes to misclassification by both ophthalmologists and neural networks. Importantly, unsupervised learning enabled unbiased, data-driven discovery of AMD features such as geographic atrophy, as well as other ocular phenotypes of the choroid, vitreous, and lens, such as visually-impairing cataracts, that were not pre-defined by human labels.

LG · Jun 22, 2020

C-SURE: Shrinkage Estimator and Prototype Classifier for Complex-Valued Deep Learning

Yifei Xing, Rudrasis Chakraborty, Minxuan Duan et al.

The James-Stein (JS) shrinkage estimator is a biased estimator that captures the mean of Gaussian random vectors. While it has a desirable statistical property of dominance over the maximum likelihood estimator (MLE) in terms of mean squared error (MSE), not much progress has been made on extending the estimator onto manifold-valued data. We propose C-SURE, a novel Stein's unbiased risk estimate (SURE) of the JS estimator on the manifold of complex-valued data with a theoretically proven optimum over MLE. Adapting the architecture of the complex-valued SurReal classifier, we further incorporate C-SURE into a prototype convolutional neural network (CNN) classifier. We compare C-SURE with SurReal and a real-valued baseline on complex-valued MSTAR and RadioML datasets. C-SURE is more accurate and robust than SurReal, and the shrinkage estimator is always better than MLE for the same prototype classifier. Like SurReal, C-SURE is much smaller, outperforming the real-valued baseline on MSTAR (RadioML) with less than 1 percent (3 percent) of the baseline size.
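
The dominance of the JS estimator over the MLE can be checked numerically in the classical real-valued Euclidean setting. The sketch below is not C-SURE and does not touch manifold-valued data; it assumes a known noise variance, uses the positive-part variant of the estimator, and every name and simulation setting in it is a hypothetical choice made for illustration.

```python
import numpy as np

def james_stein(x, sigma2=1.0):
    """Positive-part James-Stein estimate of the mean from one draw per row.

    Assumes x ~ N(theta, sigma2 * I) with known sigma2 and dimension >= 3.
    """
    x = np.asarray(x, dtype=float)
    p = x.shape[-1]
    sq_norm = np.sum(x ** 2, axis=-1, keepdims=True)
    shrink = 1.0 - (p - 2) * sigma2 / sq_norm
    return np.maximum(shrink, 0.0) * x  # shrink toward the origin

# Monte Carlo comparison; the MLE of theta from a single draw is x itself.
rng = np.random.default_rng(0)
theta = np.full(50, 0.5)                       # hypothetical true mean
x = theta + rng.standard_normal((2000, 50))    # 2000 independent draws
mse_mle = np.mean(np.sum((x - theta) ** 2, axis=1))
mse_js = np.mean(np.sum((james_stein(x) - theta) ** 2, axis=1))
print(mse_mle, mse_js)
```

In this simulation the shrinkage estimate has the lower mean squared error, which is the property the abstract refers to as dominance over the MLE.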

CV · Jun 14, 2020

BatVision with GCC-PHAT Features for Better Sound to Vision Predictions

Jesper Haahr Christensen, Sascha Hornauer, Stella Yu

Inspired by sophisticated echolocation abilities found in nature, we train a generative adversarial network to predict plausible depth maps and grayscale layouts from sound. To achieve this, our sound-to-vision model processes binaural echo-returns from chirping sounds. We build upon previous work with BatVision that consists of a sound-to-vision model and a self-collected dataset using our mobile robot and low-cost hardware. We improve on the previous model with several changes that lead to better depth and grayscale estimation and increased perceptual quality. Rather than using raw binaural waveforms as input, we generate generalized cross-correlation (GCC) features and use these as input instead. In addition, we change the model generator and base it on residual learning and use spectral normalization in the discriminator. We compare and present both quantitative and qualitative improvements over our previous BatVision model.
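
For readers unfamiliar with GCC-PHAT, the following is a minimal sketch of the standard textbook formulation: the cross-power spectrum of two channels is normalized to unit magnitude (the phase transform) and transformed back to the lag domain. It is not taken from the BatVision code; the function name, the `max_shift` parameter, and the synthetic test signal are assumptions.

```python
import numpy as np

def gcc_phat(sig, ref, max_shift=None):
    """Generalized cross-correlation with phase transform (GCC-PHAT).

    Returns the correlation over lags -max_shift..max_shift and the lag
    (in samples) at which it peaks, i.e. the estimated delay of sig vs ref.
    """
    n = sig.shape[0] + ref.shape[0]            # zero-pad to avoid wrap-around
    sig_f = np.fft.rfft(sig, n=n)
    ref_f = np.fft.rfft(ref, n=n)
    cross = sig_f * np.conj(ref_f)
    cross /= np.abs(cross) + 1e-12             # keep phase, discard magnitude
    cc = np.fft.irfft(cross, n=n)
    max_shift = n // 2 if max_shift is None else min(max_shift, n // 2)
    cc = np.concatenate((cc[-max_shift:], cc[:max_shift + 1]))
    delay = int(np.argmax(np.abs(cc))) - max_shift
    return cc, delay

# Synthetic check: the second channel is the first one delayed by 5 samples.
rng = np.random.default_rng(0)
left = rng.standard_normal(256)
right = np.concatenate((np.zeros(5), left[:-5]))
features, delay = gcc_phat(right, left, max_shift=20)
print(delay)
```

The returned correlation vector is the kind of feature that could be fed to a model in place of raw waveforms; how the paper windows or stacks such features is not specified in the abstract.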

CV · Dec 15, 2019

BatVision: Learning to See 3D Spatial Layout with Two Ears

Jesper Haahr Christensen, Sascha Hornauer, Stella Yu

Many species have evolved advanced non-visual perception while artificial systems fall behind. Radar and ultrasound complement camera-based vision but they are often too costly and complex to set up for very limited information gain. In nature, sound is used effectively by bats, dolphins, whales, and humans for navigation and communication. However, it is unclear how to best harness sound for machine perception. Inspired by bats' echolocation mechanism, we design a low-cost BatVision system that is capable of seeing the 3D spatial layout of space ahead by just listening with two ears. Our system emits short chirps from a speaker and records returning echoes through microphones in an artificial human pinnae pair. During training, we additionally use a stereo camera to capture color images for calculating scene depths. We train a model to predict depth maps and even grayscale images from the sound alone. During testing, our trained BatVision provides surprisingly good predictions of 2D visual scenes from two 1D audio signals. Such a sound to vision system would benefit robot navigation and machine vision, especially in low-light or no-light conditions. Our code and data are publicly available.

CV · Oct 18, 2019

SurReal: Complex-Valued Learning as Principled Transformations on a Scaling and Rotation Manifold

Rudrasis Chakraborty, Yifei Xing, Stella Yu

Complex-valued data is ubiquitous in signal and image processing applications, and complex-valued representations in deep learning have appealing theoretical properties. While these aspects have long been recognized, complex-valued deep learning continues to lag far behind its real-valued counterpart. We propose a principled geometric approach to complex-valued deep learning. Complex-valued data could often be subject to arbitrary complex-valued scaling; as a result, real and imaginary components could co-vary. Instead of treating complex values as two independent channels of real values, we recognize their underlying geometry: We model the space of complex numbers as a product manifold of non-zero scaling and planar rotations. Arbitrary complex-valued scaling naturally becomes a group of transitive actions on this manifold. We propose to extend the property instead of the form of real-valued functions to the complex domain. We define convolution as weighted Fréchet mean on the manifold that is equivariant to the group of scaling/rotation actions, and define distance transform on the manifold that is invariant to the action group. The manifold perspective also allows us to define nonlinear activation functions such as tangent ReLU and G-transport, as well as residual connections on the manifold-valued data. We dub our model SurReal, as our experiments on MSTAR and RadioML deliver high performance with only a fractional size of real-valued and complex-valued baseline models.
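
The idea of a distance that is invariant to complex-valued scaling can be sketched as follows. The metric below, which combines the gap in log-magnitude with the wrapped gap in phase, is one natural choice on a product of scalings and planar rotations; it is an assumption made for illustration and is not claimed to be the exact distance used in SurReal.

```python
import numpy as np

def manifold_distance(z1, z2):
    """Distance between nonzero complex numbers viewed as (scale, rotation).

    Assumption: a simple product metric, sqrt(log-scale gap^2 + angle gap^2).
    """
    log_scale_gap = np.log(np.abs(z1)) - np.log(np.abs(z2))
    rotation_gap = np.angle(z1 * np.conj(z2))  # phase difference, wrapped
    return float(np.hypot(log_scale_gap, rotation_gap))

# Multiplying both inputs by the same complex scalar leaves the distance fixed.
z1, z2 = 1.0 + 2.0j, -0.5 + 0.3j
s = 3.0 * np.exp(1j * 2.0)                     # arbitrary scaling and rotation
before = manifold_distance(z1, z2)
after = manifold_distance(s * z1, s * z2)
print(before, after)
```

The ordinary Euclidean distance |z1 - z2| would instead grow by a factor of |s| under the same scaling, which is the contrast the abstract draws with treating real and imaginary parts as independent channels.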

CV · Sep 18, 2019

Unsupervised Sketch-to-Photo Synthesis

Runtao Liu, Qian Yu, Stella Yu

Humans can envision a realistic photo given a free-hand sketch that is not only spatially imprecise and geometrically distorted but also without colors and visual details. We study unsupervised sketch-to-photo synthesis for the first time, learning from unpaired sketch-photo data where the target photo for a sketch is unknown during training. Existing works only deal with style change or spatial deformation alone, synthesizing photos from edge-aligned line drawings or transforming shapes within the same modality, e.g., color images. Our key insight is to decompose unsupervised sketch-to-photo synthesis into a two-stage translation task: First shape translation from sketches to grayscale photos and then content enrichment from grayscale to color photos. We also incorporate a self-supervised denoising objective and an attention module to handle abstraction and style variations that are inherent and specific to sketches. Our synthesis is sketch-faithful and photo-realistic to enable sketch-based image retrieval in practice. An exciting corollary product is a universal and promising sketch generator that captures human visual perception beyond the edge map of a photo.

CV · May 31, 2018

Semantic Analysis of (Reflectional) Visual Symmetry: A Human-Centred Computational Model for Declarative Explainability

Jakob Suchan, Mehul Bhatt, Srikrishna Vardarajan et al.

We present a computational model for the semantic interpretation of symmetry in naturalistic scenes. Key features include a human-centred representation, and a declarative, explainable interpretation model supporting deep semantic question-answering founded on an integration of methods in knowledge representation and deep learning based computer vision. In the backdrop of the visual arts, we showcase the framework's capability to generate human-centred, queryable, relational structures, also evaluating the framework with an empirical study on the human perception of visual symmetry. Our framework represents and is driven by the application of foundational, integrated Vision and Knowledge Representation and Reasoning methods for applications in the arts, and the psychological and social sciences.

CV · May 5, 2018

Unsupervised Feature Learning via Non-Parametric Instance-level Discrimination

Zhirong Wu, Yuanjun Xiong, Stella Yu et al.

Neural net classifiers trained on data with annotated class labels can also capture apparent visual similarity among categories without being directed to do so. We study whether this observation can be extended beyond the conventional domain of supervised learning: Can we learn a good feature representation that captures apparent similarity among instances, instead of classes, by merely asking the feature to be discriminative of individual instances? We formulate this intuition as a non-parametric classification problem at the instance-level, and use noise-contrastive estimation to tackle the computational challenges imposed by the large number of instance classes. Our experimental results demonstrate that, under unsupervised learning settings, our method surpasses the state-of-the-art on ImageNet classification by a large margin. Our method is also remarkable for consistently improving test performance with more training data and better network architectures. By fine-tuning the learned feature, we further obtain competitive results for semi-supervised learning and object detection tasks. Our non-parametric model is highly compact: With 128 features per image, our method requires only 600MB storage for a million images, enabling fast nearest neighbour retrieval at the run time.
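
The instance-level classification idea and the storage figure can be made concrete with a small sketch. It assumes a memory bank of L2-normalized 128-dimensional features and a softmax over dot products; the temperature value, the random features, and the function name are hypothetical, and the noise-contrastive estimation used in the paper to scale this up is not shown.

```python
import numpy as np

def instance_probabilities(query, memory_bank, temperature=0.07):
    """Softmax over instances: each stored feature acts as its own class.

    The temperature value is an assumption chosen for illustration.
    """
    logits = memory_bank @ query / temperature
    logits -= logits.max()                     # numerical stability
    weights = np.exp(logits)
    return weights / weights.sum()

# Hypothetical memory bank: 1000 instances, 128 features each, unit norm.
rng = np.random.default_rng(0)
bank = rng.standard_normal((1000, 128)).astype(np.float32)
bank /= np.linalg.norm(bank, axis=1, keepdims=True)

probs = instance_probabilities(bank[42], bank)
print(int(probs.argmax()))                     # the instance that matches the query

# Storage for one million images at 128 float32 features per image.
bytes_per_million = 1_000_000 * 128 * np.dtype(np.float32).itemsize
print(bytes_per_million / 1e6, "MB")
```

Raw float32 features for a million images come to 512 MB, which is the same order as the 600MB quoted in the abstract; what accounts for the remainder is not stated there.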