Yan Zuo

h-index: 8

7 papers

39 citations

Novelty: 55%

AI Score: 30

Ranked #134,853 of 194,257 authors (top 69%); #44,462 in CV (top 75%)

7 Papers

15.3 · CV · Feb 29, 2024 · Code

ViewFusion: Towards Multi-View Consistency via Interpolated Denoising

Xianghui Yang, Yan Zuo, Sameera Ramasinghe et al.

Novel-view synthesis through diffusion models has demonstrated remarkable potential for generating diverse and high-quality images. Yet, the independent process of image generation in these prevailing methods leads to challenges in maintaining multiple-view consistency. To address this, we introduce ViewFusion, a novel, training-free algorithm that can be seamlessly integrated into existing pre-trained diffusion models. Our approach adopts an auto-regressive method that implicitly leverages previously generated views as context for the next view generation, ensuring robust multi-view consistency during the novel-view generation process. Through a diffusion process that fuses known-view information via interpolated denoising, our framework successfully extends single-view conditioned models to work in multiple-view conditional settings without any additional fine-tuning. Extensive experimental results demonstrate the effectiveness of ViewFusion in generating consistent and detailed novel views.

5.2 · CV · Jan 29, 2024

Divide and Conquer: Rethinking the Training Paradigm of Neural Radiance Fields

Rongkai Ma, Leo Lebrat, Rodrigo Santa Cruz et al.

Neural radiance fields (NeRFs) have exhibited potential in synthesizing high-fidelity views of 3D scenes, but the standard training paradigm of NeRF presupposes an equal importance for each image in the training set. This assumption poses a significant challenge for rendering specific views presenting intricate geometries, thereby resulting in suboptimal performance. In this paper, we take a closer look at the implications of the current training paradigm and redesign it for superior rendering quality by NeRFs. Dividing input views into multiple groups based on their visual similarities and training individual models on each of these groups enables each model to specialize on specific regions without sacrificing speed or efficiency. Subsequently, the knowledge of these specialized models is aggregated into a single entity via a teacher-student distillation paradigm, enabling spatial efficiency for online rendering. Empirically, we evaluate our novel training framework on two publicly available datasets, namely NeRF synthetic and Tanks&Temples. Our evaluation demonstrates that our DaC training pipeline enhances the rendering quality of a state-of-the-art baseline model while exhibiting convergence to a superior minimum.

1.2 · CV · Nov 9, 2020

Localising In Complex Scenes Using Balanced Adversarial Adaptation

Gil Avraham, Yan Zuo, Tom Drummond

Domain adaptation and generative modelling have collectively mitigated the expensive nature of data collection and labelling by leveraging the rich abundance of accurate, labelled data in simulation environments. In this work, we study the performance gap that exists between representations optimised for localisation on simulation environments and the application of such representations in a real-world setting. Our method exploits the shared geometric similarities between simulation and real-world environments whilst maintaining invariance towards visual discrepancies. This is achieved by optimising a representation extractor to project both simulated and real representations into a shared representation space. Our method uses a symmetrical adversarial approach which encourages the representation extractor to conceal the domain that features are extracted from and simultaneously preserves robust attributes between source and target domains that are beneficial for localisation. We evaluate our method by adapting representations optimised for indoor Habitat simulated environments (Matterport3D and Replica) to a real-world indoor environment (Active Vision Dataset), showing that it compares favourably against fully-supervised approaches.

1.4 · ML · Nov 4, 2020

Residual Likelihood Forests

Yan Zuo, Tom Drummond

This paper presents a novel ensemble learning approach called Residual Likelihood Forests (RLF). Our weak learners produce conditional likelihoods that are sequentially optimized using global loss in the context of previous learners within a boosting-like framework (rather than probability distributions that are measured from observed data) and are combined multiplicatively (rather than additively). This increases the efficiency of our strong classifier, allowing for the design of classifiers which are more compact in terms of model capacity. We apply our method to several machine learning classification tasks, showing significant improvements in performance. When compared against several ensemble approaches including Random Forests and Gradient Boosted Trees, RLFs offer a significant improvement in performance whilst concurrently reducing the required model size.
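The contrast between multiplicative and additive combination can be illustrated with a minimal sketch. This is not the RLF training procedure from the paper: the function names and the toy likelihood values are hypothetical, and the sketch only shows how per-learner class likelihoods might be multiplied (in log space for numerical stability) and renormalised, compared with simple averaging.

```python
import math

def combine_multiplicative(likelihoods):
    """Multiply per-learner class likelihoods and renormalise.

    likelihoods: list of lists, one inner list of per-class values per learner.
    (Hypothetical helper for illustration, not the paper's implementation.)
    """
    n_classes = len(likelihoods[0])
    log_scores = [0.0] * n_classes
    for per_learner in likelihoods:
        for c, p in enumerate(per_learner):
            # Sum logs instead of multiplying raw values to avoid underflow.
            log_scores[c] += math.log(max(p, 1e-12))
    top = max(log_scores)
    unnormalised = [math.exp(s - top) for s in log_scores]
    total = sum(unnormalised)
    return [u / total for u in unnormalised]

def combine_additive(likelihoods):
    """Average per-learner class values, as in a standard voting ensemble."""
    n_classes = len(likelihoods[0])
    n_learners = len(likelihoods)
    return [sum(l[c] for l in likelihoods) / n_learners for c in range(n_classes)]

# Toy values (assumed): two learners, two classes.
toy = [[0.6, 0.4], [0.9, 0.1]]
multiplicative = combine_multiplicative(toy)
additive = combine_additive(toy)
```

In this toy case the multiplicative rule yields a sharper posterior than averaging, because agreement between learners compounds rather than being diluted.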

9.4 · CV · Jul 31, 2019

EMPNet: Neural Localisation and Mapping Using Embedded Memory Points

Gil Avraham, Yan Zuo, Thanuja Dharmasiri et al.

Continuously estimating an agent's state space and a representation of its surroundings has proven vital towards full autonomy. A shared common ground among systems which successfully achieve this feat is the integration of previously encountered observations into the current state being estimated. This necessitates the use of a memory module for incorporating previously visited states whilst simultaneously offering an internal representation of the observed environment. In this work, we develop a memory module which contains rigidly aligned point-embeddings that represent a coherent scene structure acquired from an RGB-D sequence of observations. The point-embeddings are extracted using modern convolutional neural network architectures, and alignment is performed by computing a dense correspondence matrix between a new observation and the current embeddings residing in the memory module. The whole framework is end-to-end trainable, resulting in a recurrent joint optimisation of the point-embeddings contained in the memory. This process amplifies the shared information across states, providing increased robustness and accuracy. We show significant improvement of our method across a set of experiments performed on the synthetic ViZDoom environment and the real-world Active Vision Dataset.
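A dense correspondence matrix of the kind mentioned above can be sketched as follows. The abstract does not state the similarity measure or normalisation, so the dot-product similarity and the row-wise softmax here are assumptions, and the function name and toy embeddings are hypothetical.

```python
import math

def correspondence_matrix(new_points, memory_points):
    """Soft correspondences between new embeddings and memory embeddings.

    Row i is a softmax over memory points for new point i.
    (Assumed formulation for illustration; not taken from the paper.)
    """
    matrix = []
    for query in new_points:
        # Dot-product similarity against every embedding held in memory.
        scores = [sum(a * b for a, b in zip(query, mem)) for mem in memory_points]
        top = max(scores)
        weights = [math.exp(s - top) for s in scores]
        total = sum(weights)
        matrix.append([w / total for w in weights])
    return matrix

# Toy 2-D embeddings (assumed): two new points, three memory points.
new_obs = [[1.0, 0.0], [0.0, 1.0]]
memory = [[5.0, 0.0], [0.0, 5.0], [0.0, 0.0]]
corr = correspondence_matrix(new_obs, memory)
```

Each row sums to one, so it can be read as a soft assignment of a new point to the points already in memory; every step is differentiable, which is consistent with the end-to-end trainable framing.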

1.7 · CV · Dec 6, 2018

Traversing Latent Space using Decision Ferns

Yan Zuo, Gil Avraham, Tom Drummond

The practice of transforming raw data to a feature space so that inference can be performed in that space has been popular for many years. Recently, rapid progress in deep neural networks has given both researchers and practitioners enhanced methods that increase the richness of feature representations, be it from images, text or speech. In this work, we show how a constructed latent space can be explored in a controlled manner and argue that this complements well-founded inference methods. For constructing the latent space, a Variational Autoencoder is used. We present a novel controller module that allows for smooth traversal in the latent space and construct an end-to-end trainable framework. We explore the applicability of our method for performing spatial transformations as well as kinematics for predicting future latent vectors of a video sequence.

2.7 · ML · May 14, 2018

Generative Adversarial Forests for Better Conditioned Adversarial Learning

Yan Zuo, Gil Avraham, Tom Drummond

In recent times, many of the breakthroughs in various vision-related tasks have revolved around improving learning of deep models; these methods have ranged from network architectural improvements such as Residual Networks, to various forms of regularisation such as Batch Normalisation. In essence, many of these techniques revolve around better conditioning, allowing for deeper and deeper models to be successfully learned. In this paper, we look towards better conditioning Generative Adversarial Networks (GANs) in an unsupervised learning setting. Our method embeds the powerful discriminating capabilities of a decision forest into the discriminator of a GAN. This results in a better conditioned model which learns in an extremely stable way. We demonstrate empirical results which show both clear qualitative and quantitative evidence of the effectiveness of our approach, gaining significant performance improvements over several popular GAN-based approaches on the Oxford Flowers and Aligned Celebrity Faces datasets.