CV | Mar 21, 2022 | Code
Depth Completion using Geometry-Aware Embedding
Wenchao Du, Hu Chen, Hongyu Yang et al.
Exploiting the internal spatial geometric constraints of sparse LiDAR data is beneficial to depth completion but has not been well explored. This paper proposes an efficient method to learn a geometry-aware embedding, which encodes local and global geometric structure information from 3D points, e.g., scene layout and object sizes and shapes, to guide dense depth estimation. Specifically, we use a dynamic graph representation to model generalized geometric relationships in irregular point clouds in a flexible and efficient manner. Further, we combine this embedding with the corresponding RGB appearance information to infer the missing depths of the scene with well-preserved structural details. The key to our method is to integrate an implicit 3D geometric representation into a 2D learning architecture, which leads to a better trade-off between performance and efficiency. Extensive experiments demonstrate that the proposed method outperforms previous works and can reconstruct fine depths with crisp boundaries in regions that previous methods over-smooth. The ablation study shows that our method achieves significant gains with a simple design while offering better generalization and stability. The code is available at https://github.com/Wenchao-Du/GAENet.
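The abstract does not say how the dynamic graph is built. As a rough illustration only, the sketch below assumes a k-nearest-neighbor graph over 3D points, a common choice in graph networks for point clouds; in a dynamic variant the same construction would be repeated on learned features at each layer. The function name `knn_graph` and the toy points are hypothetical and not taken from the paper or its repository.

```python
import math

def knn_graph(points, k):
    """Return, for each point index, the indices of its k nearest neighbors."""
    neighbors = []
    for i, p in enumerate(points):
        # Euclidean distance from point i to every other point (self excluded).
        dists = [(math.dist(p, q), j) for j, q in enumerate(points) if j != i]
        dists.sort()
        neighbors.append([j for _, j in dists[:k]])
    return neighbors

# Hypothetical toy point cloud: three points near the origin, two far away.
points = [(0.0, 0.0, 0.0), (0.1, 0.0, 0.0), (0.0, 0.2, 0.0),
          (5.0, 5.0, 5.0), (5.1, 5.0, 5.0)]
graph = knn_graph(points, k=2)
```

Each entry of `graph` lists neighbor indices from nearest to farthest, so edges follow local geometry rather than a fixed pixel grid. This brute-force version is quadratic in the number of points; a practical pipeline would likely use a spatial index or batched tensor operations.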
20.7 | CV | May 26 | Code
Semi-Supervised Gaze Estimation via Disentangled Subspace Contrastive Learning
Qida Tan, Hongyu Yang, Wenchao Du
Appearance-based gaze estimation always suffers from poor generalization due to limited annotated samples and insufficient dataset diversity. Leading approaches adopt weakly supervised learning to generate large-scale pseudo-labeled data from unconstrained real-world scenarios, aiming to mitigate the domain shifts. In this work, we devise a simple yet effective semi-supervised learning architecture that leverages unlabeled data to enhance domain generalization, thereby reducing reliance on labor-intensive manual annotations. Our key insight is to impose Jacobian regularization to disentangle feature representations into discriminative subspaces dedicated to specific gaze components, such as pitch and yaw angles. We further exploit the intrinsic ordinal ranking within each subspace for contrastive learning, enabling the model to learn robust gaze representations from a small set of labeled samples and an abundance of unlabeled ones. This ultimately yields our Disentangled Subspace Contrastive Learning (DSCL) framework. Extensive experiments on multiple benchmarks verify that the proposed DSCL is plug-and-play, achieving competitive performance using only 20%, 10%, and even 5% of the annotated data under both in-domain and cross-domain evaluation settings. The public code is available at https://github.com/da60266/DSCL.
CV | Jul 22, 2023
Fast and Stable Diffusion Inverse Solver with History Gradient Update
Linchao He, Hongyu Yan, Mengting Luo et al.
Diffusion models have recently been recognised as efficient inverse problem solvers due to their ability to produce high-quality reconstructions without training on paired data. Existing diffusion-based solvers use a gradient descent strategy to obtain an optimal sample solution. However, these solvers compute only the current gradient and do not use any history information from the sampling process, resulting in unstable optimization and suboptimal solutions. To address this issue, we propose to exploit the history information of diffusion-based inverse solvers. In this paper, we first prove that using gradient descent to optimize the data fidelity term, as done in previous work, is convergent. Building on this, we incorporate historical gradients into this optimization process, which we term History Gradient Update (HGU). We also provide theoretical evidence that HGU ensures the convergence of the entire algorithm. Notably, HGU is applicable to both pixel-based and latent-based diffusion model solvers. Experimental results demonstrate that, compared to previous sampling algorithms, sampling algorithms with HGU achieve state-of-the-art results in medical image reconstruction, surpassing even supervised learning methods. Additionally, they achieve competitive results on natural images.
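The abstract does not state the exact HGU update rule. To show what reusing past gradients on a data fidelity term can look like, the sketch below uses classical momentum (heavy-ball) gradient descent as a stand-in on the least-squares fidelity 0.5 * ||A x - y||^2. The names `grad_fidelity` and `solve`, the step size, and the toy operator `A` are all assumptions for illustration; no diffusion model is involved.

```python
def grad_fidelity(A, x, y):
    """Gradient of 0.5 * ||A x - y||^2 with respect to x, i.e. A^T (A x - y)."""
    residual = [sum(a * xi for a, xi in zip(row, x)) - yi
                for row, yi in zip(A, y)]
    return [sum(A[r][c] * residual[r] for r in range(len(A)))
            for c in range(len(x))]

def solve(A, y, steps=200, lr=0.1, beta=0.0):
    """Gradient descent on the fidelity term; beta > 0 reuses past gradients."""
    x = [0.0] * len(A[0])
    m = [0.0] * len(x)  # running accumulation of the gradient history
    for _ in range(steps):
        g = grad_fidelity(A, x, y)
        # Stand-in history rule (assumption): decayed sum of past gradients.
        m = [beta * mi + gi for mi, gi in zip(m, g)]
        x = [xi - lr * mi for xi, mi in zip(x, m)]
    return x

# Hypothetical, badly scaled measurement operator; exact solution is [1.0, 2.0].
A = [[2.0, 0.0], [0.0, 0.5]]
y = [2.0, 1.0]
x_plain = solve(A, y, beta=0.0)  # current gradient only
x_hist = solve(A, y, beta=0.5)   # current gradient plus history
```

On this toy problem the slowly converging coordinate ends up closer to the exact solution when the history term is used, which is the kind of effect the abstract attributes to using history information. The actual HGU rule and its convergence proof are in the paper, not in this sketch.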
CV | Jan 31, 2023
Hierarchical Disentangled Representation for Invertible Image Denoising and Beyond
Wenchao Du, Hu Chen, Yi Zhang et al.
Image denoising is a typical ill-posed problem due to complex degradation. Leading methods based on normalizing flows have tried to solve this problem with an invertible transformation instead of a deterministic mapping. However, the implicit bijective mapping has not been well explored. Inspired by the observation that noise tends to appear in the high-frequency part of the image, we propose a fully invertible denoising method that brings disentangled learning into a general invertible neural network to separate noise from the high-frequency part. More specifically, we decompose the noisy image into clean low-frequency and hybrid high-frequency parts with an invertible transformation and then disentangle case-specific noise and high-frequency components in the latent space. In this way, denoising is made tractable by inversely merging the noiseless low- and high-frequency parts. Furthermore, we construct a flexible hierarchical disentangling framework, which aims to decompose most of the low-frequency image information while disentangling noise from the high-frequency part in a coarse-to-fine manner. Extensive experiments on real image denoising, JPEG compression artifact removal, and medical low-dose CT image restoration demonstrate that the proposed method achieves competitive performance on both quantitative metrics and visual quality, with significantly less computational cost.
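The paper learns its invertible transformation; the abstract gives no formula for it. As a hand-written analogue only, the sketch below uses a one-level 1D Haar transform, a fixed invertible split into low- and high-frequency halves. The names `haar_split` and `haar_merge` and the toy signal are hypothetical.

```python
def haar_split(signal):
    """Split an even-length signal into low- and high-frequency halves."""
    pairs = list(zip(signal[0::2], signal[1::2]))
    low = [(a + b) / 2 for a, b in pairs]   # local averages
    high = [(a - b) / 2 for a, b in pairs]  # local differences
    return low, high

def haar_merge(low, high):
    """Exact inverse of haar_split."""
    out = []
    for l, h in zip(low, high):
        out.extend([l + h, l - h])
    return out

# Hypothetical toy signal standing in for one row of a noisy image.
noisy = [4.0, 6.0, 10.0, 12.0, 7.0, 9.0]
low, high = haar_split(noisy)
restored = haar_merge(low, high)               # lossless round trip
smoothed = haar_merge(low, [0.0] * len(high))  # high-frequency part dropped
```

Merging with the original `high` reproduces the input exactly, while zeroing `high` removes fine detail and noise together. That limitation is what motivates separating noise from genuine high-frequency content in the latent space instead of discarding the whole band.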
15.2 | CV | Apr 13
Progressively Texture-Aware Diffusion for Contrast-Enhanced Sparse-View CT
Tianqi Wang, Wenchao Du, Hongyu Yang
Diffusion-based sparse-view CT (SVCT) imaging has achieved remarkable advancements in recent years, thanks to its more stable generative capability. However, recovering reliable image content and visually consistent textures is still a crucial challenge. In this paper, we present a Progressively Texture-aware Diffusion (PTD) model, a coarse-to-fine learning framework tailored for SVCT. Specifically, PTD comprises a basic reconstructive module PTD_rec and a conditional diffusion module PTD_diff. PTD_rec first learns a deterministic mapping to recover the majority of the underlying low-frequency signals (i.e., coarse content with smoothed textures), which serves as the initial estimate to ensure fidelity. PTD_diff then reconstructs high-fidelity details on top of the coarse prediction, using a dual-domain guided conditional diffusion to generate reliable and consistent textures. Extensive experiments on sparse-view CT reconstruction demonstrate that our PTD achieves superior performance in terms of structural similarity and visual appeal with only a few sampling steps, which mitigates the randomness inherent in general diffusion models and enables a better trade-off between visual quality and fidelity of high-frequency details.
CV | Jul 16, 2024
Affective Behavior Analysis using Task-adaptive and AU-assisted Graph Network
Xiaodong Li, Wenchao Du, Hongyu Yang
In this paper, we present our solution and experimental results for the Multi-Task Learning Challenge of the 7th Affective Behavior Analysis in-the-wild (ABAW7) Competition. This challenge consists of three tasks: action unit (AU) detection, facial expression recognition (EXPR), and valence-arousal (VA) estimation. We address the research problems of this challenge from three aspects: 1) To learn robust visual feature representations, we introduce the pre-trained large model DINOv2. 2) To adaptively extract the features required by each task, we design a task-adaptive block that performs cross-attention between a set of learnable query vectors and pre-extracted features. 3) By proposing the AU-assisted Graph Convolutional Network (AU-GCN), we make full use of the correlation information between AUs to assist in solving the EXPR and VA tasks. Finally, we achieve an evaluation measure of 1.2542 on the validation set provided by the organizers.
CV | Nov 17, 2025 | Code
Hybrid-Domain Adaptative Representation Learning for Gaze Estimation
Qida Tan, Hongyu Yang, Wenchao Du
Appearance-based gaze estimation, which aims to predict an accurate 3D gaze direction from a single facial image, has made promising progress in recent years. However, most methods suffer significant performance degradation in cross-domain evaluation due to interference from gaze-irrelevant factors, such as expressions, wearables, and image quality. To alleviate this problem, we present a novel Hybrid-domain Adaptative Representation Learning (HARL) framework that exploits multi-source hybrid datasets to learn a robust gaze representation. More specifically, we propose to disentangle the gaze-relevant representation from low-quality facial images by aligning it with features extracted from high-quality near-eye images in an unsupervised domain-adaptation manner, which adds almost no computational or inference cost. Additionally, we analyze the effect of head pose and design a simple yet efficient sparse graph fusion module to explore the geometric constraint between gaze direction and head pose, leading to a dense and robust gaze representation. Extensive experiments on the EyeDiap, MPIIFaceGaze, and Gaze360 datasets demonstrate that our approach achieves state-of-the-art accuracy of 5.02°, 3.36°, and 9.26°, respectively, and shows competitive performance in cross-dataset evaluation. The code is available at https://github.com/da60266/HARL.
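The abstract reports accuracy in degrees without defining the metric. The sketch below assumes the usual convention in gaze estimation, the mean angle between predicted and ground-truth 3D gaze directions. The function name `angular_error_deg` and the sample vectors are hypothetical.

```python
import math

def angular_error_deg(pred, target):
    """Angle in degrees between two 3D gaze direction vectors."""
    dot = sum(p * t for p, t in zip(pred, target))
    norm = (math.sqrt(sum(p * p for p in pred))
            * math.sqrt(sum(t * t for t in target)))
    # Clamp to [-1, 1] so rounding error cannot push acos out of its domain.
    cos_angle = max(-1.0, min(1.0, dot / norm))
    return math.degrees(math.acos(cos_angle))

# Hypothetical (prediction, ground truth) pairs; vectors need not be unit length.
pairs = [((0.0, 0.0, -1.0), (0.0, 0.0, -1.0)),
         ((1.0, 0.0, 0.0), (0.0, 1.0, 0.0)),
         ((0.0, 0.0, -2.0), (0.0, 1.0, -1.0))]
errors = [angular_error_deg(p, t) for p, t in pairs]
mean_error = sum(errors) / len(errors)
```

Under this convention lower is better, so a figure such as 5.02° would mean the predicted direction deviates from the annotation by about five degrees on average.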
CV | Mar 28, 2020
Learning Invariant Representation for Unsupervised Image Restoration
Wenchao Du, Hu Chen, Hongyu Yang
Recently, cross-domain transfer has been applied to unsupervised image restoration tasks. However, directly applying existing frameworks leads to domain-shift problems in the translated images due to the lack of effective supervision. Instead, we propose an unsupervised learning method that explicitly learns an invariant representation from noisy data and reconstructs clear observations. To do so, we introduce discrete disentangling representation and adversarial domain adaptation into a general domain transfer framework, aided by extra self-supervised modules, including background and semantic consistency constraints, to learn a robust representation under dual-domain constraints in the feature and image domains. Experiments on synthetic and real noise removal tasks show that the proposed method achieves performance comparable to other state-of-the-art supervised and unsupervised methods, while converging faster and more stably than other domain adaptation methods.
MED-PH | Oct 31, 2018
Visual Attention Network for Low Dose CT
Wenchao Du, Hu Chen, Peixi Liao et al.
Noise and artifacts are intrinsic to low dose CT (LDCT) data acquisition and significantly affect imaging performance. Perfect noise removal and image restoration are intractable in the context of LDCT due to statistical and technical uncertainties. In this paper, we apply the generative adversarial network (GAN) framework with a visual attention mechanism to deal with this problem in a data-driven, machine learning fashion. Our main idea is to inject visual attention knowledge into the learning process of the GAN to provide a powerful prior on the noise distribution. By doing this, both the generator and discriminator networks are equipped with visual attention information, so they not only pay special attention to noisy regions and surrounding structures but also explicitly assess the local consistency of the recovered regions. Our experiments on clinical CT images demonstrate the effectiveness of the proposed method both qualitatively and quantitatively.
CL | Sep 3, 2018
Data Augmentation for Neural Online Chat Response Selection
Wenchao Du, Alan W Black
Data augmentation seeks to manipulate the available training data to improve the generalization ability of models. We investigate two data augmentation proxies, permutation and flipping, for the neural dialog response selection task on various models over multiple datasets, covering both Chinese and English. Unlike standard data augmentation techniques, our method combines the original and synthesized data for prediction. Empirical results show that our approach gains 1 to 3 recall-at-1 points over baseline models in both full-scale and small-scale settings.
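For readers unfamiliar with the metric behind the reported gains, the sketch below computes recall-at-k for response selection under its standard definition: the fraction of contexts whose true response is ranked among the top k candidates. The data layout, the function name `recall_at_k`, and the scores are hypothetical.

```python
def recall_at_k(examples, k=1):
    """Fraction of examples whose true response is ranked in the top k.

    Each example is (scores, true_index): one model score per candidate
    response, plus the position of the ground-truth response.
    """
    hits = 0
    for scores, true_index in examples:
        # Candidate indices sorted from highest to lowest score.
        ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
        if true_index in ranked[:k]:
            hits += 1
    return hits / len(examples)

# Hypothetical scores for three contexts with three candidate responses each.
examples = [([0.9, 0.1, 0.3], 0),
            ([0.2, 0.7, 0.5], 2),
            ([0.1, 0.2, 0.8], 2)]
r1 = recall_at_k(examples, k=1)
r2 = recall_at_k(examples, k=2)
```

When recall-at-1 is reported as a percentage, a gain of "1 to 3 points" corresponds to roughly one to three more contexts per hundred for which the true response is ranked first.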
CL | Dec 8, 2016
Discovering Conversational Dependencies between Messages in Dialogs
Wenchao Du, Pascal Poupart, Wei Xu
We investigate the task of inferring conversational dependencies between messages in one-on-one online chat, which has become one of the most popular forms of customer service. We propose a novel probabilistic classifier that leverages conversational, lexical and semantic information. The approach is evaluated empirically on a set of customer service chat logs from a Chinese e-commerce website. It outperforms heuristic baselines.