Rihui Wu

h-index31
2papers

2 Papers

CVNov 27, 2018Code
Sequentially Aggregated Convolutional Networks

Yiwen Huang, Rihui Wu, Pinglai Ou et al.

Modern deep networks generally implement a certain form of shortcut connections to alleviate optimization difficulties. However, we observe that such network topology alters the nature of deep networks. In many ways, these networks behave similarly to aggregated wide networks. We thus exploit the aggregation nature of shortcut connections at a finer architectural level and place them within wide convolutional layers. We end up with a sequentially aggregated convolutional (SeqConv) layer that combines the benefits of both wide and deep representations by aggregating features of various depths in sequence. The proposed SeqConv serves as a drop-in replacement of regular wide convolutional layers and thus could be handily integrated into any backbone network. We apply SeqConv to widely adopted backbones including ResNet and ResNeXt, and conduct experiments for image classification on public benchmark datasets. Our ResNet based network with a model size of ResNet-50 easily surpasses the performance of the 2.35$\times$ larger ResNet-152, while our ResNeXt based model sets a new state-of-the-art accuracy on ImageNet classification for networks with similar model complexity. The code and pre-trained models of our work are publicly available at https://github.com/GroupOfAlchemists/SeqConv.

CVAug 30, 2025
DevilSight: Augmenting Monocular Human Avatar Reconstruction through a Virtual Perspective

Yushuo Chen, Ruizhi Shao, Youxin Pang et al.

We present a novel framework to reconstruct human avatars from monocular videos. Recent approaches have struggled either to capture the fine-grained dynamic details from the input or to generate plausible details at novel viewpoints, which mainly stem from the limited representational capacity of the avatar model and insufficient observational data. To overcome these challenges, we propose to leverage the advanced video generative model, Human4DiT, to generate the human motions from alternative perspective as an additional supervision signal. This approach not only enriches the details in previously unseen regions but also effectively regularizes the avatar representation to mitigate artifacts. Furthermore, we introduce two complementary strategies to enhance video generation: To ensure consistent reproduction of human motion, we inject the physical identity into the model through video fine-tuning. For higher-resolution outputs with finer details, a patch-based denoising algorithm is employed. Experimental results demonstrate that our method outperforms recent state-of-the-art approaches and validate the effectiveness of our proposed strategies.