Mi Zhou

CV
h-index: 45
Papers: 11
Citations: 110
Novelty: 40%
AI Score: 42

11 Papers

SY Apr 6, 2022
Deep transfer learning for system identification using long short-term memory neural networks

Kaicheng Niu, Mi Zhou, Chaouki T. Abdallah et al.

Recurrent neural networks (RNNs) have many advantages over more traditional system identification techniques. They may be applied to linear and nonlinear systems, and they require fewer modeling assumptions. However, these neural network models may also need larger amounts of data to learn and generalize. Furthermore, neural network training is a time-consuming process. Hence, building upon long short-term memory (LSTM) neural networks, this paper proposes using two types of deep transfer learning, namely parameter fine-tuning and freezing, to reduce the data and computation requirements for system identification. We apply these techniques to identify two dynamical systems, namely a second-order linear system and a Wiener-Hammerstein nonlinear system. Results show that compared with direct learning, our method accelerates learning by 10% to 50%, which also saves data and computing resources.
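
The two transfer strategies named in the abstract, parameter fine-tuning and freezing, can be illustrated on a toy problem. The sketch below is not the paper's LSTM setup: it assumes a hypothetical two-parameter linear model y = a*x + b, made-up source and target systems, and a hypothetical `train` helper. It only shows the mechanics, namely that fine-tuning continues training all pretrained parameters while freezing excludes some of them from the updates.

```python
def train(params, data, trainable, lr=0.1, steps=200):
    """Gradient descent on mean squared error for y = a*x + b.

    Only the parameter names listed in `trainable` are updated;
    every other parameter stays frozen at its incoming value.
    """
    params = dict(params)  # work on a copy so the caller's values survive
    n = len(data)
    for _ in range(steps):
        grad_a = sum(2 * (params["a"] * x + params["b"] - y) * x for x, y in data) / n
        grad_b = sum(2 * (params["a"] * x + params["b"] - y) for x, y in data) / n
        if "a" in trainable:
            params["a"] -= lr * grad_a
        if "b" in trainable:
            params["b"] -= lr * grad_b
    return params


xs = [-1.0, -0.5, 0.0, 0.5, 1.0]
source = [(x, 2.0 * x + 1.0) for x in xs]  # hypothetical source system
target = [(x, 2.0 * x + 3.0) for x in xs]  # hypothetical target: same slope, new offset

# Pretrain on the source system, then transfer to the target system.
pretrained = train({"a": 0.0, "b": 0.0}, source, trainable={"a", "b"})
fine_tuned = train(pretrained, target, trainable={"a", "b"})  # fine-tuning: update everything
frozen = train(pretrained, target, trainable={"b"})           # freezing: keep `a` fixed
```

In this toy case the frozen run leaves `a` exactly at its pretrained value and only adapts `b`, which is the behavior freezing is meant to provide; whether it helps on a real system depends on how much the source and target dynamics share.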

GN Mar 5, 2024 Code
Bias in Generative AI

Mi Zhou, Vibhanshu Abhishek, Timothy Derdenger et al.

This study analyzed images generated by three popular generative artificial intelligence (AI) tools - Midjourney, Stable Diffusion, and DALLE 2 - representing various occupations to investigate potential bias in AI generators. Our analysis revealed two overarching areas of concern in these AI generators, including (1) systematic gender and racial biases, and (2) subtle biases in facial expressions and appearances. Firstly, we found that all three AI generators exhibited bias against women and African Americans. Moreover, we found that the evident gender and racial biases uncovered in our analysis were even more pronounced than the status quo when compared to labor force statistics or Google images, intensifying the harmful biases we are actively striving to rectify in our society. Secondly, our study uncovered more nuanced prejudices in the portrayal of emotions and appearances. For example, women were depicted as younger with more smiles and happiness, while men were depicted as older with more neutral expressions and anger, posing a risk that generative AI models may unintentionally depict women as more submissive and less competent than men. Such nuanced biases, by their less overt nature, might be more problematic as they can permeate perceptions unconsciously and may be more difficult to rectify. Although the extent of bias varied depending on the model, the direction of bias remained consistent in both commercial and open-source AI generators. As these tools become commonplace, our study highlights the urgency to identify and mitigate various biases in generative AI, reinforcing the commitment to ensuring that AI technologies benefit all of humanity for a more inclusive future.

78.5 CV Mar 17
Mixture of Style Experts for Diverse Image Stylization

Shihao Zhu, Ziheng Ouyang, Yijia Kang et al.

Diffusion-based stylization has advanced significantly, yet existing methods are limited to color-driven transformations, neglecting complex semantics and material details. We introduce StyleExpert, a semantic-aware framework based on the Mixture of Experts (MoE). Our framework employs a unified style encoder, trained on our large-scale dataset of content-style-stylized triplets, to embed diverse styles into a consistent latent space. This embedding is then used to condition a similarity-aware gating mechanism, which dynamically routes styles to specialized experts within the MoE architecture. Leveraging this MoE architecture, our method adeptly handles diverse styles spanning multiple semantic levels, from shallow textures to deep semantics. Extensive experiments show that StyleExpert outperforms existing approaches in preserving semantics and material details, while generalizing to unseen styles. Our code and collected images are available at the project page: https://hh-lg.github.io/StyleExpert-Page/.

LG Aug 7, 2021 Code
The Effect of Training Parameters and Mechanisms on Decentralized Federated Learning based on MNIST Dataset

Zhuofan Zhang, Mi Zhou, Kaicheng Niu et al.

Federated Learning is an algorithm suited for training models on decentralized data, but the requirement of a central "server" node is a bottleneck. In this document, we first introduce the notion of Decentralized Federated Learning (DFL). We then perform various experiments on different setups, such as changing model aggregation frequency, switching from independent and identically distributed (IID) dataset partitioning to non-IID partitioning with partial global sharing, using different optimization methods across clients, and breaking models into segments with partial sharing. All experiments are run on the MNIST handwritten digits dataset. We observe that those altered training procedures are generally robust, albeit non-optimal. We also observe failures in training when the variance between model weights is too large. The open-source experiment code is available on GitHub at https://github.com/zhzhang2018/DecentralizedFL.
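
The idea of aggregating models without a central server can be sketched as neighbor averaging. The abstract does not state the aggregation rule or network topology used in the experiments, so the following is only an assumed minimal version: four hypothetical nodes on a ring, each replacing its weights with the mean of its own and its neighbors' weights. The names `aggregate_with_neighbors`, `weight_spread`, and `ring` are illustrative and do not come from the linked repository.

```python
def aggregate_with_neighbors(weights, neighbors):
    """One decentralized aggregation round.

    Each node replaces its weight vector with the element-wise mean of
    its own vector and those of its direct neighbors; no server is involved.
    """
    new_weights = {}
    for node, own in weights.items():
        group = [own] + [weights[n] for n in neighbors[node]]
        new_weights[node] = [sum(vals) / len(group) for vals in zip(*group)]
    return new_weights


def weight_spread(weights):
    """Largest per-coordinate gap between any two nodes' weights."""
    return max(max(col) - min(col) for col in zip(*weights.values()))


# Hypothetical ring of four nodes with deliberately different local weights.
ring = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}
weights = {0: [0.0, 4.0], 1: [4.0, 0.0], 2: [8.0, 4.0], 3: [4.0, 8.0]}

final = weights
for _ in range(10):  # ten aggregation rounds, no local training in between
    final = aggregate_with_neighbors(final, ring)
```

In a real DFL run, local training steps would sit between aggregation rounds, and the aggregation frequency mentioned in the abstract would control how far the node weights drift apart before they are averaged again. `weight_spread` is one simple way to monitor that drift.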

CV Dec 8, 2023
Decoupling Degradation and Content Processing for Adverse Weather Image Restoration

Xi Wang, Xueyang Fu, Peng-Tao Jiang et al.

Adverse weather image restoration strives to recover clear images from those affected by various weather types, such as rain, haze, and snow. Each weather type calls for a tailored degradation removal approach due to its unique impact on images. Conversely, content reconstruction can employ a uniform approach, as the underlying image content remains consistent. Although previous techniques can handle multiple weather types within a single network, they neglect the crucial distinction between these two processes, limiting the quality of restored images. This work introduces a novel adverse weather image restoration method, called DDCNet, which decouples the degradation removal and content reconstruction processes at the feature level based on their channel statistics. Specifically, we exploit the unique advantages of the Fourier transform in both processes: (1) the degradation information is mainly located in the amplitude component of the Fourier domain, and (2) the Fourier domain contains global information. The former facilitates a channel-dependent degradation removal operation, allowing the network to tailor responses to various adverse weather types; the latter, by integrating Fourier's global properties into channel-independent content features, enhances network capacity for consistent global content reconstruction. We further augment the degradation removal process with a degradation mapping loss function. Extensive experiments demonstrate that our method achieves state-of-the-art performance in multiple adverse weather removal benchmarks.
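
The amplitude and phase split that the abstract relies on can be shown on a one-dimensional toy signal. This is not DDCNet and makes no claim about real weather degradations: it assumes a hypothetical four-sample signal and a uniform brightness offset as a crude stand-in for haze. A constant offset changes only the amplitude of the zero-frequency component, so recombining the clean amplitude with the degraded phase recovers the clean signal in this special case.

```python
import cmath


def dft(signal):
    """Naive discrete Fourier transform of a real-valued sequence."""
    n = len(signal)
    return [sum(x * cmath.exp(-2j * cmath.pi * k * t / n) for t, x in enumerate(signal))
            for k in range(n)]


def idft(spectrum):
    """Inverse DFT, returning the real part of each sample."""
    n = len(spectrum)
    return [sum(c * cmath.exp(2j * cmath.pi * k * t / n) for k, c in enumerate(spectrum)).real / n
            for t in range(n)]


def split(spectrum):
    """Decompose a spectrum into amplitude and phase components."""
    return [abs(c) for c in spectrum], [cmath.phase(c) for c in spectrum]


def combine(amplitude, phase):
    """Rebuild a complex spectrum from amplitude and phase."""
    return [cmath.rect(a, p) for a, p in zip(amplitude, phase)]


clean = [1.0, 2.0, 3.0, 4.0]          # hypothetical clean signal
degraded = [x + 0.5 for x in clean]   # uniform offset, a crude haze-like degradation

clean_amp, _ = split(dft(clean))
degraded_amp, degraded_phase = split(dft(degraded))

# Keep the degraded phase but swap in the clean amplitude.
restored = idft(combine(clean_amp, degraded_phase))
```

Real degradations are not this tidy, which is presumably why the paper learns the amplitude-side correction rather than swapping components directly.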

CV Oct 21, 2025
Mono4DGS-HDR: High Dynamic Range 4D Gaussian Splatting from Alternating-exposure Monocular Videos

Jinfeng Liu, Lingtong Kong, Mi Zhou et al.

We introduce Mono4DGS-HDR, the first system for reconstructing renderable 4D high dynamic range (HDR) scenes from unposed monocular low dynamic range (LDR) videos captured with alternating exposures. To tackle such a challenging problem, we present a unified framework with a two-stage optimization approach based on Gaussian Splatting. The first stage learns a video HDR Gaussian representation in orthographic camera coordinate space, eliminating the need for camera poses and enabling robust initial HDR video reconstruction. The second stage transforms video Gaussians into world space and jointly refines the world Gaussians with camera poses. Furthermore, we propose a temporal luminance regularization strategy to enhance the temporal consistency of the HDR appearance. Since our task has not been studied before, we construct a new evaluation benchmark using publicly available datasets for HDR video reconstruction. Extensive experiments demonstrate that Mono4DGS-HDR significantly outperforms alternative solutions adapted from state-of-the-art methods in both rendering quality and speed.

CV Oct 17, 2024
Improving Consistency in Diffusion Models for Image Super-Resolution

Junhao Gu, Peng-Tao Jiang, Hao Zhang et al.

Recent methods exploit powerful text-to-image (T2I) diffusion models for real-world image super-resolution (Real-ISR) and achieve impressive results compared to previous models. However, we observe two kinds of inconsistencies in diffusion-based methods which hinder existing models from fully exploiting diffusion priors. The first is the semantic inconsistency arising from diffusion guidance. T2I generation focuses on semantic-level consistency with text prompts, while Real-ISR emphasizes pixel-level reconstruction from low-quality (LQ) images, necessitating more detailed semantic guidance from LQ inputs. The second is the training-inference inconsistency stemming from the DDPM, which improperly assumes that high-quality (HQ) latents corrupted by Gaussian noise are the denoising inputs at each timestep. To address these issues, we introduce ConsisSR to handle both semantic and training-inference consistencies. On the one hand, to address the semantic inconsistency, we propose a Hybrid Prompt Adapter (HPA). Instead of text prompts with coarse-grained classification information, we leverage the more powerful CLIP image embeddings to explore additional color and texture guidance. On the other hand, we introduce Time-Aware Latent Augmentation (TALA) to bridge the training-inference inconsistency. Based on the probability function p(t), we accordingly enhance the SDSR training strategy. Taking LQ latents with Gaussian noise as inputs, TALA not only focuses on diffusion noise but also refines the LQ latent towards its HQ counterpart. Our method demonstrates state-of-the-art performance among existing diffusion models. The code will be made publicly available.

CV Feb 9, 2022
Motion-Aware Transformer For Occluded Person Re-identification

Mi Zhou, Hongye Liu, Zhekun Lv et al.

Occluded person re-identification (Re-ID) remains a challenging task because people are frequently obscured by other people or obstacles, especially in crowded scenes. In this paper, we propose a self-supervised deep learning method to improve the localization of human parts for occluded person Re-ID. Unlike previous works, we find that motion information derived from photos of various human postures can help identify major human body components. Firstly, a motion-aware transformer encoder-decoder architecture is designed to obtain keypoint heatmaps and part-segmentation maps. Secondly, an affine transformation module is utilized to acquire motion information from the keypoint detection branch. The motion information then supports the segmentation branch in producing refined human part segmentation maps and effectively dividing the human body into reasonable groups. Finally, several cases demonstrate the efficiency of the proposed model in distinguishing different representative parts of the human body, which helps avoid disturbances from background and occlusion. Our method consistently achieves state-of-the-art results on several popular datasets, including occluded, partial, and holistic.

LG Dec 3, 2020
What Makes a Star Teacher? A Hierarchical BERT Model for Evaluating Teacher's Performance in Online Education

Wen Wang, Honglei Zhuang, Mi Zhou et al.

Education has a significant impact on both society and personal life. With the development of technology, online education has been growing rapidly over the past decade. While there are several online education studies on student behavior analysis, course concept mining, and course recommendations (Feng, Tang, and Liu 2019; Pan et al. 2017), there is little research on evaluating teachers' performance in online education. In this paper, we conduct a systematic study to understand and effectively predict teachers' performance using the subtitles of 1,085 online courses. Our model-free analysis shows that teachers' verbal cues (e.g., question strategy, emotional appeal, and hedging) and their course structure design are both significantly correlated with teachers' performance evaluation. Based on these insights, we then propose a hierarchical course BERT model to predict teachers' performance in online education. Our proposed model can capture the hierarchical structure within each course as well as the deep semantic features extracted from the course content. Experimental results show that our proposed method achieves significant gains over several state-of-the-art methods. Our study has significant social impact in helping teachers improve their teaching style and enhance their instructional material design for more effective online teaching in the future.

IR Aug 29, 2018
Centroid estimation based on symmetric KL divergence for Multinomial text classification problem

Jiangning Chen, Heinrich Matzinger, Haoyan Zhai et al.

We define a new method to estimate centroids for text classification based on the symmetric KL-divergence between the distribution of words in training documents and their class centroids. Experiments on several standard data sets indicate that the new method achieves substantial improvements over traditional classifiers.
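
The symmetric KL-divergence between a document's word distribution and a class centroid can be sketched as follows. The abstract does not describe the paper's centroid estimator, so this example substitutes a plain assumption: each centroid is the mean of Laplace-smoothed word distributions of the class's training documents, and a document is assigned to the centroid with the smallest symmetric KL-divergence. The vocabulary, documents, and function names are all hypothetical.

```python
import math
from collections import Counter


def word_distribution(tokens, vocab, alpha=1.0):
    """Laplace-smoothed word distribution over a fixed vocabulary."""
    counts = Counter(tokens)
    total = sum(counts[w] for w in vocab) + alpha * len(vocab)
    return [(counts[w] + alpha) / total for w in vocab]


def kl(p, q):
    """KL(p || q) for two strictly positive distributions."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))


def symmetric_kl(p, q):
    """Symmetric KL-divergence: KL(p || q) + KL(q || p)."""
    return kl(p, q) + kl(q, p)


def centroid(distributions):
    """Assumed estimator: element-wise mean of the class's distributions."""
    n = len(distributions)
    return [sum(col) / n for col in zip(*distributions)]


def classify(tokens, centroids, vocab):
    """Return the label whose centroid is closest in symmetric KL."""
    p = word_distribution(tokens, vocab)
    return min(centroids, key=lambda label: symmetric_kl(p, centroids[label]))


vocab = ["goal", "match", "vote", "law"]
training = {
    "sports": [["goal", "match", "goal"], ["match", "match", "goal"]],
    "politics": [["vote", "law"], ["law", "law", "vote"]],
}
centroids = {
    label: centroid([word_distribution(doc, vocab) for doc in docs])
    for label, docs in training.items()
}
```

Smoothing matters here because the KL-divergence is undefined when a word has zero probability under one distribution but not the other.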

CL Nov 8, 2016
A Surrogate-based Generic Classifier for Chinese TV Series Reviews

Yufeng Ma, Long Xia, Wenqi Shen et al.

With the emergence of various online video platforms like YouTube, Youku and LeTV, reviews of online TV series have become more and more important for both viewers and producers. Customers rely heavily on these reviews before selecting TV series, while producers use them to improve quality. As a result, automatically classifying reviews according to different requirements has become a popular research topic and is essential in our daily life. In this paper, we focused on reviews of hot TV series in China and successfully trained generic classifiers based on eight predefined categories. The experimental results showed promising performance and effective generalization to different TV series.