LGApr 28, 2022Code
Hyperbolic Hierarchical Knowledge Graph Embeddings for Link Prediction in Low DimensionsWenjie Zheng, Wenxue Wang, Shu Zhao et al.
Knowledge graph embeddings (KGE) have been validated as powerful methods for inferring missing links in knowledge graphs (KGs) that they typically map entities into Euclidean space and treat relations as transformations of entities. Recently, some Euclidean KGE methods have been enhanced to model semantic hierarchies commonly found in KGs, improving the performance of link prediction. To embed hierarchical data, hyperbolic space has emerged as a promising alternative to traditional Euclidean space, offering high fidelity and lower memory consumption. Unlike Euclidean, hyperbolic space provides countless curvatures to choose from. However, it is difficult for existing hyperbolic KGE methods to obtain the optimal curvature settings manually, thereby limiting their ability to effectively model semantic hierarchies. To address this limitation, we propose a novel KGE model called $\textbf{Hyp}$erbolic $\textbf{H}$ierarchical $\textbf{KGE}$ (HypHKGE). This model introduces attention-based learnable curvatures for hyperbolic space, which helps preserve rich semantic hierarchies. Furthermore, to utilize the preserved hierarchies for inferring missing links, we define hyperbolic hierarchical transformations based on the theory of hyperbolic geometry, including both inter-level and intra-level modeling. Experiments demonstrate the effectiveness of the proposed HypHKGE model on the three benchmark datasets (WN18RR, FB15K-237, and YAGO3-10). The source code will be publicly released at https://github.com/wjzheng96/HypHKGE.
CVMay 24
D3S2: Diffusion-Guided Dataset Distillation for Semantic SegmentationWenjie Zheng, Haoji Hu, Jiali Lu et al.
Dataset distillation (DD) aims to compress large-scale datasets into compact synthetic sets while preserving training efficacy. However, existing studies mainly focus on image classification, leaving dense prediction tasks such as semantic segmentation largely underexplored. In this work, we identify three key challenges for segmentation DD: (i) long-tailed class imbalance, (ii) the need for strict pixel-wise alignment between images and dense labels, and (iii) the high computational cost of optimizing high-resolution data with complex models. To address these challenges, we propose D3S2, a Diffusion-guided Dataset Distillation framework for Semantic Segmentation. Our method adopts a two-stage design. In Class-Balanced Mask Selection, we construct a representative mask set via a greedy strategy that prioritizes underrepresented classes. In Diffusion-Guided Image Synthesis, we employ a pretrained layout-to-image diffusion model to generate images conditioned on the selected masks, naturally ensuring spatial alignment. To further enhance the training utility of synthesized data, we introduce guided diffusion sampling with two complementary objectives: a segmentation-consistency loss for pixel-level alignment, and a class-wise feature matching loss for aligning per-class feature statistics across layers. Extensive experiments demonstrate the superiority of D3S2. Notably, at an extremely compression rate of 1%, our method achieves 24.99% and 35.49% mIoU on ADE20K and COCO-Stuff with Mask2Former (Swin-S), outperforming random selection by 9.34% and 5.70%, respectively.
CVAug 24, 2025
No Pixel Left Behind: A Detail-Preserving Architecture for Robust High-Resolution AI-Generated Image DetectionLianrui Mu, Zou Xingze, Jianhong Bai et al.
The rapid growth of high-resolution, meticulously crafted AI-generated images poses a significant challenge to existing detection methods, which are often trained and evaluated on low-resolution, automatically generated datasets that do not align with the complexities of high-resolution scenarios. A common practice is to resize or center-crop high-resolution images to fit standard network inputs. However, without full coverage of all pixels, such strategies risk either obscuring subtle, high-frequency artifacts or discarding information from uncovered regions, leading to input information loss. In this paper, we introduce the High-Resolution Detail-Aggregation Network (HiDA-Net), a novel framework that ensures no pixel is left behind. We use the Feature Aggregation Module (FAM), which fuses features from multiple full-resolution local tiles with a down-sampled global view of the image. These local features are aggregated and fused with global representations for final prediction, ensuring that native-resolution details are preserved and utilized for detection. To enhance robustness against challenges such as localized AI manipulations and compression, we introduce Token-wise Forgery Localization (TFL) module for fine-grained spatial sensitivity and JPEG Quality Factor Estimation (QFE) module to disentangle generative artifacts from compression noise explicitly. Furthermore, to facilitate future research, we introduce HiRes-50K, a new challenging benchmark consisting of 50,568 images with up to 64 megapixels. Extensive experiments show that HiDA-Net achieves state-of-the-art, increasing accuracy by over 13% on the challenging Chameleon dataset and 10% on our HiRes-50K.
CLJan 27, 2025
Towards Explainable Multimodal Depression Recognition for Clinical InterviewsWenjie Zheng, Qiming Xie, Zengzhi Wang et al.
Recently, multimodal depression recognition for clinical interviews (MDRC) has recently attracted considerable attention. Existing MDRC studies mainly focus on improving task performance and have achieved significant development. However, for clinical applications, model transparency is critical, and previous works ignore the interpretability of decision-making processes. To address this issue, we propose an Explainable Multimodal Depression Recognition for Clinical Interviews (EMDRC) task, which aims to provide evidence for depression recognition by summarizing symptoms and uncovering underlying causes. Given an interviewer-participant interaction scenario, the goal of EMDRC is to structured summarize participant's symptoms based on the eight-item Patient Health Questionnaire depression scale (PHQ-8), and predict their depression severity. To tackle the EMDRC task, we construct a new dataset based on an existing MDRC dataset. Moreover, we utilize the PHQ-8 and propose a PHQ-aware multimodal multi-task learning framework, which captures the utterance-level symptom-related semantic information to help generate dialogue-level summary. Experiment results on our annotated dataset demonstrate the superiority of our proposed methods over baseline systems on the EMDRC task.
CVNov 17, 2025
Training-Free Multi-View Extension of IC-Light for Textual Position-Aware Scene RelightingJiangnan Ye, Jiedong Zhuang, Lianrui Mu et al.
We introduce GS-Light, an efficient, textual position-aware pipeline for text-guided relighting of 3D scenes represented via Gaussian Splatting (3DGS). GS-Light implements a training-free extension of a single-input diffusion model to handle multi-view inputs. Given a user prompt that may specify lighting direction, color, intensity, or reference objects, we employ a large vision-language model (LVLM) to parse the prompt into lighting priors. Using off-the-shelf estimators for geometry and semantics (depth, surface normals, and semantic segmentation), we fuse these lighting priors with view-geometry constraints to compute illumination maps and generate initial latent codes for each view. These meticulously derived init latents guide the diffusion model to generate relighting outputs that more accurately reflect user expectations, especially in terms of lighting direction. By feeding multi-view rendered images, along with the init latents, into our multi-view relighting model, we produce high-fidelity, artistically relit images. Finally, we fine-tune the 3DGS scene with the relit appearance to obtain a fully relit 3D scene. We evaluate GS-Light on both indoor and outdoor scenes, comparing it to state-of-the-art baselines including per-view relighting, video relighting, and scene editing methods. Using quantitative metrics (multi-view consistency, imaging quality, aesthetic score, semantic similarity, etc.) and qualitative assessment (user studies), GS-Light demonstrates consistent improvements over baselines. Code and assets will be made available upon publication.
CVDec 12, 2024
Identity-Preserving Pose-Guided Character Animation via Facial Landmarks TransformationLianrui Mu, Xingze Zhou, Wenjie Zheng et al.
Creating realistic pose-guided image-to-video character animations while preserving facial identity remains challenging, especially in complex and dynamic scenarios such as dancing, where precise identity consistency is crucial. Existing methods frequently encounter difficulties maintaining facial coherence due to misalignments between facial landmarks extracted from driving videos that provide head pose and expression cues and the facial geometry of the reference images. To address this limitation, we introduce the Facial Landmarks Transformation (FLT) method, which leverages a 3D Morphable Model to address this limitation. FLT converts 2D landmarks into a 3D face model, adjusts the 3D face model to align with the reference identity, and then transforms them back into 2D landmarks to guide the image-to-video generation process. This approach ensures accurate alignment with the reference facial geometry, enhancing the consistency between generated videos and reference images. Experimental results demonstrate that FLT effectively preserves facial identity, significantly improving pose-guided character animation models.
MLApr 1, 2020
Total Variation Regularization for Compartmental Epidemic Models with Time-Varying DynamicsWenjie Zheng
Compartmental epidemic models are among the most popular ones in epidemiology. For the parameters (e.g., the transmission rate) characterizing these models, the majority of researchers simplify them as constants, while some others manage to detect their continuous variations. In this paper, we aim at capturing, on the other hand, discontinuous variations, which better describe the impact of many noteworthy events, such as city lockdowns, the opening of field hospitals, and the mutation of the virus, whose effect should be instant. To achieve this, we balance the model's likelihood by total variation, which regulates the temporal variations of the model parameters. To infer these parameters, instead of using Monte Carlo methods, we design a novel yet straightforward optimization algorithm, dubbed Iterated Nelder--Mead, which repeatedly applies the Nelder--Mead algorithm. Experiments conducted on the simulated data demonstrate that our approach can reproduce these discontinuities and precisely depict the epidemics.
MLOct 20, 2019
$hv$-Block Cross Validation is not a BIBD: a Note on the Paper by Jeff Racine (2000)Wenjie Zheng
This note corrects a mistake in the paper "consistent cross-validatory model-selection for dependent data: $hv$-block cross-validation" by Racine (2000). In his paper, he implied that the therein proposed $hv$-block cross-validation is consistent in the sense of Shao (1993). To get this intuition, he relied on the speculation that $hv$-block is a balanced incomplete block design (BIBD). This note demonstrates that this is not the case, and thus the theoretical consistency of $hv$-block remains an open question. In addition, I also provide a Python program counting the number of occurrences of each sample and each pair of samples.
DCDec 20, 2017
A Distributed Frank-Wolfe Framework for Learning Low-Rank Matrices with the Trace NormWenjie Zheng, Aurélien Bellet, Patrick Gallinari
We consider the problem of learning a high-dimensional but low-rank matrix from a large-scale dataset distributed over several machines, where low-rankness is enforced by a convex trace norm constraint. We propose DFW-Trace, a distributed Frank-Wolfe algorithm which leverages the low-rank structure of its updates to achieve efficiency in time, memory and communication usage. The step at the heart of DFW-Trace is solved approximately using a distributed version of the power method. We provide a theoretical analysis of the convergence of DFW-Trace, showing that we can ensure sublinear convergence in expectation to an optimal solution with few power iterations per epoch. We implement DFW-Trace in the Apache Spark distributed programming framework and validate the usefulness of our approach on synthetic and real data, including the ImageNet dataset with high-dimensional features extracted from a deep neural network.
IRApr 28, 2016
Matrix Factorization Method for Decentralized Recommender SystemsWenjie Zheng
Decentralized recommender system does not rely on the central service provider, and the users can keep the ownership of their ratings. This article brings the theoretically well-studied matrix factorization method into the decentralized recommender system, where the formerly prevalent algorithms are heuristic and hence lack of theoretical guarantee. Our preliminary simulation results show that this method is promising.
MLApr 28, 2016
Two Differentially Private Rating Collection Mechanisms for Recommender SystemsWenjie Zheng
We design two mechanisms for the recommender system to collect user ratings. One is modified Laplace mechanism, and the other is randomized response mechanism. We prove that they are both differentially private and preserve the data utility.
MLOct 12, 2015
Toward a Better Understanding of LeaderboardWenjie Zheng
The leaderboard in machine learning competitions is a tool to show the performance of various participants and to compare them. However, the leaderboard quickly becomes no longer accurate, due to hack or overfitting. This article gives two pieces of advice to prevent easy hack or overfitting. By following these advice, we reach the conclusion that something like the Ladder leaderboard introduced in [blum2015ladder] is inevitable. With this understanding, we naturally simplify Ladder by eliminating its redundant computation and explain how to choose the parameter and interpret it. We also prove that the sample complexity is cubic to the desired precision of the leaderboard.