Jie Liu

45.4CVMay 14, 2024Code

Hunyuan-DiT: A Powerful Multi-Resolution Diffusion Transformer with Fine-Grained Chinese Understanding

Zhimin Li, Jianwei Zhang, Qin Lin et al.

We present Hunyuan-DiT, a text-to-image diffusion transformer with fine-grained understanding of both English and Chinese. To construct Hunyuan-DiT, we carefully design the transformer structure, text encoder, and positional encoding. We also build from scratch a whole data pipeline to update and evaluate data for iterative model optimization. For fine-grained language understanding, we train a Multimodal Large Language Model to refine the captions of the images. Finally, Hunyuan-DiT can perform multi-turn multimodal dialogue with users, generating and refining images according to the context. Through our holistic human evaluation protocol with more than 50 professional human evaluators, Hunyuan-DiT sets a new state-of-the-art in Chinese-to-image generation compared with other open-source models. Code and pretrained models are publicly available at github.com/Tencent/HunyuanDiT

2.6LGMar 14, 2024

Detecting Anomalies in Dynamic Graphs via Memory enhanced Normality

Jie Liu, Xuequn Shang, Xiaolin Han et al.

Anomaly detection in dynamic graphs presents a significant challenge due to the temporal evolution of graph structures and attributes. The conventional approaches that tackle this problem typically employ an unsupervised learning framework, capturing normality patterns with exclusive normal data during training and identifying deviations as anomalies during testing. However, these methods face critical drawbacks: they either only depend on proxy tasks for representation without directly pinpointing normal patterns, or they neglect to differentiate between spatial and temporal normality patterns. More recent methods that use contrastive learning with negative sampling also face high computational costs, limiting their scalability to large graphs. To address these challenges, we introduce a novel Spatial-Temporal memories-enhanced graph autoencoder (STRIPE). Initially, STRIPE employs Graph Neural Networks (GNNs) and gated temporal convolution layers to extract spatial and temporal features. Then STRIPE incorporates separate spatial and temporal memory networks to capture and store prototypes of normal patterns, respectively. These stored patterns are retrieved and integrated with encoded graph embeddings through a mutual attention mechanism. Finally, the integrated features are fed into the decoder to reconstruct the graph streams which serve as the proxy task for anomaly detection. This comprehensive approach not only minimizes reconstruction errors but also emphasizes the compactness and distinctiveness of the embeddings w.r.t. the nearest memory prototypes. Extensive experiments on six benchmark datasets demonstrate the effectiveness and efficiency of STRIPE, where STRIPE significantly outperforms existing methods with 5.8% improvement in AUC scores and 4.62X faster in training time.

Jie Liu

2 Papers