h-index: 10
Papers: 9
Citations: 79
Novelty: 52%
AI Score: 48

9 Papers

HC · Jun 2
A Visual Analytics System for Interactive Exploration of Historical Painter Cohorts

Yingping Yang, Guangtao You, Wenwen Li et al.

Painter cohort analysis has long been regarded as a key lens for studying how painting artistic styles develop and transmit across generations. Through a two-year collaboration with art historians, we identify key challenges in traditional painter cohort research: the unstructured characteristic of painter features, the entangled complexity of inheritance relationships, and the cognitively demanding nature of cohort definition and validation. To solve these challenges, we propose HPC-Vis, a visual analytics system for interactive exploration of historical painter cohorts. An improved cohort analytical workflow is designed to integrate structured feature construction, visualization-assisted exploration, algorithm-based recommendation, and unified cohort management. Based on this workflow, we develop three core computational modules: a multi-scale artistic feature construction method that leverages LLMs to extract and organize hierarchical style features from unstructured historical texts, an inheritance reconstruction algorithm that transforms the entangled multi-parent inheritance network into a clear hierarchical forest structure, and a recommendation model that identifies core features of the cohort and recommends cohort members via painter relevance assessment. To support smooth interactive exploration, we further design a set of novel visualizations with multidimensional collaboration, especially an inheriting mountain view inspired by traditional Chinese landscape paintings, and a foldable doughnut chart for hierarchical artistic style labels. HPC-Vis is evaluated and validated through case studies, user studies, and technical evaluations, demonstrating its effectiveness in supporting painter cohort exploration and in providing visual insights for art historical research.

CV · Oct 27, 2022
MMFL-Net: Multi-scale and Multi-granularity Feature Learning for Cross-domain Fashion Retrieval

Chen Bao, Xudong Zhang, Jiazhou Chen et al.

Instance-level image retrieval in fashion is a challenging issue owing to its increasing importance in real-scenario visual fashion search. Cross-domain fashion retrieval aims to match unconstrained customer images, used as queries, against photographs provided by retailers; however, it is a difficult task because of the wide range of consumer-to-shop (C2S) domain discrepancies and because clothing images are vulnerable to various non-rigid deformations. To this end, we propose a novel multi-scale and multi-granularity feature learning network (MMFL-Net), which can jointly learn global-local aggregation feature representations of clothing images in a unified framework, aiming to train a cross-domain model for C2S fashion visual similarity. First, a new semantic-spatial feature fusion part is designed to bridge the semantic-spatial gap by applying top-down and bottom-up bidirectional multi-scale feature fusion. Next, a multi-branch deep network architecture is introduced to capture global salient, part-informed, and local detailed information, and to extract robust and discriminative feature embeddings by integrating coarse-to-fine similarity learning across multiple granularities. Finally, the improved trihard loss, center loss, and multi-task classification loss are adopted for our MMFL-Net, which can jointly optimize intra-class and inter-class distance and thus explicitly improve intra-class compactness and inter-class discriminability between its visual representations for feature learning. Furthermore, our proposed model also combines the multi-task attribute recognition and classification module with multi-label semantic attributes and product ID labels. Experimental results demonstrate that our proposed MMFL-Net achieves significant improvement over the state-of-the-art methods on the two datasets, DeepFashion-C2S and Street2Shop.
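
As a rough illustration of the loss terms named in the abstract, the sketch below implements a standard batch-hard triplet ("trihard") loss and a simple center loss in plain Python. It is not the paper's "improved" variant, whose details the abstract does not give. The function names, the margin value, and the choice to compute class centers as batch means are all assumptions made for this example.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two embedding vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def batch_hard_triplet_loss(embeddings, labels, margin=0.3):
    """Standard batch-hard triplet loss (assumed form, not the paper's variant).

    For each anchor, take the farthest same-label sample and the closest
    different-label sample, then apply a hinge with the given margin.
    """
    losses = []
    for i, anchor in enumerate(embeddings):
        pos = [euclidean(anchor, e) for j, e in enumerate(embeddings)
               if j != i and labels[j] == labels[i]]
        neg = [euclidean(anchor, e) for j, e in enumerate(embeddings)
               if labels[j] != labels[i]]
        if not pos or not neg:
            continue  # anchor has no valid triplet in this batch
        losses.append(max(0.0, margin + max(pos) - min(neg)))
    return sum(losses) / len(losses) if losses else 0.0

def center_loss(embeddings, labels):
    """Mean squared distance to the class center, halved.

    Centers are the per-label batch means here (an assumption); in training
    they are usually maintained as learnable or running estimates.
    """
    groups = {}
    for e, y in zip(embeddings, labels):
        groups.setdefault(y, []).append(e)
    centers = {y: [sum(c) / len(es) for c in zip(*es)]
               for y, es in groups.items()}
    total = sum(euclidean(e, centers[y]) ** 2 for e, y in zip(embeddings, labels))
    return total / (2 * len(embeddings))
```

The triplet term pushes different product IDs apart (inter-class distance), while the center term pulls embeddings of the same ID together (intra-class compactness), which is the division of labor the abstract describes.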

CV · Feb 9
TIBR4D: Tracing-Guided Iterative Boundary Refinement for Efficient 4D Gaussian Segmentation

He Wu, Xia Yan, Yanghui Xu et al.

Object-level segmentation in dynamic 4D Gaussian scenes remains challenging due to complex motion, occlusions, and ambiguous boundaries. In this paper, we present an efficient learning-free 4D Gaussian segmentation framework that lifts video segmentation masks to 4D spaces, whose core is a two-stage iterative boundary refinement, TIBR4D. The first stage is an Iterative Gaussian Instance Tracing (IGIT) at the temporal segment level. It progressively refines Gaussian-to-instance probabilities through iterative tracing, and extracts corresponding Gaussian point clouds that better handle occlusions and preserve completeness of object structures compared to existing one-shot threshold-based methods. The second stage is a frame-wise Gaussian Rendering Range Control (RCC) via suppressing highly uncertain Gaussians near object boundaries while retaining their core contributions for more accurate boundaries. Furthermore, a temporal segmentation merging strategy is proposed for IGIT to balance identity consistency and dynamic awareness. Longer segments enforce stronger multi-frame constraints for stable identities, while shorter segments allow identity changes to be captured promptly. Experiments on HyperNeRF and Neu3D demonstrate that our method produces accurate object Gaussian point clouds with clearer boundaries and higher efficiency compared to SOTA methods.

GR · Sep 30, 2025
Vector sketch animation generation with differentialable motion trajectories

Xinding Zhu, Xinye Yang, Shuyang Zheng et al.

Sketching is a direct and inexpensive means of visual expression. Though image-based sketching has been well studied, video-based sketch animation generation is still very challenging due to the temporal coherence requirement. In this paper, we propose a novel end-to-end automatic generation approach for vector sketch animation. To solve the flickering issue, we introduce a Differentiable Motion Trajectory (DMT) representation that describes the frame-wise movement of stroke control points using differentiable polynomial-based trajectories. DMT enables global semantic gradient propagation across multiple frames, significantly improving the semantic consistency and temporal coherence, and producing high-framerate output. DMT employs a Bernstein basis to balance the sensitivity of polynomial parameters, thus achieving more stable optimization. Instead of implicit fields, we introduce sparse track points for explicit spatial modeling, which improves efficiency and supports long-duration video processing. Evaluations on DAVIS and LVOS datasets demonstrate the superiority of our approach over SOTA methods. Cross-domain validation on 3D models and text-to-video data confirms the robustness and compatibility of our approach.
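
To make the trajectory idea concrete, the following minimal sketch evaluates a polynomial trajectory for one stroke control point in the Bernstein basis. It only illustrates the representation as described in the abstract, not the paper's optimization pipeline. The function names, the 2D coefficient layout, and the normalization of time to [0, 1] are assumptions made for this example.

```python
from math import comb

def bernstein_basis(degree, i, t):
    """Value of the i-th Bernstein polynomial of the given degree at t in [0, 1]."""
    return comb(degree, i) * (t ** i) * ((1.0 - t) ** (degree - i))

def trajectory_point(coeffs, t):
    """Position of one control point at normalized time t.

    coeffs is a list of (x, y) coefficients, one per Bernstein basis function
    (hypothetical layout chosen for this sketch).
    """
    degree = len(coeffs) - 1
    x = sum(bernstein_basis(degree, i, t) * cx for i, (cx, _) in enumerate(coeffs))
    y = sum(bernstein_basis(degree, i, t) * cy for i, (_, cy) in enumerate(coeffs))
    return (x, y)

def sample_trajectory(coeffs, num_frames):
    """Sample the continuous trajectory at evenly spaced frame times."""
    if num_frames == 1:
        return [trajectory_point(coeffs, 0.0)]
    return [trajectory_point(coeffs, k / (num_frames - 1)) for k in range(num_frames)]

# Example: a cubic trajectory sampled at two different frame rates.
coeffs = [(0.0, 0.0), (1.0, 2.0), (3.0, 2.0), (4.0, 0.0)]
coarse = sample_trajectory(coeffs, 5)
dense = sample_trajectory(coeffs, 60)
```

Because the trajectory is a continuous function of time, the same coefficients can be sampled at any frame count, which is one way to read the abstract's remark about high-framerate output. The Bernstein basis functions are non-negative and sum to one on [0, 1], so each coefficient has a bounded influence on the curve.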

CV · Jun 3, 2025
InterMamba: Efficient Human-Human Interaction Generation with Adaptive Spatio-Temporal Mamba

Zizhao Wu, Yingying Sun, Yiming Chen et al.

Human-human interaction generation has garnered significant attention in motion synthesis due to its vital role in understanding humans as social beings. However, existing methods typically rely on transformer-based architectures, which often face challenges related to scalability and efficiency. To address these issues, we propose a novel, efficient human-human interaction generation method based on the Mamba framework, designed to meet the demands of effectively capturing long-sequence dependencies while providing real-time feedback. Specifically, we introduce an adaptive spatio-temporal Mamba framework that utilizes two parallel SSM branches with an adaptive mechanism to integrate the spatial and temporal features of motion sequences. To further enhance the model's ability to capture dependencies within individual motion sequences and the interactions between different individual sequences, we develop two key modules: the self-adaptive spatio-temporal Mamba module and the cross-adaptive spatio-temporal Mamba module, enabling efficient feature learning. Extensive experiments demonstrate that our method achieves state-of-the-art results on two interaction datasets with remarkable quality and efficiency. Compared to the baseline method InterGen, our approach not only improves accuracy but also requires a minimal parameter size of just 66M, only 36% of InterGen's, while achieving an average inference speed of 0.57 seconds, which is 46% of InterGen's execution time.

CV · Oct 28, 2024
Transformer-Based Tooth Alignment Prediction With Occlusion And Collision Constraints

ZhenXing Dong, JiaZhou Chen, YangHui Xu

The planning of digital orthodontic treatment requires providing tooth alignment, which not only consumes a lot of time and labor to determine manually but also relies heavily on clinical experience. In this work, we propose a lightweight tooth alignment neural network based on Swin-transformer. We first re-organize 3D point clouds based on virtual arch lines and convert them into order-sorted multi-channel textures, which improves both accuracy and efficiency. We then design two new occlusal loss functions that quantitatively evaluate the occlusal relationship between the upper and lower jaws. To the best of our knowledge, these important clinical constraints are introduced here for the first time, and they lead to cutting-edge prediction accuracy. To train our network, we collected a large digital orthodontic dataset of 591 clinical cases, including various complex clinical cases. This dataset will benefit the community after its release, since there is no open dataset so far. Furthermore, we also propose two new orthodontic dataset augmentation methods that consider tooth spatial distribution and occlusion. We evaluated our method on this dataset with extensive experiments, including comparisons with SOTA methods and ablation studies, which demonstrate the high prediction accuracy of our method.

CV · Dec 18, 2021
3D Instance Segmentation of MVS Buildings

Jiazhou Chen, Yanghui Xu, Shufang Lu et al.

We present a novel 3D instance segmentation framework for Multi-View Stereo (MVS) buildings in urban scenes. Unlike existing works focusing on semantic segmentation of urban scenes, the emphasis of this work lies in detecting and segmenting 3D building instances even if they are attached and embedded in a large and imprecise 3D surface model. Multi-view RGB images are first enhanced to RGBH images by adding a heightmap and are segmented to obtain all roof instances using a fine-tuned 2D instance segmentation neural network. Instance masks from different multi-view images are then clustered into global masks. Our mask clustering accounts for spatial occlusion and overlapping, which can eliminate segmentation ambiguities among multi-view images. Based on these global masks, 3D roof instances are segmented out by mask back-projections and extended to the entire building instances through a Markov random field optimization. A new dataset that contains instance-level annotation for both 3D urban scenes (roofs and buildings) and drone images (roofs) is provided. To the best of our knowledge, it is the first outdoor dataset dedicated to 3D instance segmentation with much more annotations of attached 3D buildings than existing datasets. Quantitative evaluations and ablation studies have shown the effectiveness of all major steps and the advantages of our multi-view framework over the orthophoto-based method.

IV · Jul 1, 2020
Learning Common Harmonic Waves on Stiefel Manifold -- A New Mathematical Approach for Brain Network Analyses

Jiazhou Chen, Guoqiang Han, Hongmin Cai et al.

Converging evidence shows that disease-relevant brain alterations do not appear in random brain locations; instead, their spatial pattern follows large-scale brain networks. In this context, a powerful network analysis approach with a mathematical foundation is indispensable to understand the mechanism of neuropathological events spreading throughout the brain. Indeed, the topology of each brain network is governed by its native harmonic waves, which are a set of orthogonal bases derived from the Eigen-system of the underlying Laplacian matrix. To that end, we propose a novel connectome harmonic analysis framework to provide enhanced mathematical insights by detecting frequency-based alterations relevant to brain disorders. The backbone of our framework is a novel manifold algebra appropriate for inference across harmonic waves that overcomes the limitations of using classic Euclidean operations on irregular data structures. The individual harmonic difference is measured by a set of common harmonic waves learned from a population of individual Eigen-systems, where each native Eigen-system is regarded as a sample drawn from the Stiefel manifold. Specifically, a manifold optimization scheme is tailored to find the common harmonic waves, which reside at the center of the Stiefel manifold. As a result, the common harmonic waves constitute the new neuro-biological bases to understand disease progression. Each harmonic wave exhibits a unique propagation pattern of neuro-pathological burdens spreading across brain networks. The statistical power of our novel connectome harmonic analysis approach is evaluated by identifying frequency-based alterations relevant to Alzheimer's disease, where our learning-based manifold approach discovers more significant and reproducible network dysfunction patterns compared to Euclidean methods.
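
The notion of harmonic waves as eigenvectors of a graph Laplacian can be illustrated on a toy example. The sketch below uses a ring graph, chosen only because its Laplacian eigenvectors are known in closed form. It is not a brain connectome and does not attempt the paper's Stiefel-manifold learning. All function names are hypothetical.

```python
import math

def ring_adjacency(n):
    """Adjacency matrix of an unweighted ring graph with n >= 3 nodes (toy stand-in)."""
    w = [[0.0] * n for _ in range(n)]
    for i in range(n):
        w[i][(i + 1) % n] = 1.0
        w[i][(i - 1) % n] = 1.0
    return w

def graph_laplacian(w):
    """Combinatorial Laplacian L = D - W, assuming a zero diagonal in W."""
    n = len(w)
    return [[(sum(w[i]) if i == j else 0.0) - w[i][j] for j in range(n)]
            for i in range(n)]

def matvec(m, v):
    """Matrix-vector product for list-of-lists matrices."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]

def ring_harmonic(n, k):
    """k-th cosine harmonic of the ring and its Laplacian eigenvalue."""
    theta = 2.0 * math.pi * k / n
    wave = [math.cos(theta * i) for i in range(n)]
    eigenvalue = 2.0 - 2.0 * math.cos(theta)
    return wave, eigenvalue

def eigen_residual(lap, wave, eigenvalue):
    """Largest entry of |L v - lambda v|; near zero when v is an eigenvector."""
    lv = matvec(lap, wave)
    return max(abs(a - eigenvalue * b) for a, b in zip(lv, wave))

lap = graph_laplacian(ring_adjacency(8))
residuals = [eigen_residual(lap, *ring_harmonic(8, k)) for k in range(5)]
```

Low-index harmonics vary slowly across neighboring nodes and have small eigenvalues, while higher ones oscillate faster. This is the sense in which a Laplacian eigenbasis can act as a frequency basis on a network.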

LG · May 10, 2019
Integrating Tensor Similarity to Enhance Clustering Performance

Hong Peng, Yu Hu, Jiazhou Chen et al.

The performance of most clustering methods hinges on the pairwise affinity used, which is usually represented by a similarity matrix. However, pairwise similarity is notoriously vulnerable to noise contamination and to imbalance in samples or features, which hinders accurate clustering. To tackle this issue, we propose to use information among samples to boost the clustering performance. We prove that a simplified similarity for pairs, denoted by a fourth-order tensor, equals the Kronecker product of pairwise similarity matrices under the decomposable assumption, or provides complementary information that the pairwise similarity missed under the indecomposable assumption. A high-order similarity matrix is then obtained from the tensor similarity via eigenvalue decomposition. The high-order similarity, which captures spatial information, serves as a robust complement to the pairwise similarity. It is further integrated with the popular pairwise similarity, in a method named IPS2, to boost the clustering performance. Extensive experiments demonstrate that the proposed IPS2 significantly outperforms previous similarity-based methods on real-world datasets and is capable of handling the clustering task on under-sampled and noisy datasets.
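
The Kronecker-product statement can be checked numerically on a small example. The sketch below assumes one plausible reading of the decomposable case, namely that the similarity between pair (i, j) and pair (k, l) factorizes as S[i][k] * S[j][l]. The abstract does not spell out the exact definition, so this factorization, the index flattening, and the function names are assumptions made for this example.

```python
def kron(a, b):
    """Kronecker product of two list-of-lists matrices."""
    rows_b, cols_b = len(b), len(b[0])
    return [[a[r // rows_b][c // cols_b] * b[r % rows_b][c % cols_b]
             for c in range(len(a[0]) * cols_b)]
            for r in range(len(a) * rows_b)]

def pair_similarity_tensor(s):
    """Fourth-order tensor T[i][j][k][l] = S[i][k] * S[j][l] (assumed decomposable form)."""
    n = len(s)
    return [[[[s[i][k] * s[j][l] for l in range(n)]
              for k in range(n)]
             for j in range(n)]
            for i in range(n)]

def flatten_tensor(t):
    """Unfold T into an (n*n) x (n*n) matrix with rows (i, j) and columns (k, l)."""
    n = len(t)
    return [[t[i][j][k][l] for k in range(n) for l in range(n)]
            for i in range(n) for j in range(n)]

# Toy pairwise similarity matrix for three samples.
s = [[1.0, 0.8, 0.1],
     [0.8, 1.0, 0.2],
     [0.1, 0.2, 1.0]]
matches = flatten_tensor(pair_similarity_tensor(s)) == kron(s, s)
```

Under this reading, unfolding the tensor with row index (i, j) and column index (k, l) reproduces the Kronecker product of S with itself entry for entry, so the pair-level similarity carries no information beyond S in the decomposable case.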