Wen-Bo Xie

h-index: 6

4 papers

71 citations

Novelty: 43%

AI Score: 36

Ranked #99,758 of 194,257 authors (top 51%)

#1,027 in IR (top 47%)

4 Papers

4.1 · LG · Sep 10, 2025

Data Skeleton Learning: Scalable Active Clustering with Sparse Graph Structures

Wen-Bo Xie, Xun Fu, Bin Chen et al.

In this work, we focus on the efficiency and scalability of pairwise constraint-based active clustering, crucial for processing large-scale data in applications such as data mining, knowledge annotation, and AI model pre-training. Our goals are threefold: (1) to reduce computational costs for iterative clustering updates; (2) to enhance the impact of user-provided constraints to minimize annotation requirements for precise clustering; and (3) to cut down memory usage in practical deployments. To achieve these aims, we propose a graph-based active clustering algorithm that utilizes two sparse graphs: one for representing relationships between data (our proposed data skeleton) and another for updating this data skeleton. These two graphs work in concert, enabling the refinement of connected subgraphs within the data skeleton to create nested clusters. Our empirical analysis confirms that the proposed algorithm consistently facilitates more accurate clustering with dramatically less input of user-provided constraints, and outperforms its counterparts in terms of computational performance and scalability, while maintaining robustness across various distance metrics.
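The idea of reading clusters off the connected subgraphs of a sparse graph, and refining them with a pairwise constraint, can be illustrated with a toy sketch. This is not the paper's algorithm: the `components` helper, the five-node chain graph, and the rule "a cannot-link answer removes the edge between the two points" are all assumptions made only for illustration.

```python
from collections import defaultdict

def components(n, edges):
    """Return the connected components of an undirected graph on nodes 0..n-1."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    seen, result = set(), []
    for start in range(n):
        if start in seen:
            continue
        # Depth-first search from an unvisited node collects one component.
        stack, comp = [start], []
        seen.add(start)
        while stack:
            u = stack.pop()
            comp.append(u)
            for v in adj[u]:
                if v not in seen:
                    seen.add(v)
                    stack.append(v)
        result.append(sorted(comp))
    return result

# Hypothetical sparse graph: a chain 0-1-2-3-4, so every point is in one cluster.
skeleton = {(0, 1), (1, 2), (2, 3), (3, 4)}
before = components(5, skeleton)

# Assumed user feedback: points 2 and 3 must not share a cluster (cannot-link).
# In this toy rule, the constraint simply removes the edge joining them.
skeleton.discard((2, 3))
after = components(5, skeleton)

print(before)
print(after)
```

A single constraint splits the one component into two, which hints at why a sparse structure can make each user answer go a long way; the actual update rule in the paper may differ substantially.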

3.6 · ML · Nov 11, 2021 · Code

Hierarchical clustering by aggregating representatives in sub-minimum-spanning-trees

Wen-Bo Xie, Zhen Liu, Jaideep Srivastava

One of the main challenges in hierarchical clustering is how to identify appropriate representative points in the lower level of the cluster tree, which then serve as roots in the higher level for further aggregation. However, conventional hierarchical clustering approaches adopt simple heuristics to select these "representative" points, which may not be representative enough. As a result, the constructed cluster tree suffers from poor robustness and weak reliability. To address this issue, we propose a novel hierarchical clustering algorithm that, while building the clustering dendrogram, effectively detects the representative point by scoring the reciprocal nearest data points in each sub-minimum-spanning-tree. Extensive experiments on UCI datasets show that the proposed algorithm is more accurate than other benchmarks. Moreover, according to our analysis, the proposed algorithm has O(n log n) time complexity and O(log n) space complexity, indicating that it scales to massive data with lower time and storage consumption.

5.5 · IR · Jul 9, 2019

Hierarchical Clustering Supported by Reciprocal Nearest Neighbors

Wen-Bo Xie, Yan-Li Lee, Cong Wang et al.

Clustering is a fundamental analysis tool that aims to classify data points into groups based on their similarity or distance. It has found successful applications across the natural and social sciences, including biology, physics, economics, chemistry, astronomy, and psychology. Among the numerous existing algorithms, hierarchical clustering algorithms have a particular advantage: they provide results at different resolutions without a predetermined number of clusters and reveal the organization of the resulting clusters. At the same time, they suffer from a variety of drawbacks and are thus either time-consuming or inaccurate. We propose a novel hierarchical clustering approach based on the simple hypothesis that two reciprocal nearest data points should be grouped into one cluster. Extensive tests on data sets across multiple domains show that our method is much faster and more accurate than the state-of-the-art benchmarks. We further extend our method to the community detection problem in real networks, achieving remarkably better results than the well-known Girvan-Newman algorithm.
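The hypothesis that two reciprocal nearest data points belong in one cluster can be made concrete with a minimal sketch. It only finds reciprocal nearest-neighbor pairs with a brute-force search under Euclidean distance; it is not the paper's full hierarchical algorithm, and the function names and sample points are assumptions chosen for illustration.

```python
from math import dist

def nearest_neighbor(points, i):
    """Index of the point closest to points[i], excluding i itself."""
    return min(
        (j for j in range(len(points)) if j != i),
        key=lambda j: dist(points[i], points[j]),
    )

def reciprocal_pairs(points):
    """Pairs (i, j) with i < j where each point is the other's nearest neighbor."""
    nn = [nearest_neighbor(points, i) for i in range(len(points))]
    return sorted((i, j) for i, j in enumerate(nn) if i < j and nn[j] == i)

# Hypothetical 2-D data: two tight pairs and one point in between.
points = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.0, 5.2), (2.0, 2.0)]
pairs = reciprocal_pairs(points)
print(pairs)
```

Here points 0 and 1 choose each other, as do points 2 and 3, so each pair would be merged first. Point 4 has point 1 as its nearest neighbor, but that choice is not returned, so point 4 stays unmerged at this step.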

2.2 · IR · Mar 7, 2017

Heterogeneous information network model for equipment-standard system

Liang Yin, Li-Chen Shi, Jun-Yan Zhao et al.

An entity information network describes the structural relationships between entities. Owing to its extensibility and heterogeneity, it is increasingly applied to relationship modeling. In recent years, much research on entity information network modeling has been proposed, but little of it focuses on the equipment-standard system, which is multi-layer, multi-dimensional, and multi-scale. To deal efficiently with complex issues in the equipment-standard system, such as standard revision, standard control, and production design, this paper proposes a heterogeneous information network model for the equipment-standard system. The proposed model considers three types of entities and six types of relationships. Correspondingly, several different similarity-measuring methods are used in the modeling process. Experiments show that the heterogeneous information network model established in this paper accurately reflects the relationships between entities. Meanwhile, the modeling process performs well in terms of time consumption.