LG AI ML Oct 2, 2022

Metric Distribution to Vector: Constructing Data Representation via Broad-Scale Discrepancies

Xue Liu, Dan Sun, Xiaobo Cao, Hao Ye, Wei Wei

arXiv:2210.00415v11.81 citations h-index: 37

AI Analysis

This work addresses graph classification by providing a novel embedding approach that improves performance over existing methods.

The authors propose MetricDistribution2vec, an embedding strategy that captures broad-scale metric distribution characteristics of a dataset and encodes them into a vector representation for each graph. The method achieved unexpected performance gains on all tested real-world graph datasets, even with lightweight classifiers, and remained discriminative in few-shot learning scenarios.

Graph embedding provides a feasible methodology for pattern classification of graph-structured data by mapping each data sample into a vector space. Most pioneering works are essentially coding methods that concentrate on a vectorial representation of the inner properties of a graph, such as its topological constitution, node attributes, and link relations. However, classifying each target sample is a qualitative issue that depends on understanding the overall discrepancies across the dataset. From a statistical point of view, these discrepancies manifest as a metric distribution over the dataset when a distance metric is adopted to measure pairwise similarity or dissimilarity. Therefore, we present a novel embedding strategy named $\mathbf{MetricDistribution2vec}$ that extracts such distribution characteristics into a vectorial representation for each sample. We demonstrate the application and effectiveness of our representation method in supervised prediction tasks on extensive real-world structural graph datasets. The results show unexpected improvements over a wide range of baselines on all the datasets, even when lightweight models are used as classifiers. Moreover, we also evaluate the proposed method in few-shot classification scenarios, where the results still show attractive discrimination when inference is based on few training samples.
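
The abstract does not spell out the algorithm, so the following is only a minimal sketch of the general idea it describes: representing each sample by the distribution of its distances to the rest of the dataset. It is not the authors' MetricDistribution2vec. The function names, the histogram summary, the bin count, and the toy degree-sequence distance are all assumptions introduced for illustration.

```python
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")


def distance_histogram_embedding(
    items: Sequence[T],
    distance: Callable[[T, T], float],
    num_bins: int = 4,
) -> List[List[float]]:
    """Hypothetical helper: embed each item as a normalized histogram
    of its distances to every other item in the dataset."""
    n = len(items)
    # Pairwise distance matrix over the whole dataset.
    dist = [[distance(a, b) for b in items] for a in items]
    # Shared bin range so histograms are comparable across items.
    max_dist = max(max(row) for row in dist)
    width = max_dist / num_bins if max_dist > 0 else 1.0

    vectors = []
    for i in range(n):
        hist = [0.0] * num_bins
        for j in range(n):
            if i == j:
                continue  # skip the trivial self-distance
            # Clamp so the maximum distance falls into the last bin.
            k = min(int(dist[i][j] / width), num_bins - 1)
            hist[k] += 1.0
        total = n - 1
        vectors.append([h / total for h in hist] if total > 0 else hist)
    return vectors


def degree_sequence_distance(a: Sequence[int], b: Sequence[int]) -> float:
    """Toy graph distance (an assumption, not from the paper): L1 difference
    between sorted, zero-padded degree sequences."""
    size = max(len(a), len(b))
    pa = sorted(a, reverse=True) + [0] * (size - len(a))
    pb = sorted(b, reverse=True) + [0] * (size - len(b))
    return float(sum(abs(x - y) for x, y in zip(pa, pb)))


# Toy dataset: degree sequences of a 2-node path, a 3-node path,
# a triangle, and a 3-leaf star.
graphs = [[1, 1], [2, 1, 1], [2, 2, 2], [3, 1, 1, 1]]
vectors = distance_histogram_embedding(graphs, degree_sequence_distance)
```

Each resulting vector could then be passed to any lightweight classifier. A histogram is only one possible summary of the distance distribution; quantiles or moments would serve the same illustrative purpose.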
