Heterogeneous domain adaptation: An unsupervised approach
It addresses a gap in domain adaptation for scenarios like cancer detection and credit assessment where target data is unlabeled, but the approach appears incremental as it builds on existing heterogeneous methods.
This paper tackles the problem of unsupervised heterogeneous domain adaptation, where the target domain has no labeled data, by proposing a new model called GLG that uses a theorem and metric to transfer knowledge between domains. The model achieved superior performance over existing baselines on ten tasks across three applications, though specific numerical gains are not detailed.
Domain adaptation leverages the knowledge in one domain - the source domain - to improve learning efficiency in another domain - the target domain. Existing heterogeneous domain adaptation research is relatively well-progressed, but only in situations where the target domain contains at least a few labeled instances. In contrast, heterogeneous domain adaptation with an unlabeled target domain has not been well-studied. To contribute to the research in this emerging field, this paper presents: (1) an unsupervised knowledge transfer theorem that guarantees the correctness of transferring knowledge; and (2) a principal angle-based metric to measure the distance between two pairs of domains: one pair comprises the original source and target domains and the other pair comprises two homogeneous representations of two domains. The theorem and the metric have been implemented in an innovative transfer model, called a Grassmann-Linear monotonic maps-geodesic flow kernel (GLG), that is specifically designed for heterogeneous unsupervised domain adaptation (HeUDA). The linear monotonic maps meet the conditions of the theorem and are used to construct homogeneous representations of the heterogeneous domains. The metric shows the extent to which the homogeneous representations have preserved the information in the original source and target domains. By minimizing the proposed metric, the GLG model learns the homogeneous representations of heterogeneous domains and transfers knowledge through these learned representations via a geodesic flow kernel. To evaluate the model, five public datasets were reorganized into ten HeUDA tasks across three applications: cancer detection, credit assessment, and text classification. The experiments demonstrate that the proposed model delivers superior performance over the existing baselines.