Relational Learning and Feature Extraction by Querying over Heterogeneous Information Networks
This work addresses the need for a general approach to relational learning on heterogeneous networks, which is incremental as it builds on specialized methods by offering a unified framework.
The authors tackled the problem of performing relational learning and feature extraction on heterogeneous information networks by proposing a unified framework with a declarative query language, enabling queries to serve as relational features for machine learning models. They demonstrated the system's capabilities in natural language processing and computational biology domains.
Many real world systems need to operate on heterogeneous information networks that consist of numerous interacting components of different types. Examples include systems that perform data analysis on biological information networks; social networks; and information extraction systems processing unstructured data to convert raw text to knowledge graphs. Many previous works describe specialized approaches to perform specific types of analysis, mining and learning on such networks. In this work, we propose a unified framework consisting of a data model -a graph with a first order schema along with a declarative language for constructing, querying and manipulating such networks in ways that facilitate relational and structured machine learning. In particular, we provide an initial prototype for a relational and graph traversal query language where queries are directly used as relational features for structured machine learning models. Feature extraction is performed by making declarative graph traversal queries. Learning and inference models can directly operate on this relational representation and augment it with new data and knowledge that, in turn, is integrated seamlessly into the relational structure to support new predictions. We demonstrate this system's capabilities by showcasing tasks in natural language processing and computational biology domains.