Towards Large-Scale Exploratory Search over Heterogeneous Sources
This work addresses the challenge of automatically structuring heterogeneous data for researchers and other users, though it appears incremental because it builds on existing topic modeling approaches.
The paper tackles knowledge discovery from diverse web-scale text sources. It proposes an algorithm that aggregates multiple collections into a single hierarchical topic model and demonstrates the approach with Rysearch, a web service for exploratory search.
Since time immemorial, people have looked for ways to organize scientific knowledge into systems that facilitate search and the discovery of new ideas. In the pre-Internet era the problem was partially solved with library classifications, but today it is nearly impossible to classify all scientific and popular-science knowledge manually. There is a clear gap between the diversity and volume of data available on the Internet and the algorithms available for structuring such data automatically. In this preliminary study, we approach the problem of knowledge discovery on web-scale data with diverse text sources and propose an algorithm that aggregates multiple collections into a single hierarchical topic model. We implement a web service named Rysearch to demonstrate the concept of topical exploratory search and make it available online.