Yamei Tu

2papers

2 Papers

IRJun 7, 2023
SKG: A Versatile Information Retrieval and Analysis Framework for Academic Papers with Semantic Knowledge Graphs

Yamei Tu, Rui Qiu, Han-Wei Shen

The number of published research papers has experienced exponential growth in recent years, which makes it crucial to develop new methods for efficient and versatile information extraction and knowledge discovery. To address this need, we propose a Semantic Knowledge Graph (SKG) that integrates semantic concepts from abstracts and other meta-information to represent the corpus. The SKG can support various semantic queries in academic literature thanks to the high diversity and rich information content stored within. To extract knowledge from unstructured text, we develop a Knowledge Extraction Module that includes a semi-supervised pipeline for entity extraction and entity normalization. We also create an ontology to integrate the concepts with other meta information, enabling us to build the SKG. Furthermore, we design and develop a dataflow system that demonstrates how to conduct various semantic queries flexibly and interactively over the SKG. To demonstrate the effectiveness of our approach, we conduct the research based on the visualization literature and provide real-world use cases to show the usefulness of the SKG. The dataset and codes for this work are available at https://osf.io/aqv8p/?view_only=2c26b36e3e3941ce999df47e4616207f.

HCJan 25, 2022
SQRQuerier: A Visual Querying Framework for Cross-national Survey Data Recycling

Yamei Tu, Olga Li, Junpeng Wang et al.

Public opinion surveys constitute a powerful tool to study peoples' attitudes and behaviors in comparative perspectives. However, even worldwide surveys provide only partial geographic and time coverage, which hinders comprehensive knowledge production. To broaden the scope of comparison, social scientists turn to ex-post harmonization of variables from datasets that cover similar topics but in different populations and/or years. The resulting new datasets can be analyzed as a single source, which can be flexibly accessed through many data portals. However, such portals offer little guidance to explore the data in-depth or query data with user-customized needs. As a result, it is still challenging for social scientists to efficiently identify related data for their studies and evaluate their theoretical models based on the sliced data. To overcome them, in the Survey Data Recycling (SDR) international cooperation research project, we propose SDRQuerier and apply it to the harmonized SDR database, which features over two million respondents interviewed in a total of 1,721 national surveys that are part of 22 well-known international projects. We design the SDRQuerier to solve three practical challenges that social scientists routinely face. First, a BERT-based model provides customized data queries through research questions or keywords. Second, we propose a new visual design to showcase the availability of the harmonized data at different levels, thus helping users decide if empirical data exist to address a given research question. Lastly, SDRQuerier discloses the underlying relational patterns among substantive and methodological variables in the database, to help social scientists rigorously evaluate or even improve their regression models. Through case studies with multiple social scientists in solving their daily challenges, we demonstrated the novelty, effectiveness of SDRQuerier.