HCJan 25, 2022

SQRQuerier: A Visual Querying Framework for Cross-national Survey Data Recycling

arXiv:2201.11125v13 citations
Originality Incremental advance
AI Analysis

This addresses the problem for social scientists who need to identify and evaluate harmonized survey data for comparative studies, though it is incremental as it builds on existing data harmonization efforts.

The paper tackles the challenge of efficiently exploring and querying harmonized cross-national survey data, which often lacks guidance in existing portals, by proposing SQRQuerier, a visual querying framework that uses a BERT-based model for customized queries, visualizes data availability, and discloses relational patterns, demonstrated through case studies with social scientists.

Public opinion surveys constitute a powerful tool to study peoples' attitudes and behaviors in comparative perspectives. However, even worldwide surveys provide only partial geographic and time coverage, which hinders comprehensive knowledge production. To broaden the scope of comparison, social scientists turn to ex-post harmonization of variables from datasets that cover similar topics but in different populations and/or years. The resulting new datasets can be analyzed as a single source, which can be flexibly accessed through many data portals. However, such portals offer little guidance to explore the data in-depth or query data with user-customized needs. As a result, it is still challenging for social scientists to efficiently identify related data for their studies and evaluate their theoretical models based on the sliced data. To overcome them, in the Survey Data Recycling (SDR) international cooperation research project, we propose SDRQuerier and apply it to the harmonized SDR database, which features over two million respondents interviewed in a total of 1,721 national surveys that are part of 22 well-known international projects. We design the SDRQuerier to solve three practical challenges that social scientists routinely face. First, a BERT-based model provides customized data queries through research questions or keywords. Second, we propose a new visual design to showcase the availability of the harmonized data at different levels, thus helping users decide if empirical data exist to address a given research question. Lastly, SDRQuerier discloses the underlying relational patterns among substantive and methodological variables in the database, to help social scientists rigorously evaluate or even improve their regression models. Through case studies with multiple social scientists in solving their daily challenges, we demonstrated the novelty, effectiveness of SDRQuerier.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes