CL Feb 13, 2023
Evaluation of Word Embeddings for the Social Sciences
Ricardo Schiffers, Dagmar Kern, Daniel Hienert
Word embeddings are an essential instrument in many NLP tasks. Most available resources are trained on general language from Web corpora or Wikipedia dumps. However, word embeddings for domain-specific language are rare, in particular for the social science domain. Therefore, in this work, we describe the creation and evaluation of word embedding models based on 37,604 open-access social science research papers. In the evaluation, we compare domain-specific and general language models for (i) language coverage, (ii) diversity, and (iii) semantic relationships. We found that the created domain-specific model, even with a relatively small vocabulary size, covers a large part of social science concepts, and that their neighborhoods are diverse in comparison to more general models. Across all relation types, we found a more extensive coverage of semantic relationships.
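The language-coverage comparison mentioned in the abstract can be illustrated with a minimal sketch. It assumes coverage means the share of a concept list that appears in a model's vocabulary; the function name, the case-insensitive matching, and the toy vocabularies are hypothetical and not taken from the paper.

```python
def vocabulary_coverage(concepts, vocabulary):
    """Return the fraction of concepts found in the vocabulary (case-insensitive)."""
    vocab = {word.lower() for word in vocabulary}
    lowered = [concept.lower() for concept in concepts]
    if not lowered:
        return 0.0
    covered = sum(1 for concept in lowered if concept in vocab)
    return covered / len(lowered)


# Hypothetical toy data, for illustration only.
domain_vocab = {"survey", "panel", "migration", "inequality"}
general_vocab = {"survey", "football", "migration", "movie"}
concepts = ["Survey", "Migration", "Inequality", "Cohort"]

domain_cov = vocabulary_coverage(concepts, domain_vocab)
general_cov = vocabulary_coverage(concepts, general_vocab)
print(domain_cov, general_cov)
```

In practice the vocabularies would come from the trained embedding models and the concept list from a domain resource such as a thesaurus; both are left abstract here.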
IR May 11
MIRA: An LLM-Assisted Benchmark for Multi-Category Integrated Retrieval
Mehmet Deniz Türkmen, Suchana Datta, Dwaipayan Roy et al.
Users increasingly expect modern search systems to offer a unified interface that seamlessly retrieves information from diverse data sources and formats. However, current information retrieval (IR) evaluation benchmarks have not kept pace with this development, primarily due to the lack of test collections that represent the diversity of contemporary search domains. We address this critical gap with MIRA, a novel benchmark based on a large-scale social science search platform. MIRA is designed for category-aware ranking across heterogeneous categories - Publications, Research Data, Variables, and Instruments & Tools - within a single, unified evaluation framework. The proposed collection is distinctive in several ways: (1) it is built upon real user queries, providing a more realistic basis for evaluation; (2) it covers scholarly items from four distinct categories, enabling multi-faceted evaluation; and (3) it leverages a Large Language Model to generate topic descriptions and narratives, as well as for relevance assessment with respect to these topics, substantially reducing the labor and cost of test collection generation. We release this resource to benefit the community by providing a foundational testbed for the research on multi-faceted, category-aware, integrated, or cross-category information retrieval.
IR Jan 7, 2022
SaL-Lightning Dataset: Search and Eye Gaze Behavior, Resource Interactions and Knowledge Gain during Web Search
Christian Otto, Markus Rokicki, Georg Pardi et al.
The emerging research field Search as Learning investigates how the Web facilitates learning through modern information retrieval systems. SAL research requires significant amounts of data that capture both search behavior of users and their acquired knowledge in order to obtain conclusive insights or train supervised machine learning models. However, the creation of such datasets is costly and requires interdisciplinary efforts in order to design studies and capture a wide range of features. In this paper, we address this issue and introduce an extensive dataset based on a user study, in which 114 participants were asked to learn about the formation of lightning and thunder. Participants' knowledge states were measured before and after Web search through multiple-choice questionnaires and essay-based free recall tasks. To enable future research in SAL-related tasks we recorded a plethora of features and person-related attributes. Besides the screen recordings, visited Web pages, and detailed browsing histories, a large number of behavioral features and resource features were monitored. We underline the usefulness of the dataset by describing three already published use cases.
HC Aug 5, 2020
'A Modern Up-To-Date Laptop' -- Vagueness in Natural Language Queries for Product Search
Andrea Papenmeier, Alfred Sliwa, Dagmar Kern et al.
With the rise of voice assistants and an increase in mobile search usage, natural language has become an important query language. So far, most of the current systems are not able to process these queries because of the vagueness and ambiguity in natural language. Users have adapted their query formulation to what they think the search engine is capable of, which adds to their cognitive burden. With our research, we contribute to the design of interactive search systems by investigating the genuine information need in a product search scenario. In a crowd-sourcing experiment, we collected 132 information needs in natural language. We examine the vagueness of the formulations and their match to retailer-generated content and user-generated product reviews. Our findings reveal high variance on the level of vagueness and the potential of user reviews as a source for supporting users with rather vague search intents.
IR Aug 5, 2020
The Role of Word-Eye-Fixations for Query Term Prediction
Masoud Davari, Daniel Hienert, Dagmar Kern et al.
Throughout the search process, the user's gaze on inspected SERPs and websites can reveal his or her search interests. Gaze behavior can be captured with eye tracking and described with word-eye-fixations. Word-eye-fixations contain the user's accumulated gaze fixation duration on each individual word of a web page. In this work, we analyze the role of word-eye-fixations for predicting query terms. We investigate the relationship between a range of in-session features, in particular, gaze data, with the query terms and train models for predicting query terms. We use a dataset of 50 search sessions obtained through a lab study in the social sciences domain. Using established machine learning models, we can predict query terms with comparably high accuracy, even with only little training data. Feature analysis shows that the categories Fixation, Query Relevance and Session Topic contain the most effective features for our task.
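The definition of word-eye-fixations given in the abstract (accumulated gaze fixation duration per word) can be sketched as a simple aggregation. The input format, a list of (word, duration in milliseconds) pairs, and the lowercasing step are assumptions made for illustration; the paper's actual data pipeline is not described here.

```python
from collections import defaultdict


def word_eye_fixations(fixations):
    """Sum fixation durations per word from (word, duration_ms) pairs."""
    totals = defaultdict(float)
    for word, duration_ms in fixations:
        # Assumption: words are compared case-insensitively.
        totals[word.lower()] += duration_ms
    return dict(totals)


# Hypothetical fixation events on a single web page.
sample = [("Migration", 220.0), ("policy", 180.0), ("migration", 310.0)]
totals = word_eye_fixations(sample)
print(totals)
```

The resulting per-word totals could then serve as one input feature among others when training a query term predictor.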
DL Sep 24, 2019
Recognizing Topic Change in Search Sessions of Digital Libraries based on Thesaurus and Classification System
Daniel Hienert, Dagmar Kern
Log analysis in Web search showed that user sessions often contain several different topics. This means sessions need to be segmented into parts which handle the same topic in order to give appropriate user support based on the topic, and not on a mixture of topics. Different methods have been proposed to segment a user session to different topics based on timeouts, lexical analysis, query similarity or external knowledge sources. In this paper, we study the problem in a digital library for the social sciences. We present a method based on a thesaurus and a classification system which are typical knowledge organization systems in digital libraries. Five experts evaluated our approach and rated it as good for the segmentation of search sessions into parts that treat the same topic.
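One way to picture segmentation with a knowledge organization system is the sketch below. It is not the authors' method: it assumes a hypothetical lookup table `term_to_classes` that maps query terms to classification codes, and it starts a new segment whenever two consecutive queries share no code.

```python
def segment_session(queries, term_to_classes):
    """Split a list of queries into topic segments.

    A query joins the current segment if it shares at least one
    classification code with the previous query; otherwise it
    opens a new segment.
    """
    segments = []
    prev_classes = set()
    for query in queries:
        classes = set()
        for term in query.lower().split():
            classes |= term_to_classes.get(term, set())
        if segments and classes & prev_classes:
            segments[-1].append(query)
        else:
            segments.append([query])
        prev_classes = classes
    return segments


# Hypothetical mapping from thesaurus terms to classification codes.
term_to_classes = {
    "migration": {"sociology"},
    "integration": {"sociology"},
    "inflation": {"economics"},
    "prices": {"economics"},
}
queries = ["migration", "integration policy", "inflation", "consumer prices"]
segments = segment_session(queries, term_to_classes)
print(segments)
```

Queries whose terms are missing from the lookup table always open a new segment in this sketch; a real system would need a fallback such as lexical similarity.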
IR Sep 19, 2019
Understanding the Information needs of Social Scientists in Germany
Dagmar Kern, Daniel Hienert
The information needs of social science researchers are manifold and have been studied in almost every decade since the 1950s. With this paper, we contribute to this series and present the results of three studies. We asked 367 social science researchers in Germany for their information needs and identified needs in different categories: literature, research data, measurement instruments, support for data analysis, support for data collection, variables in research data, software support, networking/cooperation, and illustrative material. Thereby, the search for literature and research data is still the main information need with more than three-quarters of our participants expressing needs in these categories. With comprehensive lists of altogether 154 concrete information needs, even those that are only expressed by one participant, we contribute to the holistic understanding of the information needs of social science researchers of today.
IR Feb 12, 2019
Reading Protocol: Understanding what has been Read in Interactive Information Retrieval Tasks
Daniel Hienert, Dagmar Kern, Matthew Mitsui et al.
In Interactive Information Retrieval (IIR) experiments the user's gaze motion on web pages is often recorded with eye tracking. The data is used to analyze gaze behavior or to identify Areas of Interest (AOI) the user has looked at. So far, tools for analyzing eye tracking data have certain limitations in supporting the analysis of gaze behavior in IIR experiments. Experiments often consist of a huge number of different visited web pages. In existing analysis tools the data can only be analyzed in videos or images and AOIs for every single web page have to be specified by hand, in a very time consuming process. In this work, we propose the reading protocol software which breaks eye tracking data down to the textual level by considering the HTML structure of the web pages. This has a lot of advantages for the analyst. First and foremost, it can easily be identified on a large scale what has actually been viewed and read on the stimuli pages by the subjects. Second, the web page structure can be used to filter to AOIs. Third, gaze data of multiple users can be presented on the same page, and fourth, fixation times on text can be exported and further processed in other tools. We present the software, its validation, and example use cases with data from three existing IIR experiments.
IR Sep 7, 2018
Challenges for Measuring Usefulness of Interactive IR Systems with Log-based Approaches
Daniel Hienert, Peter Mutschke
The usefulness evaluation model proposed by Cole et al. in 2009 [2] focuses on the evaluation of interactive IR systems by their support towards the user's overall goal, sub goals and tasks. This is a more human focus of the IR evaluation process than with classical TREC-oriented studies and gives a more holistic view on the IR evaluation process. However, there is not yet a formal framework for operationalizing the usefulness model. Additionally, a lot of information needed for the operationalization is only available in explicit user studies where, for example, the overall goal and the tasks are prompted from the users or are predefined. Measuring the usefulness of IR systems outside the laboratory is a challenging task as most often only log data of user interaction is available. But an operationalization of the usefulness model based on interaction data could be applied to diverse systems, and evaluation results would be comparable. In this article we discuss the challenges for measuring the usefulness of IIR systems with log-based approaches.
IR Sep 7, 2018
Data Requirements for Evaluation of Personalization of Information Retrieval - A Position Paper
Nicholas J. Belkin, Daniel Hienert, Philipp Mayr et al.
Two key, but usually ignored, issues for the evaluation of methods of personalization for information retrieval are: that such evaluation must be of a search session as a whole; and, that people, during the course of an information search session, engage in a variety of activities, intended to accomplish different goals or intentions. Taking serious account of these factors has major implications for not only evaluation methods and metrics, but also for the nature of the data that is necessary both for understanding and modeling information search, and for evaluation of personalized support for information retrieval (IR). In this position paper, we: present a model of IR demonstrating why these factors are important; identify some implications of accepting their validity; and, on the basis of a series of studies in interactive IR, identify some types of data concerning searcher and system behavior that we claim are, at least, necessary, if not necessarily sufficient, for meaningful evaluation of personalization of IR.
IR Sep 7, 2018
Term-Mouse-Fixations as an Additional Indicator for Topical User Interests in Domain-Specific Search
Daniel Hienert, Dagmar Kern
Models in Interactive Information Retrieval (IIR) are grounded very much on the user's task in order to give system support based on different task types and topics. However, the automatic recognition of user interests from log data in search systems is not trivial. Search queries entered by users are surely one such source. However, queries may be short, or users are only browsing. In this paper, we propose a method of term-mouse-fixations which takes the fixations on terms users are hovering over with the mouse into consideration to estimate topical user interests. We analyzed 22,259 search sessions of a domain-specific digital library over a period of about four months. We compared these mouse fixations to user-entered search terms and to titles and keywords from documents the user showed an interest in. These terms were found in 87.12% of all analyzed sessions; in this subset of sessions, per session on average 11.46 term-mouse-fixations from queries and viewed documents were found. These terms were fixated significantly longer, at about 7 seconds, than other terms, at about 4.4 seconds. This means that term-mouse-fixations provide indicators for topical user interests and that it is possible to extract them based on fixation time.
IR Sep 7, 2018
Where Do All These Search Terms Come From? - Two Experiments in Domain-Specific Search
Daniel Hienert, Maria Lusky
Within a search session users often apply different search terms, as well as different variations and combinations of them. This way, they want to make sure that they find relevant information for different stages and aspects of their information task. Research questions which arise from this search approach are: Where do users get all the ideas, hints and suggestions for new search terms or their variations from? How many ideas come from the user? How many from outside the IR system? What is the role of the used search system? To investigate these questions we used data from two experiments: first, from a user study with eye tracking data; second, from a large-scale log analysis. We found that in both experiments a large part of the search terms has been explicitly seen or shown before on the interface of the search system.
IR Aug 21, 2018
A Usefulness-based Approach for Measuring the Local and Global Effect of IIR Services
Daniel Hienert, Peter Mutschke
In Interactive Information Retrieval (IIR) different services such as search term suggestion can support users in their search process. The applicability and performance of such services is either measured with different user-centered studies (like usability tests or laboratory experiments) or, in the context of IR, with their contribution to measures like precision and recall. However, each evaluation methodology has certain disadvantages. For example, user-centered experiments are often costly and small-scale; IR experiments rely on relevance assessments and measure only relevance of documents. In this work we operationalize the usefulness model of Cole et al. (2009) on the level of system support to measure not only the local effect of an IR service, but the impact it has on the whole search process. We therefore use a log-based evaluation approach which models user interactions within sessions with positive signals and apply it for the case of a search term suggestion service. We found that usage of the service is significantly often followed by positive signals during the following session steps.
IR Aug 21, 2018
The Role of the Task Topic in Web Search of Different Task Types
Daniel Hienert, Matthew Mitsui, Philipp Mayr et al.
When users are looking for information on the Web, they show different behavior for different task types, e.g., for fact finding vs. information gathering tasks. For example, related work in this area has investigated how this behavior can be measured and applied to distinguish between easy and difficult tasks. In this work, we look at the searcher's behavior in the domain of journalism for four different task types, and additionally, for two different topics in each task type. Search behavior is measured with a number of session variables and correlated to subjective measures such as task difficulty, task success and the usefulness of documents. We acknowledge prior results in this area that task difficulty is correlated to user effort and that easy and difficult tasks are distinguishable by session variables. However, in this work, we emphasize the role of the task topic - in and of itself - over parameters such as the search results and read content pages, dwell times, session variables and subjective measures such as task difficulty or task success. With this knowledge researchers should give more attention to the task topic as an important influence factor for user behavior.
CL Apr 27, 2015
Exploring semantically-related concepts from Wikipedia: the case of SeRE
Daniel Hienert, Dennis Wegener, Siegfried Schomisch
In this paper we present our web application SeRE designed to explore semantically related concepts. Wikipedia and DBpedia are rich data sources to extract related entities for a given topic, like in- and out-links, broader and narrower terms, categorisation information etc. We use the Wikipedia full text body to compute the semantic relatedness for extracted terms, which results in a list of entities that are most relevant for a topic. For any given query, the user interface of SeRE visualizes these related concepts, ordered by semantic relatedness; with snippets from Wikipedia articles that explain the connection between those two entities. In a user study we examine how SeRE can be used to find important entities and their relationships for a given topic and to answer the question of how the classification system can be used for filtering.
HC Apr 27, 2015
Making sense of Open Data Statistics with Information from Wikipedia
Daniel Hienert, Dennis Wegener, Siegfried Schomisch
Today, more and more open data statistics are published by governments, statistical offices and organizations like the United Nations, The World Bank or Eurostat. This data is freely available and can be consumed by end users in interactive visualizations. However, additional information is needed to enable laymen to interpret these statistics in order to make sense of the raw data. In this paper, we present an approach to combine open data statistics with historical events. In a user interface we have integrated interactive visualizations of open data statistics with a timeline of thematically appropriate historical events from Wikipedia. This can help users to explore statistical data in several views and to get related events for certain trends in the timeline. Events include links to Wikipedia articles, where details can be found and the search process can be continued. We have conducted a user study to evaluate if users can use the interface intuitively, if relations between trends in statistics and historical events can be found and if users like this approach for their exploration process.
IR Apr 27, 2015
WHOSE - A Tool for Whole-Session Analysis in IIR
Daniel Hienert, Wilko van Hoek, Alina Weber et al.
One of the main challenges in Interactive Information Retrieval (IIR) evaluation is the development and application of re-usable tools that allow researchers to analyze search behavior of real users in different environments and different domains, but with comparable results. Furthermore, IIR recently focuses more on the analysis of whole sessions, which includes all user interactions that are carried out within a session but also across several sessions by the same user. Some frameworks have already been proposed for the evaluation of controlled experiments in IIR, but yet no framework is available for interactive evaluation of search behavior from real-world information retrieval (IR) systems with real users. In this paper we present a framework for whole-session evaluation that can also utilize these uncontrolled data sets. The logging component can easily be integrated into real-world IR systems for generating and analyzing new log data. Furthermore, due to a supplementary mapping it is also possible to analyze existing log data. For every IR system different actions and filters can be defined. This allows system operators and researchers to use the framework for the analysis of user search behavior in their IR systems and to compare it with others. Using a graphical user interface they have the possibility to interactively explore the data set from a broad overview down to individual sessions.
HC Sep 11, 2012
Visualizations in Exploratory Search: A User Study with Stock Market Information
Daniel Hienert, Philipp Mayr
In this paper we present an approach that integrates interactive visualizations in the exploratory search process. In this model visualizations can act as hubs where large amounts of information are made accessible in easy user interfaces. Through interaction techniques this information can be combined with related information on the World Wide Web. We applied the new search concept to the domain of stock market information and conducted a user study. Participants could use this interface without instructions, could complete complex tasks like identifying related information items, link heterogeneous information types and use different interaction techniques to access related information more easily. In this way, users could quickly acquire knowledge in an unfamiliar domain.
IR Aug 20, 2012
Dealing with Sparse Document and Topic Representations: Lab Report for CHiC 2012
Philipp Schaer, Daniel Hienert, Frank Sawitzki et al.
We report on the participation of GESIS at the first CHiC workshop (Cultural Heritage in CLEF). As the workshop was held for the first time, no prior experience existed with the new data set, a document dump of Europeana with ca. 23 million documents. The most prominent issues that arose from pretests with this test collection were the very unspecific topics and sparse document representations. Only half of the topics (26/50) contained a description and the titles were usually short with just around two words. Therefore we focused on three different term suggestion and query expansion mechanisms to overcome the sparse topical description. We used two methods that build on concept extraction from Wikipedia and one method that applied co-occurrence statistics on the available Europeana corpus. In the following paper we will present the approaches and preliminary results from their assessments.
IR May 18, 2012
Extraction of Historical Events from Wikipedia
Daniel Hienert, Francesco Luciano
The DBpedia project extracts structured information from Wikipedia and makes it available on the web. Information is gathered mainly with the help of infoboxes that contain structured information of the Wikipedia article. A lot of information is only contained in the article body and is not yet included in DBpedia. In this paper we focus on the extraction of historical events from Wikipedia articles that are available for about 2,500 years for different languages. We have extracted about 121,000 events with more than 325,000 links to DBpedia entities and provide access to this data via a Web API, SPARQL endpoint, Linked Data Interface and in a timeline application.
DL Jan 12, 2012
Integrating Interactive Visualizations in the Search Process of Digital Libraries and IR Systems
Daniel Hienert, Frank Sawitzki, Philipp Schaer et al.
Interactive visualizations for exploring and retrieval have not yet become an integral part of digital libraries and information retrieval systems. We have integrated a set of interactive graphics in a real world social science digital library. These visualizations support the exploration of search queries, results and authors, can filter search results, show trends in the database and can support the creation of new search queries. The use of weighted brushing supports the identification of related metadata for search facets. We discuss some use cases of the combination of IR systems and interactive graphics. In a user study we verify that users can gain insights from statistical graphics intuitively and can adopt interaction techniques.