DL · May 27
Co-creation of AI technology, empowering curators of cultural heritage information and guarding research commons
Andrea Scharnhorst, Han Yang, Jetze Touber et al.
This paper describes the use of Retrieval-Augmented Generation (RAG) for specific digital collections of cultural assets. The collections are provided by institutions operating in the cultural sector, and the topical areas are the humanities and social sciences. More concretely, most of the work presented here was enabled by the European-funded research project MuseIT, which is situated in the realm of fostering new technologies for Cultural Heritage. We adhere to this interaction by presenting a sequence of our experiments. This sequence is narrated as an engineering journey, carried out around a specific data-sharing and archiving platform, Dataverse. Implementing a local chatbot for collections, a method also known as RAG in Information Retrieval, is the current culmination of this journey. The engineering journey we describe in the core of the paper starts from "archives for everyone" and ends with "local chatbots for specific collections".
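To make the RAG idea mentioned in this abstract concrete, the following is a minimal sketch of the retrieve-then-prompt step, not the authors' implementation. The bag-of-words cosine ranking, the example passages, and all function names (`vectorize`, `cosine`, `retrieve`, `build_prompt`) are illustrative assumptions; a real system would typically use learned embeddings and pass the prompt to a language model.

```python
import math
from collections import Counter


def vectorize(text):
    """Turn a text into a bag-of-words term-frequency vector (assumed toy representation)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query, docs, k=2):
    """Return the k passages most similar to the query."""
    q = vectorize(query)
    ranked = sorted(docs, key=lambda d: cosine(q, vectorize(d)), reverse=True)
    return ranked[:k]


def build_prompt(query, passages):
    """Assemble the augmented prompt that would be sent to a language model."""
    context = "\n".join(f"- {p}" for p in passages)
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\nQuestion: {query}"
    )


# Hypothetical collection passages, invented for illustration only.
docs = [
    "The collection holds digitised paintings from a regional museum",
    "Dataverse stores datasets together with descriptive metadata",
    "The archive contains audio recordings of folk songs",
]
query = "which datasets and metadata does dataverse store"
top = retrieve(query, docs, k=1)
prompt = build_prompt(query, top)
print(prompt)
```

Restricting the prompt to passages retrieved from one collection is what makes such a chatbot "local" to that collection in this sketch.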
IR · Aug 29, 2016
Bibliometrics and Information Retrieval: Creating Knowledge through Research Synergies
Judit Bar-Ilan, Rob Koopman, Shenghui Wang et al.
This panel brings together experts in bibliometrics and information retrieval to discuss how each of these two important areas of information science can help to inform the research of the other. There is a growing body of literature that capitalizes on the synergies created by combining methodological approaches of each to solve research problems and practical issues related to how information is created, stored, organized, retrieved and used. The session will begin with an overview of the common threads that exist between IR and metrics, followed by a summary of findings from the BIR workshops and examples of research projects that combine aspects of each area to benefit IR or metrics research areas, including search results ranking, semantic indexing and visualization. The panel will conclude with an engaging discussion with the audience to identify future areas of research and collaboration.
DL · Apr 16, 2015
Contextualization of topics - browsing through terms, authors, journals and cluster allocations
Rob Koopman, Shenghui Wang, Andrea Scharnhorst
This paper builds on an innovative Information Retrieval tool, Ariadne. The tool has been developed as an interactive network visualization and browsing tool for large-scale bibliographic databases. It allows users to gain insights into a topic by contextualizing a search query (Koopman et al., 2015). In this paper, we apply the Ariadne tool to a far smaller dataset of 111,616 documents in astronomy and astrophysics. Labeled the Berlin dataset, it has been used by several research teams to apply and later compare different clustering algorithms. The question driving this team effort is how to delineate topics. This paper contributes to this challenge in two ways. First, we produce one of the cluster solutions, and second, we use Ariadne (the method behind it, and the interface, called LittleAriadne) to display the cluster solutions of the different group members. By providing a tool that allows the visual inspection of the similarity of article clusters produced by different algorithms, we present a complementary approach to other possible means of comparison. More specifically, we discuss how we can, with LittleAriadne, browse through the network of topical terms, authors, journals and cluster solutions in the Berlin dataset, compare cluster solutions, and see their context.
IR · Feb 6, 2015
Editorial for the Proceedings of the Workshop Knowledge Maps and Information Retrieval (KMIR2014) at Digital Libraries 2014
Peter Mutschke, Philipp Mayr, Andrea Scharnhorst
Knowledge maps are promising tools for visualizing the structure of large-scale information spaces, but they are still far from being applicable to searching. The first international workshop on "Knowledge Maps and Information Retrieval (KMIR)", held as part of the International Conference on Digital Libraries 2014 in London, aimed at bringing together experts in Information Retrieval (IR) and knowledge mapping in order to discuss the potential of interactive knowledge maps for information seeking purposes.
IR · Jan 12, 2015
Bibliometric-enhanced Information Retrieval: 2nd International BIR Workshop
Philipp Mayr, Ingo Frommholz, Andrea Scharnhorst et al.
This workshop brings together experts from communities which have often been perceived as different ones: bibliometrics / scientometrics / informetrics on the one side and information retrieval on the other. Our motivation as organizers of the workshop started from the observation that the main discourses in both fields are different, that the communities are only partly overlapping, and from the belief that a knowledge transfer would be profitable for both sides. Bibliometric techniques are not yet widely used to enhance retrieval processes in digital libraries, although they offer value-added effects for users. On the other side, more and more information professionals working in libraries and archives are confronted with applying bibliometric techniques in their services. Knowledge exchange therefore becomes more urgent. The first workshop set the research agenda by introducing each other's methods, reporting on current research problems and brainstorming about common interests. This follow-up workshop continues the overall communication, but also puts one problem into focus. In particular, we will explore how statistical modelling of scholarship can improve retrieval services for specific communities, as well as for large, cross-domain collections like Mendeley or ResearchGate. This second BIR workshop continues to raise awareness of the missing link between Information Retrieval (IR) and bibliometrics and contributes to creating a common ground for the incorporation of bibliometric-enhanced services into retrieval at the scholarly search engine interface.
IR · Nov 6, 2014
Scientometrics and Information Retrieval - weak-links revitalized
Philipp Mayr, Andrea Scharnhorst
This special issue brings together eight papers by experts from communities which have often been perceived as different ones: bibliometrics, scientometrics and informetrics on the one side and information retrieval on the other. The idea of this special issue started at the workshop "Combining Bibliometrics and Information Retrieval" held at the 14th International Conference of Scientometrics and Informetrics, Vienna, July 14-19, 2013. Our motivation as guest editors started from the observation that the main discourses in both fields are different, that the communities are only partly overlapping, and from the belief that a knowledge transfer would be profitable for both sides.
IR · May 30, 2014
Knowledge Maps and Information Retrieval (KMIR)
Peter Mutschke, Andrea Scharnhorst, Christophe Guéret et al.
A typical point of failure in information systems is the vagueness between user search terms and the knowledge orders of the information space in question. Some kind of guided searching therefore becomes more and more important in order to discover information precisely without knowing the right search terms. Knowledge maps of digital library collections are promising tools for navigating knowledge spaces, but they are still far from being applicable to searching digital libraries. However, there is no continuous knowledge exchange between the "map makers" on the one hand and the Information Retrieval (IR) specialists on the other hand. Thus, there is also a lack of models that properly combine insights from the two strands. The proposed workshop aims at bringing together these two communities: experts in IR reflecting on visually enhanced search interfaces, and experts in knowledge mapping reflecting on visualizations of the content of a collection that might also present a context for a search term in a visual manner. The intention of the workshop is to raise awareness of the potential of interactive knowledge maps for information seeking purposes and to create a common ground for experiments aiming at the incorporation of knowledge maps into IR models at the level of the user interface.
IR · Apr 28, 2014
Editorial for the Bibliometric-enhanced Information Retrieval Workshop at ECIR 2014
Philipp Mayr, Philipp Schaer, Andrea Scharnhorst et al.
This first "Bibliometric-enhanced Information Retrieval" (BIR 2014) workshop aims to engage with the IR community about possible links to bibliometrics and scholarly communication. Bibliometric techniques are not yet widely used to enhance retrieval processes in digital libraries, although they offer value-added effects for users. In this workshop we will explore how statistical modelling of scholarship, such as Bradfordizing or network analysis of co-authorship networks, can improve retrieval services for specific communities, as well as for large, cross-domain collections. This workshop aims to raise awareness of the missing link between information retrieval (IR) and bibliometrics / scientometrics and to create a common ground for the incorporation of bibliometric-enhanced services into retrieval at the digital library interface. Our interests include information retrieval, information seeking, science modelling, network analysis, and digital libraries. The goal is to apply insights from bibliometrics, scientometrics, and informetrics to concrete practical problems of information retrieval and browsing.
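As an illustration of the Bradfordizing idea named in this abstract, the sketch below re-ranks a result set so that documents from the journals contributing the most hits come first. It is a simplified reading of the technique under stated assumptions, not the workshop organizers' implementation: the `(doc_id, journal)` input shape, the function name `bradfordize`, and the sample data are all hypothetical.

```python
from collections import Counter


def bradfordize(results):
    """Re-rank (doc_id, journal) pairs by how many hits each journal contributes.

    Journals with more documents in the result set (the "core") are listed first.
    Ties between journals are broken alphabetically; because sorted() is stable,
    documents of the same journal keep their original relative order.
    """
    counts = Counter(journal for _, journal in results)
    return sorted(results, key=lambda r: (-counts[r[1]], r[1]))


# Invented result set for illustration only.
results = [
    ("d1", "Journal A"),
    ("d2", "Journal B"),
    ("d3", "Journal A"),
    ("d4", "Journal C"),
    ("d5", "Journal A"),
    ("d6", "Journal B"),
]
reranked = bradfordize(results)
print([doc_id for doc_id, _ in reranked])
# prints ['d1', 'd3', 'd5', 'd2', 'd6', 'd4']
```

In this toy run, "Journal A" contributes three hits and so its documents move to the top, followed by "Journal B" (two hits) and "Journal C" (one hit).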
IR · Oct 30, 2013
Bibliometric-enhanced Information Retrieval
Philipp Mayr, Andrea Scharnhorst, Birger Larsen et al.
Bibliometric techniques are not yet widely used to enhance retrieval processes in digital libraries, although they offer value-added effects for users. In this workshop we will explore how statistical modelling of scholarship, such as Bradfordizing or network analysis of coauthorship networks, can improve retrieval services for specific communities, as well as for large, cross-domain collections. This workshop aims to raise awareness of the missing link between information retrieval (IR) and bibliometrics/scientometrics and to create a common ground for the incorporation of bibliometric-enhanced services into retrieval at the digital library interface.
DL · Jan 22, 2013
"Seed+Expand": A validated methodology for creating high quality publication oeuvres of individual researchers
Linda Reijnhoudt, Rodrigo Costas, Ed Noyons et al.
The study of science at the individual micro-level frequently requires the disambiguation of author names. The creation of authors' publication oeuvres involves matching the list of unique author names to names used in publication databases. Despite recent progress in the development of unique author identifiers, e.g., ORCID, VIVO, or DAI, author disambiguation remains a key problem when it comes to large-scale bibliometric analysis using data from multiple databases. This study introduces and validates a new methodology called seed+expand for semi-automatic bibliographic data collection for a given set of individual authors. Specifically, we identify the oeuvres of a set of Dutch full professors during the period 1980-2011. In particular, we combine author records from the National Research Information System (NARCIS) with publication records from the Web of Science. Starting with an initial list of 8,378 names, we identify "seed publications" for each author using five different approaches. Subsequently, we "expand" the set of publications using three different approaches. The different approaches are compared, and the resulting oeuvres are evaluated on precision and recall using a "gold standard" dataset of authors for whom verified publications in the period 2001-2010 are available.
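The precision and recall evaluation mentioned at the end of this abstract can be sketched as follows. This shows only the standard set-based definitions; the publication identifiers and the function name `precision_recall` are hypothetical, and the study's actual evaluation procedure may differ in detail.

```python
def precision_recall(retrieved, gold):
    """Compute set-based precision and recall of a retrieved oeuvre against a gold standard."""
    retrieved, gold = set(retrieved), set(gold)
    true_positives = len(retrieved & gold)
    # Precision: share of retrieved publications that are verified as correct.
    precision = true_positives / len(retrieved) if retrieved else 0.0
    # Recall: share of verified publications that were retrieved.
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall


# Invented identifiers for one author, for illustration only.
oeuvre = {"p1", "p2", "p3", "p4"}          # publications found for the author
verified = {"p2", "p3", "p4", "p5", "p6"}  # gold-standard publications

precision, recall = precision_recall(oeuvre, verified)
print(f"precision={precision:.2f} recall={recall:.2f}")
# prints precision=0.75 recall=0.60
```

Here three of the four retrieved publications are verified (precision 0.75), and three of the five verified publications were found (recall 0.60).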