DLJul 24, 2023
BIP! NDR (NoDoiRefs): A Dataset of Citations From Papers Without DOIs in Computer Science Conferences and WorkshopsParis Koloveas, Serafeim Chatzopoulos, Christos Tryfonopoulos et al.
In the field of Computer Science, conference and workshop papers serve as important contributions, carrying substantial weight in research assessment processes, compared to other disciplines. However, a considerable number of these papers are not assigned a Digital Object Identifier (DOI), hence their citations are not reported in widely used citation datasets like OpenCitations and Crossref, raising limitations to citation analysis. While the Microsoft Academic Graph (MAG) previously addressed this issue by providing substantial coverage, its discontinuation has created a void in available data. BIP! NDR aims to alleviate this issue and enhance the research assessment processes within the field of Computer Science. To accomplish this, it leverages a workflow that identifies and retrieves Open Science papers lacking DOIs from the DBLP Corpus, and by performing text analysis, it extracts citation information directly from their full text. The current version of the dataset contains more than 510K citations made by approximately 60K open access Computer Science conference or workshop papers that, according to DBLP, do not have a DOI.
DLMay 14
A Template-Driven Platform for Contextualised Researcher ProfilesSerafeim Chatzopoulos, Paris Koloveas, Kleanthis Vichos et al.
Modern researchers engage in diverse activities, assume multiple contribution roles, and produce a variety of outputs beyond traditional publications. This broader view of research contributions is increasingly recognised by responsible research assessment initiatives. However, existing researcher profiling platforms remain largely focused on publications and publication-centric indicators, offering limited support for contextualised and multi-dimensional representations of research careers. This paper presents BIP! Scholar, a platform that supports flexible researcher profiling through a template-driven approach. Researchers can create profiles tailored to different presentation or assessment contexts using track-based, narrative-style, or hybrid templates which support the representation of diverse outputs, contribution roles, and broader research activities. The platform also supports research assessment experts who wish to design and evaluate experimental profile templates.
CLFeb 20, 2025
Can LLMs Predict Citation Intent? An Experimental Analysis of In-context Learning and Fine-tuning on Open LLMsParis Koloveas, Serafeim Chatzopoulos, Thanasis Vergoulis et al.
This work investigates the ability of open Large Language Models (LLMs) to predict citation intent through in-context learning and fine-tuning. Unlike traditional approaches relying on domain-specific pre-trained models like SciBERT, we demonstrate that general-purpose LLMs can be adapted to this task with minimal task-specific data. We evaluate twelve model variations across five prominent open LLM families using zero-, one-, few-, and many-shot prompting. Our experimental study identifies the top-performing model and prompting parameters through extensive in-context learning experiments. We then demonstrate the significant impact of task-specific adaptation by fine-tuning this model, achieving a relative F1-score improvement of 8% on the SciCite dataset and 4.3% on the ACL-ARC dataset compared to the instruction-tuned baseline. These findings provide valuable insights for model selection and prompt engineering. Additionally, we make our end-to-end evaluation framework and models openly available for future use.
AISep 24, 2025
InsightGUIDE: An Opinionated AI Assistant for Guided Critical Reading of Scientific LiteratureParis Koloveas, Serafeim Chatzopoulos, Thanasis Vergoulis et al.
The proliferation of scientific literature presents an increasingly significant challenge for researchers. While Large Language Models (LLMs) offer promise, existing tools often provide verbose summaries that risk replacing, rather than assisting, the reading of the source material. This paper introduces InsightGUIDE, a novel AI-powered tool designed to function as a reading assistant, not a replacement. Our system provides concise, structured insights that act as a "map" to a paper's key elements by embedding an expert's reading methodology directly into its core AI logic. We present the system's architecture, its prompt-driven methodology, and a qualitative case study comparing its output to a general-purpose LLM. The results demonstrate that InsightGUIDE produces more structured and actionable guidance, serving as a more effective tool for the modern researcher.
DLAug 5, 2025
Accelerating Scientific Discovery with Multi-Document Summarization of Impact-Ranked PapersParis Koloveas, Serafeim Chatzopoulos, Dionysis Diamantis et al.
The growing volume of scientific literature makes it challenging for scientists to move from a list of papers to a synthesized understanding of a topic. Because of the constant influx of new papers on a daily basis, even if a scientist identifies a promising set of papers, they still face the tedious task of individually reading through dozens of titles and abstracts to make sense of occasionally conflicting findings. To address this critical bottleneck in the research workflow, we introduce a summarization feature to BIP! Finder, a scholarly search engine that ranks literature based on distinct impact aspects like popularity and influence. Our approach enables users to generate two types of summaries from top-ranked search results: a concise summary for an instantaneous at-a-glance comprehension and a more comprehensive literature review-style summary for greater, better-organized comprehension. This ability dynamically leverages BIP! Finder's already existing impact-based ranking and filtering features to generate context-sensitive, synthesized narratives that can significantly accelerate literature discovery and comprehension.
DBJan 11, 2022
Atrapos: Real-time Evaluation of Metapath Query WorkloadsSerafeim Chatzopoulos, Thanasis Vergoulis, Dimitrios Skoutas et al.
Heterogeneous information networks (HINs) represent different types of entities and relationships between them. Exploring, analysing, and extracting knowledge from such networks relies on metapath queries that identify pairs of entities connected by relationships of diverse semantics. While the real-time evaluation of metapath query workloads on large, web-scale HINs is highly demanding in computational cost, current approaches do not exploit interrelationships among the queries. In this paper, we present ATRAPOS, a new approach for the real-time evaluation of metapath query workloads that leverages a combination of efficient sparse matrix multiplication and intermediate result caching. ATRAPOS selects intermediate results to cache and reuse by detecting frequent sub-metapaths among workload queries in real time, using a tailor-made data structure, the Overlap Tree, and an associated caching policy. Our experimental study on real data shows that ATRAPOS accelerates exploratory data analysis and mining on HINs, outperforming off-the-shelf caching approaches and state-of-the-art research prototypes in all examined scenarios. -- Note that this version of our work is more extended than the one presented in TheWebConf 2023 (doi: 10.1145/3543507.3583322)
DLJan 28, 2021
BIP! DB: A Dataset of Impact Measures for Scientific PublicationsThanasis Vergoulis, Ilias Kanellos, Claudio Atzori et al.
The growth rate of the number of scientific publications is constantly increasing, creating important challenges in the identification of valuable research and in various scholarly data management applications, in general. In this context, measures which can effectively quantify the scientific impact could be invaluable. In this work, we present BIP! DB, an open dataset that contains a variety of impact measures calculated for a large collection of more than 100 million scientific publications from various disciplines.