Corinna Breitinger

h-index: 16

6 papers

112 citations

Novelty: 28%

AI Score: 40

Ranked #73,599 of 194,257 authors (top 38%); #754 in IR (top 35%)

6 Papers

3.5 · IR · Mar 3, 2023 · Code

Discovery and Recognition of Formula Concepts using Machine Learning

Philipp Scharpf, Moritz Schubotz, Howard S. Cohl et al.

Citation-based Information Retrieval (IR) methods for scientific documents have proven effective for IR applications, such as Plagiarism Detection or Literature Recommender Systems in academic disciplines that use many references. In science, technology, engineering, and mathematics, researchers often employ mathematical concepts through formula notation to refer to prior knowledge. Our long-term goal is to generalize citation-based IR methods and apply this generalized method to both classical references and mathematical concepts. In this paper, we suggest how mathematical formulas could be cited and define a Formula Concept Retrieval task with two subtasks: Formula Concept Discovery (FCD) and Formula Concept Recognition (FCR). While FCD aims at the definition and exploration of a 'Formula Concept' that names bundled equivalent representations of a formula, FCR is designed to match a given formula to a prior assigned unique mathematical concept identifier. We present machine learning-based approaches to address the FCD and FCR tasks. We then evaluate these approaches on a standardized test collection (NTCIR arXiv dataset). Our FCD approach yields a precision of 68% for retrieving equivalent representations of frequent formulas and a recall of 72% for extracting the formula name from the surrounding text. FCD and FCR enable the citation of formulas within mathematical documents and facilitate semantic search and question answering as well as document similarity assessments for plagiarism detection or recommender systems.

6.9 · NI · Apr 14

Large-Scale Measurement of NAT Traversal for the Decentralized Web: A Case Study of DCUtR in IPFS

Dennis Trautwein, Cornelius Ihle, Moritz Schubotz et al.

The promise of decentralized peer-to-peer (P2P) systems is fundamentally gated by the challenge of Network Address Translation (NAT) traversal, with existing solutions often reintroducing the very centralization they seek to avoid. This paper presents the first large-scale measurement study of a fully decentralized NAT traversal protocol, Direct Connection Upgrade through Relay (DCUtR), within the production libp2p-based InterPlanetary File System (IPFS) network. Drawing on over 4.4 million traversal attempts from 85,000+ distinct networks across 167 countries, we provide an empirical analysis of modern P2P connectivity. We establish a conditional success rate of $70\% \pm 7.1\%$ for the hole-punching stage, given that prerequisite relay reservation and public address discovery succeed, providing a crucial new benchmark for the field. Critically, we empirically challenge the long-held belief of UDP's superiority for NAT traversal, demonstrating that DCUtR's high-precision, RTT-based synchronization yields statistically indistinguishable success rates for both TCP and QUIC ($\sim70\%$). Our analysis further validates the protocol's design for permissionless environments by showing that success is independent of relay characteristics and that the mechanism is highly efficient, with $97.6\%$ of successful connections established on the first attempt. Building on this analysis, we propose a concrete roadmap of protocol enhancements aimed at achieving universal connectivity and contribute our complete dataset to foster further research in this domain.

2.4 · CL · Sep 6, 2019 · Code

Giveme5W1H: A Universal System for Extracting Main Events from News Articles

Felix Hamborg, Corinna Breitinger, Bela Gipp

Event extraction from news articles is a commonly required prerequisite for various tasks, such as article summarization, article clustering, and news aggregation. Due to the lack of universally applicable and publicly available methods tailored to news datasets, many researchers redundantly implement event extraction methods for their own projects. The journalistic 5W1H questions are capable of describing the main event of an article, i.e., by answering who did what, when, where, why, and how. We provide an in-depth description of an improved version of Giveme5W1H, a system that uses syntactic and domain-specific rules to automatically extract the relevant phrases from English news articles to provide answers to these 5W1H questions. Given the answers to these questions, the system determines an article's main event. In an expert evaluation with three assessors and 120 articles, we determined an overall precision of p=0.73, and p=0.82 for answering the first four W questions, which alone can sufficiently summarize the main event reported on in a news article. We recently made our system publicly available, and it remains the only universal open-source 5W1H extractor capable of being applied to a wide range of use cases in news analysis.

5.2 · CR · Feb 13, 2022

Non-fungible Tokens: Promise or Peril?

Arsalan Parham, Corinna Breitinger

Non-fungible tokens or NFTs are the digital assets on a blockchain. NFTs are unique and they cannot be divided like cryptocurrencies. NFTs could store digital ownership of an artwork or collections or can be fan tokens or tickets for clubs. NFTs are based on a smart contract on a blockchain network which supports them, such as Ethereum, Cardano or Polkadot. Most of the NFTs are now minted on Ethereum (ERC-20) network, but it has some main issues like high transaction fees and low speed. There are lots of domains which can be benefited from NFT technology such as art, music, gaming, sport and wildlife conservation. NFTs could be also bought or sold on lots of NFT marketplaces such as OpenSea and Chiliz. The trend is in a huge hype because the market cap and popularity of NFTs are growing significantly.

3.6 · IR · Sep 16, 2021 · Code

A Qualitative Evaluation of User Preference for Link-based vs. Text-based Recommendations of Wikipedia Articles

Malte Ostendorff, Corinna Breitinger, Bela Gipp

Literature recommendation systems (LRS) assist readers in the discovery of relevant content from the overwhelming amount of literature available. Despite the widespread adoption of LRS, there is a lack of research on the user-perceived recommendation characteristics for fundamentally different approaches to content-based literature recommendation. To complement existing quantitative studies on literature recommendation, we present qualitative study results that report on users' perceptions for two contrasting recommendation classes: (1) link-based recommendation represented by the Co-Citation Proximity (CPA) approach, and (2) text-based recommendation represented by Lucene's MoreLikeThis (MLT) algorithm. The empirical data analyzed in our study with twenty users and a diverse set of 40 Wikipedia articles indicate a noticeable difference between text- and link-based recommendation generation approaches along several key dimensions. The text-based MLT method receives higher satisfaction ratings in terms of user-perceived similarity of recommended articles. In contrast, the CPA approach receives higher satisfaction scores in terms of diversity and serendipity of recommendations. We conclude that users of literature recommendation systems can benefit most from hybrid approaches that combine both link- and text-based approaches, where the user's information needs and preferences should control the weighting for the approaches used. The optimal weighting of multiple approaches used in a hybrid recommendation system is highly dependent on a user's shifting needs.

15.6 · IR · Mar 27, 2017

Mr. DLib: Recommendations-as-a-Service (RaaS) for Academia

Joeran Beel, Akiko Aizawa, Corinna Breitinger et al.

Only few digital libraries and reference managers offer recommender systems, although such systems could assist users facing information overload. In this paper, we introduce Mr. DLib's recommendations-as-a-service, which allows third parties to easily integrate a recommender system into their products. We explain the recommender approaches implemented in Mr. DLib (content-based filtering among others), and present details on 57 million recommendations, which Mr. DLib delivered to its partner GESIS Sowiport. Finally, we outline our plans for future development, including integration into JabRef, establishing a living lab, and providing personalized recommendations.