Bernhard Haslhofer

CR
h-index: 6
24 papers
684 citations
Novelty: 34%
AI Score: 47

24 Papers

LG · Oct 14, 2022
Autoencoder based Anomaly Detection and Explained Fault Localization in Industrial Cooling Systems

Stephanie Holly, Robin Heel, Denis Katic et al.

Anomaly detection in large industrial cooling systems is very challenging due to the high data dimensionality, inconsistent sensor recordings, and lack of labels. The state of the art for automated anomaly detection in these systems typically relies on expert knowledge and thresholds. However, data is viewed in isolation, and complex multivariate relationships are neglected. In this work, we present an autoencoder-based end-to-end workflow for anomaly detection suitable for multivariate time series data in large industrial cooling systems, including explained fault localization and root cause analysis based on expert knowledge. We identify system failures using a threshold on the total reconstruction error (autoencoder reconstruction error including all sensor signals). For fault localization, we compute the individual reconstruction error (autoencoder reconstruction error for each sensor signal), allowing us to identify the signals that contribute most to the total reconstruction error. Expert knowledge is provided via a look-up table, enabling root-cause analysis and assignment to the affected subsystem. We demonstrated our findings in a cooling system unit including 34 sensors over an 8-month period using 4-fold cross-validation and automatically created labels based on thresholds provided by domain experts. Using 4-fold cross-validation, we reached an F1-score of 0.56, whereas the autoencoder results showed a higher consistency score (CS of 0.92) compared to the automatically created labels (CS of 0.62), indicating that the anomaly is recognized in a very stable manner. The main anomaly was found by both the autoencoder and the automatically created labels and was also recorded in the log files. Further, the explained fault localization highlighted the most affected component for the main anomaly in a very consistent manner.
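
The detection and localization steps described in this abstract can be illustrated with a small sketch. It is not the authors' implementation: the squared-error metric, the function names, the sensor names, and the look-up table contents are all assumptions made for illustration, and the autoencoder itself is replaced by a precomputed reconstruction.

```python
def reconstruction_errors(window, reconstruction):
    """Return (total_error, per_sensor_errors) for one window of readings.

    window and reconstruction are lists of time steps, each a list with one
    value per sensor. Mean squared error is an assumed metric; the abstract
    does not say which error measure is used.
    """
    n_steps = len(window)
    n_sensors = len(window[0])
    per_sensor = [
        sum((window[t][s] - reconstruction[t][s]) ** 2 for t in range(n_steps)) / n_steps
        for s in range(n_sensors)
    ]
    return sum(per_sensor), per_sensor


def localize_fault(window, reconstruction, threshold, sensor_names, lookup, top_k=2):
    """Flag an anomaly if the total error exceeds the threshold, then rank sensors.

    lookup is a hypothetical expert table mapping a sensor name to a subsystem.
    Returns None when the window is not anomalous.
    """
    total, per_sensor = reconstruction_errors(window, reconstruction)
    if total <= threshold:
        return None
    # Sensors sorted by their individual contribution, largest first.
    ranked = sorted(range(len(per_sensor)), key=lambda s: per_sensor[s], reverse=True)
    return [(sensor_names[s], lookup.get(sensor_names[s], "unknown")) for s in ranked[:top_k]]


# Hypothetical two-step window with three sensors.
names = ["inlet_temp", "pump_pressure", "fan_speed"]
table = {"pump_pressure": "pump unit"}
observed = [[1.0, 5.0, 2.0], [1.0, 5.0, 2.0]]
rebuilt = [[1.0, 3.0, 2.5], [1.0, 3.0, 2.5]]
print(localize_fault(observed, rebuilt, 1.0, names, table))
```

Summing the per-sensor errors to obtain the total keeps the two quantities consistent, so the ranking directly shows which signals drive a threshold crossing.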

LG · May 30, 2022
Machine Learning Methods for Health-Index Prediction in Coating Chambers

Clemens Heistracher, Anahid Jalali, Jürgen Schneeweiss et al.

Coating chambers create thin layers that improve the mechanical and optical surface properties in jewelry production using physical vapor deposition. In such a process, evaporated material condenses on the walls of the chamber and, over time, causes mechanical defects and unstable processes. As a result, manufacturers perform extensive maintenance procedures to reduce production loss. Current rule-based maintenance strategies neglect the impact of specific recipes and the actual condition of the vacuum chamber. Our overall goal is to predict the future condition of the coating chamber to allow cost- and quality-optimized maintenance of the equipment. This paper describes the derivation of a novel health indicator that serves as a step toward condition-based maintenance for coating chambers. We indirectly use gas emissions of the chamber's contamination to evaluate the machine's condition. Our approach relies on process data and does not require additional hardware installation. Further, we evaluated multiple machine learning algorithms for a condition-based forecast of the health indicator that also reflects production planning. Our results show that models based on decision trees are the most effective and outperform all three benchmarks, improving by at least 0.22 in the mean average error. Our work paves the way for cost- and quality-optimized maintenance of coating applications.

CE · May 22
DeFi Yield Aggregators: Analysing Investment Strategies and Structural Dependencies

Stefan Kitzler, Kasra Zarinehbaf Asadi, Svetlana Kremer et al.

Yield aggregators are financial services in Decentralised Finance (DeFi) providing automated investment management and return optimisation for users. In this study, we investigate the operational mechanisms and monetary flows of two major yield aggregators, Yearn Finance and Cian, over the period from May 4, 2024 to May 3, 2025. Our supporting conceptual framework decomposes yield aggregator operations into user investment and strategy management cycles. Using a network approach for 2,459 Yearn and 921 Cian transactions, we trace protocol interactions and capital flows across the ecosystem. Users invested 15.7M USD into Yearn's USDC vault, which generated yield through liquidity provision and dynamic allocation across DeFi protocols. Cian, deployed later, attracted 54.0M USD into its staked-ETH (stETH) vault and implemented sophisticated leverage through flashloan-enabled recursive staking. Yearn's USDC vault achieves an annual yield of 5.41%, while Cian's stETH vault produces 4.22% with higher risk exposure. We use the operational insights from our analysis to extend the existing DeFi Stack Reference Model (DSR) with new financial primitives to highlight structural risk dependencies. Overall, our findings show that strategic complexity in yield aggregation does not necessarily translate into higher returns but materially expands risk exposure.

CE · May 19
Modern Portfolio Theory in the Crypto-Wilderness

Ivan Vynyavskyy, Stefan Kitzler, Bernhard Haslhofer et al.

Modern Portfolio Theory (MPT) prescribes how to maximise the return of an asset portfolio for a given level of risk. The optimal trade-off between return and variance defines the efficient frontier. Whether actual cryptoasset portfolios approximate this prescription and whether proximity to the frontier translates into realised performance remain difficult to test at large scale in traditional markets due to their opaque nature and the inaccessibility of data. As we show, public blockchains make these questions measurable: every token transfer is recorded, thus enabling complete portfolio reconstruction for every account at any point in time. We leverage this transparency to reconstruct cryptoasset portfolios for over 116M Ethereum accounts across the full chain history (2015-2025), measure their distance to the constrained efficient frontier, and quantify how deviations translate into realised performance. Here we show that market entry timing, not allocation choice, is the dominant predictor of realised cryptoasset returns. On-chain wealth is highly concentrated and portfolios are pervasively under-diversified, with single-asset holdings accounting for 83.35% of accounts. Two-asset portfolios sit closest to the efficient frontier defined by their held assets, a proximity that reflects the narrowness of their opportunity set rather than deliberate optimisation. Passive market-capitalisation weighting outperforms every MPT optimisation strategy in median realised return, and entry month alone explains 70-79% of the variance in returns, far exceeding the contribution of allocation choice. Mean-variance optimisation therefore appears neither descriptive of observed behaviour nor prescriptively useful in the cryptoasset domain, even if MPT retains its value as a normative benchmark.

LG · Sep 21, 2023
Predictability and Comprehensibility in Post-Hoc XAI Methods: A User-Centered Analysis

Anahid Jalali, Bernhard Haslhofer, Simone Kriglstein et al.

Post-hoc explainability methods aim to clarify predictions of black-box machine learning models. However, it is still largely unclear how well users comprehend the provided explanations and whether these increase the users' ability to predict the model behavior. We approach this question by conducting a user study to evaluate comprehensibility and predictability in two widely used tools: LIME and SHAP. Moreover, we investigate the effect of counterfactual explanations and misclassifications on users' ability to understand and predict the model behavior. We find that the comprehensibility of SHAP is significantly reduced when explanations are provided for samples near a model's decision boundary. Furthermore, we find that counterfactual explanations and misclassifications can significantly increase the users' understanding of how a machine learning model is making decisions. Based on our findings, we also derive design recommendations for future post-hoc explainability methods with increased comprehensibility and predictability.

SI · Mar 12
Credibility Matters: Motivations, Characteristics, and Influence Mechanisms of Crypto Key Opinion Leaders

Alexander Kropiunig, Svetlana Kremer, Bernhard Haslhofer

Crypto Key Opinion Leaders (KOLs) shape Web3 narratives and retail investment behaviour. In volatile, high-risk markets, their credibility becomes a key determinant of their influence on followers. Yet prior research has focused on lifestyle influencers or generic financial commentary, leaving crypto KOLs' understandings of motivation, credibility, and responsibility underexplored. Drawing on interviews with 13 KOLs and self-determination theory (SDT), we examine how psychological needs are negotiated alongside monetisation and community expectations. Whereas prior work treats finfluencer credibility as a set of static credentials, our findings reveal it to be a self-determined, ethically enacted practice. We identify four community-recognised markers of credibility: self-regulation, bounded epistemic competence, accountability, and reflexive self-correction. This reframes credibility as socio-technical performance, extending SDT into high-risk crypto ecosystems. Methodologically, we employ a hybrid human-LLM thematic analysis. The study surfaces implications for designing credibility signals that prioritise transparency over hype.

CR · Feb 26, 2021 · Code
GraphSense: A General-Purpose Cryptoasset Analytics Platform

Bernhard Haslhofer, Rainer Stütz, Matteo Romiti et al.

There is currently an increasing demand for cryptoasset analysis tools among cryptoasset service providers, the financial industry in general, as well as across academic fields. At the moment, one can choose between commercial services or low-level open-source tools providing programmatic access. In this paper, we present the design and implementation of another option: the GraphSense Cryptoasset Analytics Platform, which can be used for interactive investigations of monetary flows and, more importantly, for executing advanced analytics tasks using a standard data science tool stack. By providing a growing set of open-source components, GraphSense could ultimately become an instrument for scientific investigations in academia and a possible response to emerging compliance and regulation challenges for businesses and organizations dealing with cryptoassets.

CR · Apr 11, 2018 · Code
Ransomware Payments in the Bitcoin Ecosystem

Masarah Paquet-Clouston, Bernhard Haslhofer, Benoit Dupont

Ransomware can prevent a user from accessing a device and its files until a ransom is paid to the attacker, most frequently in Bitcoin. With over 500 known ransomware families, it has become one of the dominant cybercrime threats for law enforcement, security professionals and the public. However, a more comprehensive, evidence-based picture of the global direct financial impact of ransomware attacks is still missing. In this paper, we present a data-driven method for identifying and gathering information on Bitcoin transactions related to illicit activity based on footprints left on the public Bitcoin blockchain. We implement this method on top of the GraphSense open-source platform and apply it to empirically analyze transactions related to 35 ransomware families. We estimate the lower bound direct financial impact of each ransomware family and find that, from 2013 to mid-2017, the market for ransomware payments has a minimum worth of USD 12,768,536 (22,967.54 BTC). We also find that the market is highly skewed, with only a small number of players responsible for the majority of the payments. Based on these research findings, policy-makers and law enforcement agencies can use the statistics provided to understand the size of the illicit market and make informed decisions on how best to address the threat.

CR · Mar 28, 2024
Detecting Financial Bots on the Ethereum Blockchain

Thomas Niedermayer, Pietro Saggese, Bernhard Haslhofer

The integration of bots in Distributed Ledger Technologies (DLTs) fosters efficiency and automation. However, their use is also associated with predatory trading and market manipulation, and can pose threats to system integrity. It is therefore essential to understand the extent of bot deployment in DLTs; despite this, current detection systems are predominantly rule-based and lack flexibility. In this study, we present a novel approach that utilizes machine learning for the detection of financial bots on the Ethereum platform. First, we systematize existing scientific literature and collect anecdotal evidence to establish a taxonomy for financial bots, comprising 7 categories and 24 subcategories. Next, we create a ground-truth dataset consisting of 133 human and 137 bot addresses. Third, we employ both unsupervised and supervised machine learning algorithms to detect bots deployed on Ethereum. The highest-performing clustering algorithm is a Gaussian Mixture Model with an average cluster purity of 82.6%, while the highest-performing model for binary classification is a Random Forest with an accuracy of 83%. Our machine learning-based detection mechanism contributes to understanding the Ethereum ecosystem dynamics by providing additional insights into the current bot landscape.
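
The cluster purity figure reported above can be made concrete with a short sketch. The abstract does not define how the average is taken, so the unweighted mean over clusters used here is an assumption, as are the function names and the toy labels.

```python
from collections import Counter, defaultdict


def cluster_purities(cluster_ids, labels):
    """Purity per cluster: share of members carrying the cluster's majority label."""
    members = defaultdict(list)
    for cluster, label in zip(cluster_ids, labels):
        members[cluster].append(label)
    return {
        cluster: Counter(found).most_common(1)[0][1] / len(found)
        for cluster, found in members.items()
    }


def average_purity(cluster_ids, labels):
    """Unweighted mean of the per-cluster purities (an assumed averaging scheme)."""
    purities = cluster_purities(cluster_ids, labels)
    return sum(purities.values()) / len(purities)


# Hypothetical assignment of six ground-truth addresses to two clusters.
assignments = [0, 0, 0, 0, 1, 1]
truth = ["bot", "bot", "bot", "human", "human", "human"]
print(cluster_purities(assignments, truth))
print(average_purity(assignments, truth))
```

A size-weighted average would instead divide the total number of majority-label members by the number of addresses; the two variants differ whenever clusters are unequal in size.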

CR · Feb 12, 2025
Linking Cryptoasset Attribution Tags to Knowledge Graph Entities: An LLM-based Approach

Régnier Avice, Bernhard Haslhofer, Zhidong Li et al.

Attribution tags form the foundation of modern cryptoasset forensics. However, inconsistent or incorrect tags can mislead investigations and even result in false accusations. To address this issue, we propose a novel computational method based on Large Language Models (LLMs) to link attribution tags with well-defined knowledge graph concepts. We implemented this method in an end-to-end pipeline and conducted experiments showing that our approach outperforms baseline methods by up to 37.4% in F1-score across three publicly available attribution tag datasets. By integrating concept filtering and blocking procedures, we generate candidate sets containing five knowledge graph entities, achieving a recall of 93% without the need for labeled data. Additionally, we demonstrate that local LLM models can achieve F1-scores of 90%, comparable to remote models which achieve 94%. We also analyze the cost-performance trade-offs of various LLMs and prompt templates, showing that selecting the most cost-effective configuration can reduce costs by 90%, with only a 1% decrease in performance. Our method not only enhances attribution tag quality but also serves as a blueprint for fostering more reliable forensic evidence.

CR · Nov 5, 2021
Disentangling Decentralized Finance (DeFi) Compositions

Stefan Kitzler, Friedhelm Victor, Pietro Saggese et al.

We present a measurement study on compositions of Decentralized Finance protocols, which aim to disrupt traditional finance and offer services on top of distributed ledgers, such as Ethereum. DeFi compositions may impact the development of ecosystem interoperability, are increasingly integrated with web technologies, and may introduce risks through complexity. Starting from a dataset of 23 labeled DeFi protocols and 10,663,881 associated Ethereum accounts, we study the interactions of protocols and associated smart contracts. From a network perspective, we find that decentralized exchanges and lending protocols have high degree and centrality values, that interactions among protocol nodes primarily occur in a strongly connected component, and that known community detection methods cannot disentangle DeFi protocols. Therefore, we propose an algorithm to decompose a protocol call into a nested set of building blocks that may be part of other DeFi protocols. With a ground truth dataset we have collected, we can demonstrate the algorithm's capability by finding that swaps are the most frequently used building blocks. As building blocks can be nested, i.e., contained in each other, we provide visualizations of composition trees for deeper inspections. We also present a broad picture of DeFi compositions by extracting and flattening the entire nested building block structure across multiple DeFi protocols. Finally, to demonstrate the practicality of our approach, we present a case study that is inspired by the recent collapse of the UST stablecoin in the Terra ecosystem. Under the hypothetical assumption that the stablecoin USD Tether would experience a similar fate, we study which building blocks and, thereby, DeFi protocols would be affected. Overall, our results and methods contribute to a better understanding of a new family of financial products.

LG · Oct 8, 2021
Minimal-Configuration Anomaly Detection for IIoT Sensors

Clemens Heistracher, Anahid Jalali, Axel Suendermann et al.

The increasing deployment of low-cost IoT sensor platforms in industry boosts the demand for anomaly detection solutions that fulfill two key requirements: minimal configuration effort and easy transferability across equipment. Recent advances in deep learning, especially long short-term memory (LSTM) networks and autoencoders, offer promising methods for detecting anomalies in sensor data recordings. We compared autoencoders with various architectures such as deep neural networks (DNN), LSTMs and convolutional neural networks (CNN) using a simple benchmark dataset, which we generated by operating a peristaltic pump under various operating conditions and inducing anomalies manually. Our preliminary results indicate that a single model can detect anomalies under various operating conditions on a four-dimensional data set without any specific feature engineering for each operating condition. We consider this work a first step towards a generic anomaly detection method that is applicable to a wide range of industrial equipment.

CR · Sep 21, 2021
Adoption and Actual Privacy of Decentralized CoinJoin Implementations in Bitcoin

Rainer Stütz, Johann Stockinger, Bernhard Haslhofer et al.

We present a first measurement study on the adoption and actual privacy of two popular decentralized CoinJoin implementations, Wasabi and Samourai, in the broader Bitcoin ecosystem. By applying highly accurate (> 99%) algorithms, we can effectively detect 30,251 Wasabi and 223,597 Samourai transactions within the block range 530,500 to 725,348 (2018-07-05 to 2022-02-28). We also found a steady adoption of these services, with a total value of mixed coins of ca. 4.74 B USD and average monthly mixing amounts of ca. 172.93 M USD for Wasabi and ca. 41.72 M USD for Samourai. Furthermore, we could trace ca. 322 M USD directly received by cryptoasset exchanges and ca. 1.16 B USD indirectly received via two hops. Our analysis further shows that the traceability of addresses during pre-mixing and post-mixing narrows down the anonymity set provided by these coin mixing services. It also shows that the selection of addresses for the CoinJoin transaction can harm anonymity. Overall, this is the first paper to provide a comprehensive picture of the adoption and privacy of distributed CoinJoin transactions. Understanding this picture is particularly interesting in the light of ongoing regulatory efforts that will, on the one hand, affect compliance measures implemented in cryptocurrency exchanges and, on the other hand, the privacy of end-users.

GN · Oct 23, 2020
Exploring investor behavior in Bitcoin: a study of the disposition effect

Jürgen E. Schatzmann, Bernhard Haslhofer

Investors commonly exhibit the disposition effect, the irrational tendency to sell their winning investments and hold onto their losing ones. While this phenomenon has been observed in many traditional markets, it remains unclear whether it also applies to atypical markets like cryptoassets. This paper investigates the prevalence of the disposition effect in Bitcoin using transactions targeting cryptoasset exchanges as proxies for selling transactions. Our findings suggest that investors in Bitcoin were indeed subject to the disposition effect, with varying intensity. They also show that the disposition effect was not consistently present throughout the observation period. Its prevalence was more evident from the boom and bust year 2017 onwards, as confirmed by various technical indicators. Our study suggests irrational investor behavior is also present in atypical markets like Bitcoin.

CR · Jul 1, 2020
Cross-Layer Deanonymization Methods in the Lightning Protocol

Matteo Romiti, Friedhelm Victor, Pedro Moreno-Sanchez et al.

Bitcoin (BTC) pseudonyms (layer 1) can effectively be deanonymized using heuristic clustering techniques. However, while performing transactions off-chain (layer 2) in the Lightning Network (LN) seems to enhance privacy, a systematic analysis of the anonymity and privacy leakages due to the interaction between the two layers is missing. We present clustering heuristics that group BTC addresses, based on their interaction with the LN, as well as LN nodes, based on shared naming and hosting information. We also present linking heuristics that link 45.97% of all LN nodes to 29.61% of the BTC addresses interacting with the LN. These links allow us to attribute information (e.g., aliases, IP addresses) to 21.19% of the BTC addresses, contributing to their deanonymization. Further, these deanonymization results suggest that the security and privacy of LN payments are weaker than commonly believed, with LN users being at the mercy of as few as five actors that control 36 nodes and over 33% of the total capacity. Overall, this is the first paper to present a method for linking LN nodes with BTC addresses across layers and to discuss privacy and security implications.

NI · Jan 24, 2020
All that Glitters is not Bitcoin -- Unveiling the Centralized Nature of the BTC (IP) Network

Sami Ben Mariem, Pedro Casas, Matteo Romiti et al.

Blockchains are typically managed by peer-to-peer (P2P) networks providing the support and substrate to the so-called distributed ledger (DLT), a replicated, shared, and synchronized data structure, geographically spread across multiple nodes. The Bitcoin (BTC) blockchain is by far the best-known DLT, used to record transactions among peers, based on the BTC digital currency. In this paper, we focus on the network side of the BTC P2P network, analyzing its nodes using a purely network-measurement-based approach. We present a BTC crawler able to discover and track the BTC P2P network through active measurements, and use it to analyze its main properties. Through the combined analysis of multiple snapshots of the BTC network as well as by using other publicly available data sources on the BTC network and DLT, we unveil the BTC P2P network, locate its active nodes, study their performance, and track the evolution of the network over the past two years. Among other relevant findings, we show that (i) the size of the BTC network has remained almost constant during the last 12 months, since the major BTC price drop in early 2018, (ii) most of the BTC P2P network resides in US and EU countries, and (iii) despite this western network locality, most of the mining activity and corresponding revenue is controlled by major mining pools located in China. By additionally analyzing the distribution of BTC coins among independent BTC entities (i.e., single BTC addresses or groups of BTC addresses controlled by the same actor), we also conclude that (iv) BTC is very far from being the decentralized and uncontrolled system it is so often advertised to be, with only 4.5% of all the BTC entities holding about 85% of all circulating BTC coins.

CR · Jan 13, 2020
Stake Shift in Major Cryptocurrencies: An Empirical Study

Rainer Stütz, Peter Gaži, Bernhard Haslhofer et al.

In the proof-of-stake (PoS) paradigm for maintaining decentralized, permissionless cryptocurrencies, Sybil attacks are prevented by basing the distribution of roles in the protocol execution on the stake distribution recorded in the ledger itself. However, for various reasons this distribution cannot be completely up-to-date, introducing a gap between the present stake distribution, which determines the parties' current incentives, and the one used by the protocol. In this paper, we investigate this issue, and empirically quantify its effects. We survey existing provably secure PoS proposals to observe that the above time gap between the two stake distributions, which we call stake distribution lag, amounts to several days for each of these protocols. Based on this, we investigate the ledgers of four major cryptocurrencies (Bitcoin, Bitcoin Cash, Litecoin and Zcash) and compute the average stake shift (the statistical distance of the two distributions) for each value of stake distribution lag between 1 and 14 days, as well as related statistics. We also empirically quantify the sublinear growth of stake shift with the length of the considered lag interval. Finally, we turn our attention to unusual stake-shift spikes in these currencies: we observe that hard forks trigger major stake shifts and that single real-world actors, mostly exchanges, account for major stake shifts in established cryptocurrency ecosystems.
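
The stake shift described above, the statistical distance between two stake distributions, can be sketched as follows. This assumes that "statistical distance" means total variation distance, which is the usual reading but is not spelled out in the abstract; the function name and the toy balances are hypothetical.

```python
def stake_shift(balances_then, balances_now):
    """Total variation distance between two stake distributions.

    Each argument maps a stakeholder identifier to a balance at one point in
    time. Balances are normalised to shares of the total stake first, so the
    result lies between 0 (identical distributions) and 1 (disjoint holders).
    """
    total_then = sum(balances_then.values())
    total_now = sum(balances_now.values())
    holders = set(balances_then) | set(balances_now)
    return 0.5 * sum(
        abs(balances_then.get(h, 0) / total_then - balances_now.get(h, 0) / total_now)
        for h in holders
    )


# Hypothetical snapshots taken one lag interval apart.
earlier = {"a": 50, "b": 50}
later = {"a": 50, "b": 25, "c": 25}
print(stake_shift(earlier, later))
```

In this toy case a quarter of the total stake moved from "b" to a new holder "c", and the computed shift equals that moved share.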

CR · Aug 2, 2019
Spams meet Cryptocurrencies: Sextortion in the Bitcoin Ecosystem

Masarah Paquet-Clouston, Matteo Romiti, Bernhard Haslhofer et al.

In the past year, a new spamming scheme has emerged: sexual extortion messages requiring payments in the cryptocurrency Bitcoin, also known as sextortion. This scheme represents a first integration of the use of cryptocurrencies by members of the spamming industry. Using a dataset of 4,340,736 sextortion spams, this research aims at understanding such new amalgamation by uncovering spammers' operations. To do so, a simple, yet effective method for projecting Bitcoin addresses mentioned in sextortion spams onto transaction graph abstractions is computed over the entire Bitcoin blockchain. This allows us to track and investigate monetary flows between involved actors and gain insights into the financial structure of sextortion campaigns. We find that sextortion spammers are somewhat sophisticated, following pricing strategies and benefiting from cost reductions as their operations cut the upper-tail of the spamming supply chain. We discover that one single entity is likely controlling the financial backbone of the majority of the sextortion campaigns and that the 11-month operation studied yielded a lower-bound revenue between $1,300,620 and $1,352,266. We conclude that sextortion spamming is a lucrative business and spammers will likely continue to send bulk emails that try to extort money through cryptocurrencies.

CR · May 15, 2019
A Deep Dive into Bitcoin Mining Pools: An Empirical Analysis of Mining Shares

Matteo Romiti, Aljosha Judmayer, Alexei Zamyatin et al.

Miners play a key role in cryptocurrencies such as Bitcoin: they invest substantial computational resources in processing transactions and minting new currency units. It is well known that an attacker controlling more than half of the network's mining power could manipulate the state of the system at will. While the influence of large mining pools appears evenly split, the actual distribution of mining power within these pools and their economic relationships with other actors remain undisclosed. To this end, we conduct the first in-depth analysis of mining reward distribution within three of the four largest Bitcoin mining pools and examine their cross-pool economic relationships. Our results suggest that individual miners are simultaneously operating across all three pools and that in each analyzed pool a small number of actors (<= 20) receives over 50% of all BTC payouts. While the extent of an operator's control over the resources of a mining pool remains an open debate, our findings are in line with previous research, pointing out centralization tendencies in large mining pools and cryptocurrencies in general.
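
The concentration finding above (a small number of actors receiving over 50% of payouts) corresponds to a simple measure: the smallest number of recipients whose combined payouts exceed a given share. The sketch below uses hypothetical actor names and amounts and is not the authors' analysis pipeline.

```python
def actors_for_share(payouts, share=0.5):
    """Smallest number of recipients whose combined payouts exceed `share` of the total.

    payouts maps a (hypothetical) actor identifier to the amount received.
    """
    total = sum(payouts.values())
    running = 0.0
    # Walk through recipients from largest to smallest payout.
    for count, amount in enumerate(sorted(payouts.values(), reverse=True), start=1):
        running += amount
        if running > share * total:
            return count
    return len(payouts)


# Hypothetical payout totals per actor, in BTC.
pool_payouts = {"actor_1": 30, "actor_2": 30, "actor_3": 20, "actor_4": 20}
print(actors_for_share(pool_payouts))
```

Sorting by payout size first guarantees that the returned count is minimal for the chosen share.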

LG · Apr 16, 2019
Predicting Time-to-Failure of Plasma Etching Equipment using Machine Learning

Anahid Jalali, Clemens Heistracher, Alexander Schindler et al.

Predicting unscheduled breakdowns of plasma etching equipment can reduce maintenance costs and production losses in the semiconductor industry. However, plasma etching is a complex procedure and it is hard to capture all relevant equipment properties and behaviors in a single physical model. Machine learning offers an alternative for predicting upcoming machine failures based on relevant data points. In this paper, we describe three different machine learning tasks that can be used for that purpose: (i) predicting Time-To-Failure (TTF), (ii) predicting health state, and (iii) predicting TTF intervals of a piece of equipment. Our results show that trained machine learning models can outperform benchmarks resembling human judgments in all three tasks. This suggests that machine learning offers a viable alternative to currently deployed plasma etching equipment maintenance strategies and decision-making processes.

CR · Dec 6, 2018
An Empirical Analysis of Monero Cross-Chain Traceability

Abraham Hinteregger, Bernhard Haslhofer

Monero is a privacy-centric cryptocurrency that makes payments untraceable by adding decoys to every real input spent in a transaction. Two studies from 2017 found methods to distinguish decoys from real inputs, which enabled traceability for a majority of transactions. Since then, a number of protocol changes have been introduced, but their effectiveness has not yet been reassessed. Furthermore, little is known about traceability of Monero transactions across hard fork chains. We formalize a new method for tracing Monero transactions, which is based on analyzing currency hard forks. We use that method to perform a (passive) traceability analysis on data from the Monero, MoneroV and Monero Original blockchains and find that only a small number of inputs are traceable. We then use the results to estimate the effectiveness of known heuristics for recent transactions and find that they do not significantly outperform random guessing. Our findings suggest that Monero is currently mostly immune to known passive attack vectors and resistant to tracking and tracing methods applied to other cryptocurrencies.

IR · Apr 8, 2013
RESLVE: Leveraging User Interest to Improve Entity Disambiguation on Short Text

Elizabeth L. Murnane, Bernhard Haslhofer, Carl Lagoze

We address the Named Entity Disambiguation (NED) problem for short, user-generated texts on the social Web. In such settings, the lack of linguistic features and sparse lexical context result in a high degree of ambiguity and sharp performance drops of nearly 50% in the accuracy of conventional NED systems. We handle these challenges by developing a model of user-interest with respect to a personal knowledge context; and Wikipedia, a particularly well-established and reliable knowledge base, is used to instantiate the procedure. We conduct systematic evaluations using individuals' posts from Twitter, YouTube, and Flickr and demonstrate that our novel technique is able to achieve substantial performance gains beyond state-of-the-art NED methods.

DL · Apr 5, 2013
Semantic Tagging on Historical Maps

Bernhard Haslhofer, Werner Robitza, Carl Lagoze et al.

Tags assigned by users to shared content can be ambiguous. As a possible solution, we propose semantic tagging as a collaborative process in which a user selects and associates Web resources drawn from a knowledge context. We applied this general technique in the specific context of online historical maps and allowed users to annotate and tag them. To study the effects of semantic tagging on tag production, the types and categories of obtained tags, and user task load, we conducted an in-lab within-subject experiment with 24 participants who annotated and tagged two distinct maps. We found that the semantic tagging implementation does not affect these parameters, while providing tagging relationships to well-defined concept definitions. Compared to label-based tagging, our technique also gathers positive and negative tagging relationships. We believe that our findings carry implications for designers who want to adopt semantic tagging in other contexts and systems on the Web.

DL · Jun 6, 2012
Finding Quality Issues in SKOS Vocabularies

Christian Mader, Bernhard Haslhofer, Antoine Isaac

The Simple Knowledge Organization System (SKOS) is a standard model for controlled vocabularies on the Web. However, SKOS vocabularies often differ in terms of quality, which reduces their applicability across system boundaries. Here we investigate how we can support taxonomists in improving SKOS vocabularies by pointing out quality issues that go beyond the integrity constraints defined in the SKOS specification. We identified potential quantifiable quality issues and formalized them into computable quality checking functions that can find affected resources in a given SKOS vocabulary. We implemented these functions in the qSKOS quality assessment tool, analyzed 15 existing vocabularies, and found possible quality issues in all of them.