Simon Dobnik

h-index8

7papers

14citations

Novelty24%

AI Score28

Ranked #155,426 of 205,806 authors (top 76%)#26,970 in CL (top 83%)

7 Papers

CLAug 30, 2023

Grandma Karl is 27 years old -- research agenda for pseudonymization of research data

Elena Volodina, Simon Dobnik, Therese Lindström Tiedemann et al.

Accessibility of research data is critical for advances in many research fields, but textual data often cannot be shared due to the personal and sensitive information which it contains, e.g names or political opinions. General Data Protection Regulation (GDPR) suggests pseudonymization as a solution to secure open access to research data, but we need to learn more about pseudonymization as an approach before adopting it for manipulation of research data. This paper outlines a research agenda within pseudonymization, namely need of studies into the effects of pseudonymization on unstructured data in relation to e.g. readability and language assessment, as well as the effectiveness of pseudonymization as a way of protecting writer identity, while also exploring different ways of developing context-sensitive algorithms for detection, labelling and replacement of personal information in unstructured data. The recently granted project on pseudonymization Grandma Karl is 27 years old addresses exactly those challenges.

IRApr 21, 2022

Multi-task recommendation system for scientific papers with high-way networks

Aram Karimi, Simon Dobnik

Finding and selecting the most relevant scientific papers from a large number of papers written in a research community is one of the key challenges for researchers these days. As we know, much information around research interest for scholars and academicians belongs to papers they read. Analysis and extracting contextual features from these papers could help us to suggest the most related paper to them. In this paper, we present a multi-task recommendation system (RS) that predicts a paper recommendation and generates its meta-data such as keywords. The system is implemented as a three-stage deep neural network encoder that tries to maps longer sequences of text to an embedding vector and learns simultaneously to predict the recommendation rate for a particular user and the paper's keywords. The motivation behind this approach is that the paper's topics expressed as keywords are a useful predictor of preferences of researchers. To achieve this goal, we use a system combination of RNNs, Highway and Convolutional Neural Networks to train end-to-end a context-aware collaborative matrix. Our application uses Highway networks to train the system very deep, combine the benefits of RNN and CNN to find the most important factor and make latent representation. Highway Networks allow us to enhance the traditional RNN and CNN pipeline by learning more sophisticated semantic structural representations. Using this method we can also overcome the cold start problem and learn latent features over large sequences of text.

CLNov 6, 2025

Surprisal reveals diversity gaps in image captioning and different scorers change the story

Nikolai Ilinykh, Simon Dobnik

We quantify linguistic diversity in image captioning with surprisal variance - the spread of token-level negative log-probabilities within a caption set. On the MSCOCO test set, we compare five state-of-the-art vision-and-language LLMs, decoded with greedy and nucleus sampling, to human captions. Measured with a caption-trained n-gram LM, humans display roughly twice the surprisal variance of models, but rescoring the same captions with a general-language model reverses the pattern. Our analysis introduces the surprisal-based diversity metric for image captioning. We show that relying on a single scorer can completely invert conclusions, thus, robust diversity evaluation must report surprisal under several scorers.

CLSep 10, 2021

We went to look for meaning and all we got were these lousy representations: aspects of meaning representation for computational semantics

Simon Dobnik, Robin Cooper, Adam Ek et al.

In this paper we examine different meaning representations that are commonly used in different natural language applications today and discuss their limits, both in terms of the aspects of the natural language meaning they are modelling and in terms of the aspects of the application for which they are used.

HCMar 23, 2019

Referring to the recently seen: reference and perceptual memory in situated dialog

John D. Kelleher, Simon Dobnik

From theoretical linguistic and cognitive perspectives, situated dialog systems are interesting as they provide ideal test-beds for investigating the interaction between language and perception. At the same time there are a growing number of practical applications, for example robotic systems and driver-less cars, where spoken interfaces, capable of situated dialog, promise many advantages. To date, however much of the work on situated dialog has focused resolving anaphoric or exophoric references. This paper, by contrast, opens up the question of how perceptual memory and linguistic references interact, and the challenges that this poses to computational models of perceptually grounded dialog.

LGJul 21, 2018

What is not where: the challenge of integrating spatial representations into deep learning architectures

John D. Kelleher, Simon Dobnik

This paper examines to what degree current deep learning architectures for image caption generation capture spatial language. On the basis of the evaluation of examples of generated captions from the literature we argue that systems capture what objects are in the image data but not where these objects are located: the captions generated by these systems are the output of a language model conditioned on the output of an object detector that cannot capture fine-grained location information. Although language models provide useful knowledge for image captions, we argue that deep learning image captioning architectures should also model geometric relations between objects.

CLJul 21, 2018

Modular Mechanistic Networks: On Bridging Mechanistic and Phenomenological Models with Deep Neural Networks in Natural Language Processing

Simon Dobnik, John D. Kelleher

Natural language processing (NLP) can be done using either top-down (theory driven) and bottom-up (data driven) approaches, which we call mechanistic and phenomenological respectively. The approaches are frequently considered to stand in opposition to each other. Examining some recent approaches in deep learning we argue that deep neural networks incorporate both perspectives and, furthermore, that leveraging this aspect of deep learning may help in solving complex problems within language technology, such as modelling language and perception in the domain of spatial cognition.