CL Jun 15, 2023
Participatory Research as a Path to Community-Informed, Gender-Fair Machine Translation
Dagmar Gromann, Manuel Lardelli, Katta Spiel et al.
Recent years have seen a marked increase in the visibility of non-binary people in public discourse. Accordingly, considerations of gender-fair language go beyond a binary conception of male/female. However, language technology, especially machine translation (MT), still suffers from binary gender bias. Proposing a solution for gender-fair MT beyond the binary from a purely technological perspective might fall short of accommodating different target user groups and, in the worst case, might lead to misgendering. To address this challenge, we propose a method and case study building on participatory action research to include experiential experts, i.e., queer and non-binary people, translators, and MT experts, in the MT design process. The case study focuses on German, where central findings are the importance of context dependency to avoid identity invalidation and a desire for customizable MT solutions.
CL Sep 6, 2022
"Es geht um Respekt, nicht um Technologie": Erkenntnisse aus einem Interessensgruppen-übergreifenden Workshop zu genderfairer Sprache und Sprachtechnologie
Sabrina Burtscher, Katta Spiel, Lukas Daniel Klausner et al.
With the increasing attention non-binary people receive in Western societies, strategies of gender-fair language have started to move away from binary (only female/male) concepts of gender. Nevertheless, hardly any approaches that take these identities into account in machine translation models exist so far. A lack of understanding of the socio-technical implications of such technologies risks further reproducing linguistic mechanisms of oppression and mislabelling. In this paper, we describe the methods and results of a workshop on gender-fair language and language technologies, which was led and organised by ten researchers from TU Wien, St. Pölten UAS, FH Campus Wien and the University of Vienna and took place in Vienna in autumn 2021. A wide range of interest groups and their representatives were invited to ensure that the topic could be dealt with holistically. Accordingly, we aimed to include translators, machine translation experts and non-binary individuals (as "community experts") on an equal footing. Our analysis shows that gender in machine translation requires a high degree of context sensitivity, that developers of such technologies need to position themselves cautiously in a process still under social negotiation, and that flexible approaches seem most adequate at present. We then illustrate steps that follow from our results for the field of gender-fair language technologies so that technological developments can adequately line up with social advancements.
CY Jul 21, 2022
Wer ist schuld, wenn Algorithmen irren? Entscheidungsautomatisierung, Organisationen und Verantwortung
Angelika Adensamer, Rita Gsenger, Lukas Daniel Klausner
Algorithmic decision support (ADS) is increasingly used in a whole array of different contexts and structures in various areas of society, influencing many people's lives. Its use raises questions about, among other things, accountability, transparency and responsibility. Our article aims to give a brief overview of the central issues connected to ADS, responsibility and decision-making in organisational contexts and to identify open questions and research gaps. Furthermore, we describe a set of guidelines and a complementary digital tool to assist practitioners in mapping responsibility when introducing ADS within their organisational context.
LG Nov 17, 2023
Delete My Account: Impact of Data Deletion on Machine Learning Classifiers
Tobias Dam, Maximilian Henzl, Lukas Daniel Klausner
Users are more aware than ever of the importance of their own data, thanks to reports about security breaches and leaks of private, often sensitive data in recent years. Additionally, the GDPR has been in effect in the European Union for over three years and many people have encountered its effects in one way or another. Consequently, more and more users are actively protecting their personal data. One way to do this is to make use of the right to erasure guaranteed in the GDPR, which has potential implications for a number of different fields, such as big data and machine learning. Our paper presents an in-depth analysis of the impact of the use of the right to erasure on the performance of machine learning models on classification tasks. We conduct various experiments utilising different datasets as well as different machine learning algorithms to analyse a variety of deletion behaviour scenarios. Due to the lack of credible data on actual user behaviour, we make reasonable assumptions for various deletion modes and biases and provide insight into the effects of different plausible scenarios for right to erasure usage on the data quality of machine learning. Our results show that the impact depends strongly on the amount of data deleted, the particular characteristics of the dataset, the bias chosen for deletion and the assumptions on user behaviour.
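A deletion scenario of the kind described above can be sketched as follows. This is only an illustrative sketch, not the procedure used in the paper: the function name `delete_records`, the `weight` callback and the sampling scheme (weighted sampling without replacement via random keys) are assumptions made for this example.

```python
import random


def delete_records(records, fraction, weight=lambda r: 1.0, seed=0):
    """Simulate erasure requests (hypothetical helper, not from the paper).

    Removes round(fraction * len(records)) records. `weight(record)` must be
    positive; records with a larger weight are more likely to be deleted,
    which models a deletion bias.
    """
    rng = random.Random(seed)
    n_delete = round(fraction * len(records))
    # Weighted sampling without replacement: every record draws the key
    # u ** (1 / w) with u uniform in [0, 1); the largest keys are deleted.
    order = sorted(
        range(len(records)),
        key=lambda i: rng.random() ** (1.0 / weight(records[i])),
        reverse=True,
    )
    deleted = set(order[:n_delete])
    return [r for i, r in enumerate(records) if i not in deleted]


# Toy data: eight records, half of them with label 1.
data = [{"id": i, "label": i % 2} for i in range(8)]

# Unbiased deletion of 25 % of the records.
uniform = delete_records(data, 0.25)

# Biased deletion: records with label 1 are assumed to be ten times
# more likely to request erasure.
biased = delete_records(data, 0.25, weight=lambda r: 10.0 if r["label"] else 1.0)
```

A classifier would then be retrained on the remaining records and compared against one trained on the full dataset; that step depends on the dataset and model and is left out here.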
CY Apr 14
AI of the People, by the People, for the People: A Social Choice Approach to Collective Control of Artificial Intelligence
Paul Anton Bachmann, Niclas Boehmer, Lukas Daniel Klausner et al.
With the growing adoption of AI systems, reasoning about how society can exert control over AI becomes an increasingly urgent problem. Existing work on democratic control largely focuses on macro-level governance. In contrast, we propose a new approach grounded in social choice theory, which we term collective control of artificial intelligence. We argue that collective input can and should be incorporated at multiple points across the ML development pipeline, from data collection through objective design to alignment. We further demonstrate that social choice provides a well-suited modelling language for the treatment of collective input across all stages and that its axiomatic methodology yields principled criteria for evaluating various control mechanisms. Overall, our conceptual contribution provides a mathematically grounded framework to implement and analyse collective control of AI systems.
CY May 19, 2023
"Schöne neue Lieferkettenwelt": Workers' Voice und Arbeitsstandards in Zeiten algorithmischer Vorhersage
Lukas Daniel Klausner, Maximilian Heimstädt, Leonhard Dobusch
The complexity and increasingly tight coupling of supply chains pose a major logistical challenge for leading companies. Another challenge is that leading companies -- under pressure from consumers, a critical public and legislative measures such as supply chain laws -- have to take more responsibility than before for their suppliers' labour standards. In this paper, we discuss a new approach that leading companies are using to try to address these challenges: algorithmic prediction of business risks, but also environmental and social risks. We describe the technical and cultural conditions for algorithmic prediction and explain how -- from the perspective of leading companies -- it helps to address both challenges. We then develop scenarios on how and with what kind of social consequences algorithmic prediction can be used by leading companies. From the scenarios, we derive policy options for different stakeholder groups to help develop algorithmic prediction towards improving labour standards and worker voice.
LG Sep 18, 2021
Towards Resilient Artificial Intelligence: Survey and Research Issues
Oliver Eigner, Sebastian Eresheim, Peter Kieseberg et al.
Artificial intelligence (AI) systems are becoming critical components of today's IT landscapes. Their resilience against attacks and other environmental influences needs to be ensured just as for other IT assets. Considering the specific nature of AI, and of machine learning (ML) in particular, this paper provides an overview of the emerging field of resilient AI and presents research issues the authors identify as potential future work.
CY Jun 24, 2021
"Part Man, Part Machine, All Cop": Automation in Policing
Angelika Adensamer, Lukas Daniel Klausner
Digitisation, automation and datafication permeate policing and justice more and more each year -- from predictive policing methods through recidivism prediction to automated biometric identification at the border. The sociotechnical issues surrounding the use of such systems raise questions and reveal problems, both old and new. Our article reviews contemporary issues surrounding automation in policing and the legal system, finds common issues and themes across a range of examples, introduces the distinction between human "retail bias" and algorithmic "wholesale bias", and argues for shifting the viewpoint of the debate to focus on both workers' rights and organisational responsibility as well as fundamental rights and the right to an effective remedy.
LG Feb 9, 2021
$k$-Anonymity in Practice: How Generalisation and Suppression Affect Machine Learning Classifiers
Djordje Slijepčević, Maximilian Henzl, Lukas Daniel Klausner et al.
The protection of private information is a crucial issue in data-driven research and business contexts. Typically, techniques like anonymisation or (selective) deletion are introduced in order to allow data sharing, e.g. in the case of collaborative research endeavours. For use with anonymisation techniques, the $k$-anonymity criterion is one of the most popular, with numerous scientific publications on different algorithms and metrics. Anonymisation techniques often require changing the data and thus necessarily affect the results of machine learning models trained on the underlying data. In this work, we conduct a systematic comparison and detailed investigation into the effects of different $k$-anonymisation algorithms on the results of machine learning models. We investigate a set of popular $k$-anonymisation algorithms with different classifiers and evaluate them on different real-world datasets. Our systematic evaluation shows that with an increasingly strong $k$-anonymity constraint, the classification performance generally degrades, but to varying degrees and strongly depending on the dataset and anonymisation method. Furthermore, Mondrian can be considered the method with the most appealing properties for subsequent classification.
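For readers unfamiliar with the criterion, the following minimal sketch checks $k$-anonymity and applies one simple generalisation step. It is not one of the algorithms evaluated in the paper (such as Mondrian); the helper names, the toy records and the choice of ten-year age bands are assumptions made for illustration.

```python
from collections import Counter


def is_k_anonymous(records, quasi_identifiers, k):
    """True if every combination of quasi-identifier values occurs >= k times."""
    counts = Counter(tuple(r[q] for q in quasi_identifiers) for r in records)
    return all(c >= k for c in counts.values())


def generalise_age(record, width=10):
    """Replace an exact age with a band such as '30-39' (hypothetical helper)."""
    low = (record["age"] // width) * width
    return {**record, "age": f"{low}-{low + width - 1}"}


# Toy dataset: 'age' and 'zip' act as quasi-identifiers, 'label' is the target.
records = [
    {"age": 31, "zip": "1010", "label": 1},
    {"age": 35, "zip": "1010", "label": 0},
    {"age": 42, "zip": "1020", "label": 1},
    {"age": 47, "zip": "1020", "label": 0},
]
generalised = [generalise_age(r) for r in records]

print(is_k_anonymous(records, ["age", "zip"], k=2))      # False: every row is unique
print(is_k_anonymous(generalised, ["age", "zip"], k=2))  # True: two rows per group
```

The generalised table satisfies 2-anonymity but no longer carries exact ages, which is the kind of information loss that can affect a classifier trained on it.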
LG Jan 13, 2021
Anomaly Detection Support Using Process Classification
Sebastian Eresheim, Lukas Daniel Klausner, Patrick Kochberger
Anomaly detection systems need to consider a large amount of information when scanning for anomalies. One example is the context of the process in which an anomaly might occur, because anomalies for one process might not be anomalies for a different one. Therefore, data -- such as system events -- need to be assigned to the program they originate from. This paper investigates whether it is possible to infer from a list of system events the program whose behavior caused the occurrence of these system events. To that end, we model transition probabilities between non-equivalent events and apply the $k$-nearest neighbors algorithm. This system is evaluated on non-malicious, real-world data using four different evaluation scores. Our results suggest that the approach proposed in this paper is capable of correctly inferring program names from system events.
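The general idea of combining transition probabilities with $k$-nearest neighbors can be sketched as below. This is a simplified illustration under several assumptions, not the paper's implementation: "non-equivalent events" is interpreted here as collapsing consecutive repeats of the same event, the distance is Euclidean, and the event names, program labels and function names are invented for the example.

```python
import math
from collections import Counter
from itertools import groupby


def transition_vector(events, vocab):
    """Flattened, row-normalised transition matrix over `vocab`.

    Consecutive repeats of the same event are collapsed first (an assumed
    reading of 'non-equivalent events').
    """
    collapsed = [event for event, _ in groupby(events)]
    pair_counts = Counter(zip(collapsed, collapsed[1:]))
    row_totals = Counter()
    for (src, _), count in pair_counts.items():
        row_totals[src] += count
    return [
        pair_counts[(a, b)] / row_totals[a] if row_totals[a] else 0.0
        for a in vocab
        for b in vocab
    ]


def knn_predict(train, query, k=3):
    """Majority vote among the k training vectors closest to `query`."""
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Invented event vocabulary and labelled traces.
vocab = ["open", "read", "write", "close"]
traces = [
    (["open", "read", "write", "write", "close"], "editor"),
    (["open", "read", "write", "read", "write", "close"], "editor"),
    (["open", "read", "read", "close"], "viewer"),
    (["open", "read", "close", "open", "read", "close"], "viewer"),
]
train = [(transition_vector(events, vocab), label) for events, label in traces]

unknown = ["open", "read", "read", "read", "close"]
print(knn_predict(train, transition_vector(unknown, vocab), k=3))  # viewer
```

Flattening the matrix keeps the classifier independent of trace length, since each trace is reduced to a fixed-size vector of `len(vocab) ** 2` probabilities.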
CR Apr 2, 2020
Typosquatting for Fun and Profit: Cross-Country Analysis of Pop-Up Scam
Tobias Dam, Lukas Daniel Klausner, Sebastian Schrittwieser
Today, many different types of scams can be found on the internet. Online criminals are always finding new creative ways to trick internet users, be it in the form of lottery scams, scam apps for smartphones or fake gambling websites. This paper presents a large-scale study on one particular delivery method of online scam: pop-up scam on typosquatting domains. Typosquatting describes the concept of registering domains which are very similar to existing ones while deliberately containing common typing errors; these domains are then used to trick online users who believe they are browsing the intended website. Pop-up scam uses JavaScript alert boxes to present a message which attracts the user's attention very effectively, as they are a blocking user interface element. Our study of typosquatting domains derived from the Majestic Million list, conducted from an Austrian IP address, found a total of 2577 pop-up messages on 1219 distinct typosquatting URLs; 1538 of these messages were malicious. Approximately a third of those distinct URLs (403) were targeted and displayed pop-up messages to one specific HTTP user agent only. Based on our scans, we present an in-depth analysis as well as a detailed classification of different targeting parameters (user agent and language) which triggered varying kinds of pop-up scams. Furthermore, we describe how current pop-up scam characteristics differ from those found in a previous scan performed in late 2018 and examine the use of IDN homograph attacks as well as the application of message localisation using additional scans with IP addresses from the United States and Japan.
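To make the notion of "common typing errors" concrete, the sketch below generates candidate typo domains from a legitimate one. The three error models (character omission, character duplication and transposition of adjacent characters) and the function name `typo_variants` are assumptions chosen for illustration; the abstract does not state which error models the study used.

```python
def typo_variants(domain):
    """Return sorted typo candidates for the label before the first dot."""
    name, dot, suffix = domain.partition(".")
    variants = set()
    for i in range(len(name)):
        # Omission: drop the character at position i.
        variants.add(name[:i] + name[i + 1:])
        # Duplication: type the character at position i twice.
        variants.add(name[:i] + name[i] + name[i:])
        # Transposition: swap the characters at positions i and i + 1.
        if i + 1 < len(name):
            variants.add(name[:i] + name[i + 1] + name[i] + name[i + 2:])
    variants.discard(name)  # swapping two identical letters changes nothing
    variants.discard("")    # a one-letter name minus its letter is not a domain
    return sorted(v + dot + suffix for v in variants)


candidates = typo_variants("example.com")
print(len(candidates), candidates[:5])
```

A scan along the lines described above would then check which candidates are actually registered and what they serve; that part requires network access and is omitted here.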
CY Jul 1, 2019
Ich weiß, was du nächsten Sommer getan haben wirst: Predictive Policing in Österreich
Angelika Adensamer, Lukas Daniel Klausner
Predictive policing (PP) is a data-based, predictive analytical technique used in law enforcement. In this paper, we give an overview of the current situation in Austria and discuss technical, sociopolitical and legal questions raised by the use of PP, such as the lack of awareness of discriminatory structures in society, the biases in data underlying PP and the lack of reflection on the basic premises and feedback mechanisms of PP. Violations of fundamental rights without cause are not allowed by the Austrian Code of Criminal Procedure (Strafprozeßordnung, StPO), the Security Police Act (Sicherheitspolizeigesetz, SPG) or the Act concerning Police Protection of the State (Polizeiliches Staatsschutzgesetz, PStSG); the principle of allowing police intervention only on the basis of concrete threats or suspicion must remain absolute. Considering the numerous problems (not least from the point of view of legal policy), we conclude that the use of PP should be eschewed and that resources and planning should instead be focussed on solving the social problems which actually cause crime.
CR Jun 25, 2019
Large-Scale Analysis of Pop-Up Scam on Typosquatting URLs
Tobias Dam, Lukas Daniel Klausner, Damjan Buhov et al.
Today, many different types of scams can be found on the internet. Online criminals are always finding new creative ways to trick internet users, be it in the form of lottery scams, scam apps for smartphones or fake gambling websites. This paper presents a large-scale study on one particular delivery method of online scam: pop-up scam on typosquatting domains. Typosquatting describes the concept of registering domains which are very similar to existing ones while deliberately containing common typing errors; these domains are then used to trick online users who believe they are browsing the intended website. Pop-up scam uses JavaScript alert boxes to present a message which attracts the user's attention very effectively, as they are a blocking user interface element. Our study of typosquatting domains derived from the Alexa Top 1 Million list found a total of 9857 pop-up messages on 8255 distinct typosquatting URLs; 8828 of these messages were malicious. The vast majority of those distinct URLs (7176) were targeted and displayed pop-up messages to one specific HTTP user agent only. Based on our scans, we present an in-depth analysis as well as a detailed classification of different targeting parameters (user agent and language) which triggered varying kinds of pop-up scams.