CLJan 27, 2023
Can We Use Probing to Better Understand Fine-tuning and Knowledge Distillation of the BERT NLU?Jakub Hościłowicz, Marcin Sowański, Piotr Czubowski et al.
In this article, we use probing to investigate phenomena that occur during fine-tuning and knowledge distillation of a BERT-based natural language understanding (NLU) model. Our ultimate purpose was to use probing to better understand practical production problems and consequently to build better NLU models. We designed experiments to see how fine-tuning changes the linguistic capabilities of BERT, what the optimal size of the fine-tuning dataset is, and what amount of information is contained in a distilled NLU based on a tiny Transformer. The results of the experiments show that the probing paradigm in its current form is not well suited to answer such questions. Structural, Edge and Conditional probes do not take into account how easy it is to decode probed information. Consequently, we conclude that quantification of information decodability is critical for many practical applications of the probing paradigm.
LGMar 20
Trojan horse hunt in deep forecasting models: Insights from the European Space Agency competitionKrzysztof Kotowski, Ramez Shendy, Jakub Nalepa et al.
Forecasting plays a crucial role in modern safety-critical applications, such as space operations. However, the increasing use of deep forecasting models introduces a new security risk of trojan horse attacks, carried out by hiding a backdoor in the training data or directly in the model weights. Once implanted, the backdoor is activated by a specific trigger pattern at test time, causing the model to produce manipulated predictions. We focus on this issue in our \textit{Trojan Horse Hunt} data science competition, where more than 200 teams faced the task of identifying triggers hidden in deep forecasting models for spacecraft telemetry. We describe the novel task formulation, benchmark set, evaluation protocol, and best solutions from the competition. We further summarize key insights and research directions for effective identification of triggers in time series forecasting models. All materials are publicly available on the official competition webpage https://www.kaggle.com/competitions/trojan-horse-hunt-in-space.
CLNov 25, 2025Code
Adversarial Confusion Attack: Disrupting Multimodal Large Language ModelsJakub Hoscilowicz, Artur Janicki
We introduce the Adversarial Confusion Attack, a new class of threats against multimodal large language models (MLLMs). Unlike jailbreaks or targeted misclassification, the goal is to induce systematic disruption that makes the model generate incoherent or confidently incorrect outputs. Practical applications include embedding such adversarial images into websites to prevent MLLM-powered AI Agents from operating reliably. The proposed attack maximizes next-token entropy using a small ensemble of open-source MLLMs. In the white-box setting, we show that a single adversarial image can disrupt all models in the ensemble, both in the full-image and Adversarial CAPTCHA settings. Despite relying on a basic adversarial technique (PGD), the attack generates perturbations that transfer to both unseen open-source (e.g., Qwen3-VL) and proprietary (e.g., GPT-5.1) models.
CLMar 27, 2024
Non-Linear Inference Time Intervention: Improving LLM TruthfulnessJakub Hoscilowicz, Adam Wiacek, Jan Chojnacki et al.
In this work, we explore LLM's internal representation space to identify attention heads that contain the most truthful and accurate information. We further developed the Inference Time Intervention (ITI) framework, which lets bias LLM without the need for fine-tuning. The improvement manifests in introducing a non-linear multi-token probing and multi-token intervention: Non-Linear ITI (NL-ITI), which significantly enhances performance on evaluation benchmarks. NL-ITI is tested on diverse multiple-choice datasets, including TruthfulQA, on which we report over 16% relative MC1 (accuracy of model pointing to the correct answer) improvement with respect to the baseline ITI results. Moreover, we achieved a 10% relative improvement over the recently released Truth Forest (TrFf) method that also focused on ITI improvement.
CLApr 3, 2024
Large Language Models for Expansion of Spoken Language Understanding Systems to New LanguagesJakub Hoscilowicz, Pawel Pawlowski, Marcin Skorupa et al.
Spoken Language Understanding (SLU) models are a core component of voice assistants (VA), such as Alexa, Bixby, and Google Assistant. In this paper, we introduce a pipeline designed to extend SLU systems to new languages, utilizing Large Language Models (LLMs) that we fine-tune for machine translation of slot-annotated SLU training data. Our approach improved on the MultiATIS++ benchmark, a primary multi-language SLU dataset, in the cloud scenario using an mBERT model. Specifically, we saw an improvement in the Overall Accuracy metric: from 53% to 62.18%, compared to the existing state-of-the-art method, Fine and Coarse-grained Multi-Task Learning Framework (FC-MTLF). In the on-device scenario (tiny and not pretrained SLU), our method improved the Overall Accuracy from 5.31% to 22.06% over the baseline Global-Local Contrastive Learning Framework (GL-CLeF) method. Contrary to both FC-MTLF and GL-CLeF, our LLM-based machine translation does not require changes in the production architecture of SLU. Additionally, our pipeline is slot-type independent: it does not require any slot definitions or examples.
LGJul 17, 2025
Fake or Real: The Impostor Hunt in Texts for Space OperationsAgata Kaczmarek, Dawid Płudowski, Piotr Wilczyński et al.
The "Fake or Real" competition hosted on Kaggle (https://www.kaggle.com/competitions/fake-or-real-the-impostor-hunt ) is the second part of a series of follow-up competitions and hackathons related to the "Assurance for Space Domain AI Applications" project funded by the European Space Agency (https://assurance-ai.space-codev.org/ ). The competition idea is based on two real-life AI security threats identified within the project -- data poisoning and overreliance in Large Language Models. The task is to distinguish between the proper output from LLM and the output generated under malicious modification of the LLM. As this problem was not extensively researched, participants are required to develop new techniques to address this issue or adjust already existing ones to this problem's statement.
LGJun 2, 2025
Trojan Horse Hunt in Time Series Forecasting for Space OperationsKrzysztof Kotowski, Ramez Shendy, Jakub Nalepa et al.
This competition hosted on Kaggle (https://www.kaggle.com/competitions/trojan-horse-hunt-in-space) is the first part of a series of follow-up competitions and hackathons related to the "Assurance for Space Domain AI Applications" project funded by the European Space Agency (https://assurance-ai.space-codev.org/). The competition idea is based on one of the real-life AI security threats identified within the project -- the adversarial poisoning of continuously fine-tuned satellite telemetry forecasting models. The task is to develop methods for finding and reconstructing triggers (trojans) in advanced models for satellite telemetry forecasting used in safety-critical space operations. Participants are provided with 1) a large public dataset of real-life multivariate satellite telemetry (without triggers), 2) a reference model trained on the clean data, 3) a set of poisoned neural hierarchical interpolation (N-HiTS) models for time series forecasting trained on the dataset with injected triggers, and 4) Jupyter notebook with the training pipeline and baseline algorithm (the latter will be published in the last month of the competition). The main task of the competition is to reconstruct a set of 45 triggers (i.e., short multivariate time series segments) injected into the training data of the corresponding set of 45 poisoned models. The exact characteristics (i.e., shape, amplitude, and duration) of these triggers must be identified by participants. The popular Neural Cleanse method is adopted as a baseline, but it is not designed for time series analysis and new approaches are necessary for the task. The impact of the competition is not limited to the space domain, but also to many other safety-critical applications of advanced time series analysis where model poisoning may lead to serious consequences.
CYJun 29, 2024
Real-Time Energy Measurement for Non-Intrusive Well-Being Monitoring of Elderly People -- a Case StudyMateusz Brzozowski, Artur Janicki
This article presents a case study demonstrating a non-intrusive method for the well-being monitoring of elderly people. It is based on our real-time energy measurement system, which uses tiny beacons attached to electricity meters. Four participants aged 67-82 years took part in our study. We observed their electric power consumption for approx. a month, and then we analyzed them, taking into account the participants' notes on their activities. We created typical daily usage profiles for each participant and used anomaly detection to find unusual energy consumption. We found out that real-time energy measurement can give significant insight into someone's daily activities and, consequently, bring invaluable information to caregivers about the well-being of an elderly person, while being discreet and entirely non-intrusive.
CLJun 4, 2024
Large Language Models as Carriers of Hidden MessagesJakub Hoscilowicz, Pawel Popiolek, Jan Rudkowski et al.
Simple fine-tuning can embed hidden text into large language models (LLMs), which is revealed only when triggered by a specific query. Applications include LLM fingerprinting, where a unique identifier is embedded to verify licensing compliance, and steganography, where the LLM carries hidden messages disclosed through a trigger query. Our work demonstrates that embedding hidden text via fine-tuning, although seemingly secure due to the vast number of potential triggers, is vulnerable to extraction through analysis of the LLM's output decoding process. We introduce an extraction attack called Unconditional Token Forcing (UTF), which iteratively feeds tokens from the LLM's vocabulary to reveal sequences with high token probabilities, indicating hidden text candidates. We also present Unconditional Token Forcing Confusion (UTFC), a defense paradigm that makes hidden text resistant to all known extraction attacks without degrading the general performance of LLMs compared to standard fine-tuning. UTFC has both benign (improving LLM fingerprinting) and malign applications (using LLMs to create covert communication channels).
CRAug 25, 2016
YouSkyde: Information Hiding for Skype Video TrafficWojciech Mazurczyk, Maciej Karas, Krzysztof Szczypiorski et al.
In this paper a new information hiding method for Skype videoconference calls - YouSkyde - is introduced. A Skype traffic analysis revealed that introducing intentional losses into the Skype video traffic stream to provide the means for clandestine communication is the most favourable solution. A YouSkyde proof-of-concept implementation was carried out and its experimental evaluation is presented. The results obtained prove that the proposed method is feasible and offer a steganographic bandwidth as high as 0.93 kbps, while introducing negligible distortions into transmission quality and providing high undetectability.
MMAug 20, 2015
"The Good, The Bad And The Ugly": Evaluation of Wi-Fi SteganographyKrzysztof Szczypiorski, Artur Janicki, Steffen Wendzel
In this paper we propose a new method for the evaluation of network steganography algorithms based on the new concept of "the moving observer". We considered three levels of undetectability named: "good", "bad", and "ugly". To illustrate this method we chose Wi-Fi steganography as a solid family of information hiding protocols. We present the state of the art in this area covering well-known hiding techniques for 802.11 networks. "The moving observer" approach could help not only in the evaluation of steganographic algorithms, but also might be a starting point for a new detection system of network steganography. The concept of a new detection system, called MoveSteg, is explained in detail.
CROct 22, 2012
Steganalysis of Transcoding SteganographyArtur Janicki, Wojciech Mazurczyk, Krzysztof Szczypiorski
TranSteg (Trancoding Steganography) is a fairly new IP telephony steganographic method that functions by compressing overt (voice) data to make space for the steganogram by means of transcoding. It offers high steganographic bandwidth, retains good voice quality and is generally harder to detect than other existing VoIP steganographic methods. In TranSteg, after the steganogram reaches the receiver, the hidden information is extracted and the speech data is practically restored to what was originally sent. This is a huge advantage compared with other existing VoIP steganographic methods, where the hidden data can be extracted and removed but the original data cannot be restored because it was previously erased due to a hidden data insertion process. In this paper we address the issue of steganalysis of TranSteg. Various TranSteg scenarios and possibilities of warden(s) localization are analyzed with regards to the TranSteg detection. A steganalysis method based on MFCC (Mel-Frequency Cepstral Coefficients) parameters and GMMs (Gaussian Mixture Models) was developed and tested for various overt/covert codec pairs in a single warden scenario with double transcoding. The proposed method allowed for efficient detection of some codec pairs (e.g., G.711/G.729), whilst some others remained more resistant to detection (e.g., iLBC/AMR).
CRJan 30, 2012
Influence of Speech Codecs Selection on Transcoding SteganographyArtur Janicki, Wojciech Mazurczyk, Krzysztof Szczypiorski
The typical approach to steganography is to compress the covert data in order to limit its size, which is reasonable in the context of a limited steganographic bandwidth. TranSteg (Trancoding Steganography) is a new IP telephony steganographic method that was recently proposed that offers high steganographic bandwidth while retaining good voice quality. In TranSteg, compression of the overt data is used to make space for the steganogram. In this paper we focus on analyzing the influence of the selection of speech codecs on hidden transmission performance, that is, which codecs would be the most advantageous ones for TranSteg. Therefore, by considering the codecs which are currently most popular for IP telephony we aim to find out which codecs should be chosen for transcoding to minimize the negative influence on voice quality while maximizing the obtained steganographic bandwidth.