SIAug 22, 2023
User Identity Linkage in Social Media Using Linguistic and Social Interaction FeaturesDespoina Chatzakou, Juan Soler-Company, Theodora Tsikrika et al.
Social media users often hold several accounts in their effort to multiply the spread of their thoughts, ideas, and viewpoints. In the particular case of objectionable content, users tend to create multiple accounts to bypass the combating measures enforced by social media platforms and thus retain their online identity even if some of their accounts are suspended. User identity linkage aims to reveal social media accounts likely to belong to the same natural person so as to prevent the spread of abusive/illegal activities. To this end, this work proposes a machine learning-based detection model, which uses multiple attributes of users' online activity in order to identify whether two or more virtual identities belong to the same real natural person. The models efficacy is demonstrated on two cases on abusive and terrorism-related Twitter content.
NCSep 5, 2023
Neural CrystalsSofia Karamintziou, Thanassis Mavropoulos, Dimos Ntioudis et al.
We face up to the challenge of explainability in Multimodal Artificial Intelligence (MMAI). At the nexus of neuroscience-inspired and quantum computing, interpretable and transparent spin-geometrical neural architectures for early fusion of large-scale, heterogeneous, graph-structured data are envisioned, harnessing recent evidence for relativistic quantum neural coding of (co-)behavioral states in the self-organizing brain, under competitive, multidimensional dynamics. The designs draw on a self-dual classical description - via special Clifford-Lipschitz operations - of spinorial quantum states within registers of at most 16 qubits for efficient encoding of exponentially large neural structures. Formally 'trained', Lorentz neural architectures with precisely one lateral layer of exclusively inhibitory interneurons accounting for anti-modalities, as well as their co-architectures with intra-layer connections are highlighted. The approach accommodates the fusion of up to 16 time-invariant interconnected (anti-)modalities and the crystallization of latent multidimensional patterns. Comprehensive insights are expected to be gained through applications to Multimodal Big Data, under diverse real-world scenarios.
IRMar 7
Understanding the Performance Plateau in Text-to-Video Retrieval: A Comprehensive Empirical and Linguistic AnalysisMaria-Eirini Pegia, Dimitrios Stefanopoulos, Björn Þór Jónsson et al.
Text-to-video retrieval enables users to find relevant video content using natural language queries, a task that has grown increasingly important with the rapid expansion of online video. Over the past six years, research has produced numerous methods, such as dual encoders, attention-driven models, and multimodal fusion approaches; however, fundamental questions remain about model behavior, dataset influence, and query difficulty. In this work, we evaluate 14 state-of-the-art retrieval methods across 3 widely used datasets under a unified preprocessing and evaluation framework. We analyze caption characteristics, including length, clarity, semantic category, and Action vs. Scene balance, and link these to model performance. Our results show that short, clear, and simple captions, such as those describing single actions or color attributes, achieve higher recall, while complex events, multi-step activities, or fine-grained scene descriptions remain challenging for all existing models. Attention-driven architectures better handle temporally dependent or multi-step queries, whereas dual-encoder and multimodal fusion models perform well primarily on simpler or single-category captions. Cross-dataset generalization improves with larger, more diverse caption sets, but generative captions do not consistently enhance retrieval accuracy. Overall, our findings highlight key dataset factors, benchmark challenges, and the interplay between query content and model architecture, providing guidance for developing more effective text-to-video retrieval systems.
LGDec 16, 2024Code
RAG Playground: A Framework for Systematic Evaluation of Retrieval Strategies and Prompt Engineering in RAG SystemsIoannis Papadimitriou, Ilias Gialampoukidis, Stefanos Vrochidis et al.
We present RAG Playground, an open-source framework for systematic evaluation of Retrieval-Augmented Generation (RAG) systems. The framework implements and compares three retrieval approaches: naive vector search, reranking, and hybrid vector-keyword search, combined with ReAct agents using different prompting strategies. We introduce a comprehensive evaluation framework with novel metrics and provide empirical results comparing different language models (Llama 3.1 and Qwen 2.5) across various retrieval configurations. Our experiments demonstrate significant performance improvements through hybrid search methods and structured self-evaluation prompting, achieving up to 72.7% pass rate on our multi-metric evaluation framework. The results also highlight the importance of prompt engineering in RAG systems, with our custom-prompted agents showing consistent improvements in retrieval accuracy and response quality.
12.2CRMay 12
ACTING: A Platform for Cyber Ranges FederationKyriakos Christou, Maria Michalopoulou, Stefano Taggi et al.
Cyber Defence (CD) training requires interoperable cyber-range environments capable of supporting complex, multidomain exercises across distributed infrastructures. This paper presents three main contributions addressing this challenge. First, we introduce the Exercise Description Language - First Generation (EDL-FG), a structured language for formally describing cyber-range training services and exercises. EDL-FG captures both the technical infrastructure required to emulate ICT/OT environments and the scenario logic governing cyber events, injects, and participant interactions, enabling interoperable and automated scenario deployment across federated Cyber Ranges (CRs). Second, the ACTING platform introduces automated PE and scoring mechanisms that assess trainee actions during exercises through coordinated data collection and analysis across participating CRs. Third, the platform enables multi-domain cyber training scenarios that combine civilian and military operational contexts. Building upon federation capabilities established under the H2020 ECHO project, ACTING demonstrates how interoperable scenario description and automated evaluation support scalable and realistic CD training.
CLOct 14, 2024
A Multi-Task Text Classification Pipeline with Natural Language Explanations: A User-Centric Evaluation in Sentiment Analysis and Offensive Language Identification in Greek TweetsNikolaos Mylonas, Nikolaos Stylianou, Theodora Tsikrika et al.
Interpretability is a topic that has been in the spotlight for the past few years. Most existing interpretability techniques produce interpretations in the form of rules or feature importance. These interpretations, while informative, may be harder to understand for non-expert users and therefore, cannot always be considered as adequate explanations. To that end, explanations in natural language are often preferred, as they are easier to comprehend and also more presentable to end-users. This work introduces an early concept for a novel pipeline that can be used in text classification tasks, offering predictions and explanations in natural language. It comprises of two models: a classifier for labelling the text and an explanation generator which provides the explanation. The proposed pipeline can be adopted by any text classification task, given that ground truth rationales are available to train the explanation generator. Our experiments are centred around the tasks of sentiment analysis and offensive language identification in Greek tweets, using a Greek Large Language Model (LLM) to obtain the necessary explanations that can act as rationales. The experimental evaluation was performed through a user study based on three different metrics and achieved promising results for both datasets.
LGOct 8, 2025
Utilizing Large Language Models for Machine Learning ExplainabilityAlexandros Vassiliades, Nikolaos Polatidis, Stamatios Samaras et al.
This study explores the explainability capabilities of large language models (LLMs), when employed to autonomously generate machine learning (ML) solutions. We examine two classification tasks: (i) a binary classification problem focused on predicting driver alertness states, and (ii) a multilabel classification problem based on the yeast dataset. Three state-of-the-art LLMs (i.e. OpenAI GPT, Anthropic Claude, and DeepSeek) are prompted to design training pipelines for four common classifiers: Random Forest, XGBoost, Multilayer Perceptron, and Long Short-Term Memory networks. The generated models are evaluated in terms of predictive performance (recall, precision, and F1-score) and explainability using SHAP (SHapley Additive exPlanations). Specifically, we measure Average SHAP Fidelity (Mean Squared Error between SHAP approximations and model outputs) and Average SHAP Sparsity (number of features deemed influential). The results reveal that LLMs are capable of producing effective and interpretable models, achieving high fidelity and consistent sparsity, highlighting their potential as automated tools for interpretable ML pipeline generation. The results show that LLMs can produce effective, interpretable pipelines with high fidelity and consistent sparsity, closely matching manually engineered baselines.
SEJul 4, 2020
Towards Semantic Detection of Smells in Cloud Infrastructure CodeIndika Kumara, Zoe Vasileiou, Georgios Meditskos et al.
Automated deployment and management of Cloud applications relies on descriptions of their deployment topologies, often referred to as Infrastructure Code. As the complexity of applications and their deployment models increases, developers inadvertently introduce software smells to such code specifications, for instance, violations of good coding practices, modular structure, and more. This paper presents a knowledge-driven approach enabling developers to identify the aforementioned smells in deployment descriptions. We detect smells with SPARQL-based rules over pattern-based OWL 2 knowledge graphs capturing deployment models. We show the feasibility of our approach with a prototype and three case studies.