Alhassan Abdelhalim

CL
h-index2
5papers
95citations
Novelty28%
AI Score37

5 Papers

CLJan 30
LLMs Explain't: A Post-Mortem on Semantic Interpretability in Transformer Models

Alhassan Abdelhalim, Janick Edinger, Sören Laue et al.

Large Language Models (LLMs) are becoming increasingly popular in pervasive computing due to their versatility and strong performance. However, despite their ubiquitous use, the exact mechanisms underlying their outstanding performance remain unclear. Different methods for LLM explainability exist, and many are, as a method, not fully understood themselves. We started with the question of how linguistic abstraction emerges in LLMs, aiming to detect it across different LLM modules (attention heads and input embeddings). For this, we used methods well-established in the literature: (1) probing for token-level relational structures, and (2) feature-mapping using embeddings as carriers of human-interpretable properties. Both attempts failed for different methodological reasons: Attention-based explanations collapsed once we tested the core assumption that later-layer representations still correspond to tokens. Property-inference methods applied to embeddings also failed because their high predictive scores were driven by methodological artifacts and dataset structure rather than meaningful semantic knowledge. These failures matter because both techniques are widely treated as evidence for what LLMs supposedly understand, yet our results show such conclusions are unwarranted. These limitations are particularly relevant in pervasive and distributed computing settings where LLMs are deployed as system components and interpretability methods are relied upon for debugging, compression, and explaining models.

CLApr 24, 2024
Detecting Conceptual Abstraction in LLMs

Michaela Regneri, Alhassan Abdelhalim, Sören Laue

We present a novel approach to detecting noun abstraction within a large language model (LLM). Starting from a psychologically motivated set of noun pairs in taxonomic relationships, we instantiate surface patterns indicating hypernymy and analyze the attention matrices produced by BERT. We compare the results to two sets of counterfactuals and show that we can detect hypernymy in the abstraction mechanism, which cannot solely be related to the distributional similarity of noun pairs. Our findings are a first step towards the explainability of conceptual abstraction in LLMs.

CVSep 22, 2025
Automatic Intermodal Loading Unit Identification using Computer Vision: A Scoping Review

Emre Gülsoylu, Alhassan Abdelhalim, Derya Kara Boztas et al.

Background: The standardisation of Intermodal Loading Units (ILUs), including containers, semi-trailers, and swap bodies, has transformed global trade, yet efficient and robust identification remains an operational bottleneck in ports and terminals. Objective: To map Computer Vision (CV) methods for ILU identification, clarify terminology, summarise the evolution of proposed approaches, and highlight research gaps, future directions and their potential effects on terminal operations. Methods: Following PRISMA-ScR, we searched Google Scholar and dblp for English-language studies with quantitative results. After dual reviewer screening, the studies were charted across methods, datasets, and evaluation metrics. Results: 63 empirical studies on CV-based solutions for the ILU identification task, published between 1990 and 2025 were reviewed. Methodological evolution of ILU identification solutions, datasets, evaluation of the proposed methods and future research directions are summarised. A shift from static (e.g. OCR-gates) to vehicle mounted camera setups, which enables precise monitoring is observed. The reported results for end-to-end accuracy range from 5% to 96%. Conclusions: We propose standardised terminology, advocate for open-access datasets, codebases and model weights to enable fair evaluation and define future work directions. The shift from static to dynamic camera settings introduces new challenges that have transformative potential for transportation and logistics. However, the lack of public benchmark datasets, open-access code, and standardised terminology hinders the advancements in this field. As for the future work, we suggest addressing the new challenges emerged from vehicle mounted cameras, exploring synthetic data generation, refining the multi-stage methods into unified end-to-end models to reduce complexity, and focusing on contextless text recognition.

CLAug 19, 2025
Prediction is not Explanation: Revisiting the Explanatory Capacity of Mapping Embeddings

Hanna Herasimchyk, Alhassan Abdelhalim, Sören Laue et al.

Understanding what knowledge is implicitly encoded in deep learning models is essential for improving the interpretability of AI systems. This paper examines common methods to explain the knowledge encoded in word embeddings, which are core elements of large language models (LLMs). These methods typically involve mapping embeddings onto collections of human-interpretable semantic features, known as feature norms. Prior work assumes that accurately predicting these semantic features from the word embeddings implies that the embeddings contain the corresponding knowledge. We challenge this assumption by demonstrating that prediction accuracy alone does not reliably indicate genuine feature-based interpretability. We show that these methods can successfully predict even random information, concluding that the results are predominantly determined by an algorithmic upper bound rather than meaningful semantic representation in the word embeddings. Consequently, comparisons between datasets based solely on prediction performance do not reliably indicate which dataset is better captured by the word embeddings. Our analysis illustrates that such mappings primarily reflect geometric similarity within vector spaces rather than indicating the genuine emergence of semantic properties.

CLMar 11, 2025
Automating Violence Detection and Categorization from Ancient Texts

Alhassan Abdelhalim, Michaela Regneri

Violence descriptions in literature offer valuable insights for a wide range of research in the humanities. For historians, depictions of violence are of special interest for analyzing the societal dynamics surrounding large wars and individual conflicts of influential people. Harvesting data for violence research manually is laborious and time-consuming. This study is the first one to evaluate the effectiveness of large language models (LLMs) in identifying violence in ancient texts and categorizing it across multiple dimensions. Our experiments identify LLMs as a valuable tool to scale up the accurate analysis of historical texts and show the effect of fine-tuning and data augmentation, yielding an F1-score of up to 0.93 for violence detection and 0.86 for fine-grained violence categorization.