Valeriu Vrabie

CL
h-index: 14
3 papers
10 citations
Novelty: 48%
AI Score: 44

3 Papers

61.5 CL Apr 18
Beyond Black-Box Labels: Interpretable Criteria for Diagnosing Subjective NLP Tasks

Nisrine Rair, Alban Goupil, Valeriu Vrabie et al.

Subjective NLP datasets typically aggregate annotator judgments into a single gold label, making it difficult to diagnose whether disagreement reflects unclear criteria, collapsed distinctions, or legitimate plurality. We propose a schema-level diagnostic for auditing expert-designed annotation schemas prior to gold-label commitment, using only multi-annotator criterion judgments. The diagnostic separates two failure modes: unstable criteria with hard-to-operationalize boundaries, and systematic overlap that blurs the boundaries between mutually exclusive categories. Applied to persuasive value extraction in commercial documents, we find that disagreement is not diffuse: instability concentrates in a few criteria, while nearly half of covered sentences activate multiple categories. These signals align with where domain experts disagree, yielding an evidence-based audit for tightening guidelines, revising category structure, or reconsidering the annotation paradigm.
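The two signals named in the abstract can be illustrated with a minimal sketch. The abstract does not define its measures, so everything here is an assumption: `disagreement_rate` treats a criterion as unstable on a sentence when annotators are not unanimous, and `multi_category_share` counts a sentence as "covered" when it activates at least one category. Both function names and the toy data are hypothetical.

```python
def disagreement_rate(votes_per_sentence):
    """Share of sentences on which annotators are not unanimous.

    votes_per_sentence: one list of 0/1 criterion judgments per sentence
    (one vote per annotator). Hypothetical proxy for criterion instability.
    """
    split = sum(1 for votes in votes_per_sentence if len(set(votes)) > 1)
    return split / len(votes_per_sentence)


def multi_category_share(categories_per_sentence):
    """Share of covered sentences that activate more than one category.

    categories_per_sentence: one set of activated category names per
    sentence. A sentence is "covered" if the set is non-empty (assumption).
    """
    covered = [cats for cats in categories_per_sentence if cats]
    if not covered:
        return 0.0
    return sum(1 for cats in covered if len(cats) > 1) / len(covered)


# Toy data, invented for illustration only.
judgments = {
    "criterion_a": [[1, 1, 1], [0, 0, 0], [1, 1, 1]],
    "criterion_b": [[1, 0, 1], [0, 1, 0], [1, 1, 1]],
}
for name, votes in judgments.items():
    print(name, disagreement_rate(votes))

print(multi_category_share([{"price"}, {"price", "quality"}, set()]))
```

Ranking criteria by such a rate would show whether disagreement is concentrated or diffuse, which is the kind of question the abstract raises; the paper's actual diagnostic may differ.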

CL Oct 20, 2025
When Annotators Disagree, Topology Explains: Mapper, a Topological Tool for Exploring Text Embedding Geometry and Ambiguity

Nisrine Rair, Alban Goupil, Valeriu Vrabie et al.

Language models are often evaluated with scalar metrics like accuracy, but such measures fail to capture how models internally represent ambiguity, especially when human annotators disagree. We propose a topological perspective to analyze how fine-tuned models encode ambiguity and, more generally, individual instances. Applied to RoBERTa-Large on the MD-Offense dataset, Mapper, a tool from topological data analysis, reveals that fine-tuning restructures embedding space into modular, non-convex regions aligned with model predictions, even for highly ambiguous cases. Over 98% of connected components exhibit at least 90% prediction purity, yet alignment with ground-truth labels drops in ambiguous data, surfacing a hidden tension between structural confidence and label uncertainty. Unlike traditional tools such as PCA or UMAP, Mapper captures this geometry directly, uncovering decision regions, boundary collapses, and overconfident clusters. Our findings position Mapper as a powerful diagnostic tool for understanding how models resolve ambiguity. Beyond visualization, it also enables topological metrics that may inform proactive modeling strategies in subjective NLP tasks.
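The purity statistic quoted above can be sketched as follows. The abstract does not spell out its definition, so this assumes that the prediction purity of a connected component is the share of its instances carrying the component's most frequent predicted label. The function names and the toy components are hypothetical, and the Mapper graph construction itself is not shown.

```python
from collections import Counter


def prediction_purity(component_predictions):
    """Share of instances in one component with the majority predicted label."""
    counts = Counter(component_predictions)
    majority_count = counts.most_common(1)[0][1]
    return majority_count / len(component_predictions)


def share_of_pure_components(components, threshold=0.9):
    """Fraction of components whose purity reaches the threshold."""
    pure = sum(1 for preds in components if prediction_purity(preds) >= threshold)
    return pure / len(components)


# Toy components: each list holds the model's predicted labels for the
# instances in one connected component (invented for illustration).
components = [
    ["offensive"] * 10,
    ["not_offensive"] * 9 + ["offensive"],
    ["offensive", "not_offensive"],
]
print(share_of_pure_components(components))
```

Running the same computation with ground-truth labels in place of predictions would give the label-alignment side of the comparison the abstract describes.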

CV Aug 25, 2015
Wavelet subspace decomposition of thermal infrared images for defect detection in artworks

Muhammad Zubair Ahmad, Amir Ali Khan, Sihem Mezghani et al.

Monitoring the health of ancient artworks requires great care because of the sensitive nature of these materials. Classical techniques for identifying the development of faults rely on acoustic testing. Being invasive, these techniques may cause permanent damage to the material, especially if it is inspected periodically. Non-destructive testing has long been applied to a variety of materials. In this regard, non-invasive systems based on the principle of infrared thermometry were developed to identify faults in artworks. The test artwork is heated, and the thermal response of its different layers is captured with a thermal infrared camera. However, prolonged heating risks overheating and damaging the artwork; an alternative approach is to use pseudo-random binary sequence excitations. The faults in the artwork, however, cannot be detected in the captured images, especially if they are weak. The weaker faults are masked by the stronger ones, by the pictorial layer of the artwork, or by non-uniform heating. This work addresses the detection and localization of faults through a wavelet-based subspace decomposition scheme. The proposed scheme removes the background on the one hand and the undesired high-frequency noise on the other. It is shown that the detection parameter is proportional to the diameter and the depth of the fault. A criterion is proposed to select the optimal wavelet basis along with a suitable level for wavelet decomposition and reconstruction. The proposed approach is tested on a laboratory-developed test sample with known fault locations and dimensions, as well as on real artworks. A comparison with a previously reported method demonstrates the efficacy of the proposed approach for fault detection in artworks.
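The idea of removing both the background and the high-frequency noise can be sketched as a band-pass reconstruction in the wavelet domain. This is only an illustration under stated assumptions: it uses a 1D signal rather than a thermal image, a Haar wavelet chosen for simplicity, and hand-picked levels. The paper's criterion for selecting the wavelet basis and level is not reproduced, and all function names are hypothetical.

```python
import math

SQRT2 = math.sqrt(2.0)


def haar_step(x):
    """One level of the orthonormal Haar transform: (approximation, detail)."""
    approx = [(x[i] + x[i + 1]) / SQRT2 for i in range(0, len(x), 2)]
    detail = [(x[i] - x[i + 1]) / SQRT2 for i in range(0, len(x), 2)]
    return approx, detail


def haar_inverse_step(approx, detail):
    """Invert one Haar level."""
    out = []
    for a, d in zip(approx, detail):
        out.append((a + d) / SQRT2)
        out.append((a - d) / SQRT2)
    return out


def decompose(signal, levels):
    """Return the coarsest approximation and details [level 1 (finest), ...]."""
    if len(signal) % (2 ** levels) != 0:
        raise ValueError("signal length must be divisible by 2**levels")
    approx, details = list(signal), []
    for _ in range(levels):
        approx, detail = haar_step(approx)
        details.append(detail)
    return approx, details


def reconstruct(approx, details, keep_levels, keep_approx=False):
    """Rebuild the signal from the selected detail subspaces only.

    Dropping the approximation removes the slowly varying background;
    dropping the finest detail levels removes high-frequency content.
    """
    current = list(approx) if keep_approx else [0.0] * len(approx)
    for level in range(len(details), 0, -1):
        detail = details[level - 1]
        if level not in keep_levels:
            detail = [0.0] * len(detail)
        current = haar_inverse_step(current, detail)
    return current


# Toy signal: constant background plus a small bump (invented data).
signal = [5.0, 5.0, 5.0, 6.0, 6.0, 5.0, 5.0, 5.0]
approx, details = decompose(signal, levels=3)
band = reconstruct(approx, details, keep_levels={2})
print(band)
```

For real images, a 2D transform from an established wavelet library would be the more practical choice; the pure-Python version above only makes the subspace selection explicit.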