Kyriaki Kalimeri

CL
h-index: 25
Papers: 12
Citations: 248
Novelty: 47%
AI Score: 45

12 Papers

CL Aug 9, 2024 Code
Quantitative Information Extraction from Humanitarian Documents

Daniele Liberatore, Kyriaki Kalimeri, Derya Sever et al.

Humanitarian action is accompanied by a mass of reports, summaries, news, and other documents. To guide its activities, important information must be quickly extracted from such free-text resources. Quantities, such as the number of people affected, amount of aid distributed, or the extent of infrastructure damage, are central to emergency response and anticipatory action. In this work, we contribute an annotated dataset for the humanitarian domain for the extraction of such quantitative information, alongside its important context, including the units it refers to, any modifiers, and the relevant event. Further, we develop a custom Natural Language Processing pipeline to extract the quantities alongside their units, and evaluate it in comparison to baseline and recent literature. The proposed model achieves a consistent improvement in performance, especially in the documents pertaining to the Dominican Republic and select African countries. We make the dataset and code available to the research community to continue the improvement of NLP tools for the humanitarian domain.
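As a rough illustration of the task described in this abstract (a quantity together with its unit and modifier), the following minimal sketch uses a regular expression. It is not the authors' pipeline; the unit vocabulary, the modifier list, and the function name `extract_quantities` are assumptions made for this example only.

```python
import re

# Hypothetical unit vocabulary and modifier list, chosen for illustration only.
UNITS = r"people|households|families|tonnes|houses|km"
MODIFIERS = r"at least|about|over|nearly"

# Optional modifier, then a number (with thousands separators or decimals),
# then a unit word.
PATTERN = re.compile(
    rf"(?:(?P<modifier>{MODIFIERS})\s+)?"
    rf"(?P<value>\d{{1,3}}(?:,\d{{3}})+|\d+(?:\.\d+)?)\s+"
    rf"(?P<unit>{UNITS})\b",
    re.IGNORECASE,
)


def extract_quantities(text):
    """Return a list of {modifier, value, unit} dicts found in text."""
    results = []
    for match in PATTERN.finditer(text):
        modifier = match.group("modifier")
        results.append(
            {
                "modifier": modifier.lower() if modifier else None,
                "value": float(match.group("value").replace(",", "")),
                "unit": match.group("unit").lower(),
            }
        )
    return results


if __name__ == "__main__":
    sentence = "At least 12,500 people were displaced and 300 houses were destroyed."
    for quantity in extract_quantities(sentence):
        print(quantity)
```

A pattern like this only covers a closed unit list and digit-based numbers, which is one reason a learned extraction model would be preferred for real humanitarian reports.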

CY May 29
Context-Conditioned Generative Models Enable Subnational Refinement of Sparse Humanitarian Surveys

Federica Sibilla, Vasiliki Voukelatou, Duccio Piovani et al.

Data scarcity limits inference in many scientific and policy domains. Survey data are essential for decision-making, but sparse samples often fail to capture fine spatial granularities. We evaluate normalizing flows, a generative model that learns complex data distributions and can be conditioned on exogenous contextual features, in controlled data scarcity scenarios. Across eight household survey datasets spanning six low-income or middle-income countries in the humanitarian domain, we show that context-conditioned generative models can refine sub-national survey distributions under severe data scarcity, and that performance increases systematically with the richness of the conditioning information. These findings support a general principle for survey data augmentation: generative models can improve sub-national estimates when the sparse sample retains sufficient support and contextual covariates encode relevant local heterogeneity. By learning full conditional distributions rather than point estimates, the approach provides fine-grained evidence for humanitarian decision-making and resource allocation.

CL Sep 14, 2022
LibertyMFD: A Lexicon to Assess the Moral Foundation of Liberty

Oscar Araque, Lorenzo Gatti, Kyriaki Kalimeri

Quantifying the moral narratives expressed in user-generated text, news, or public discourses is fundamental for understanding individuals' concerns and viewpoints and preventing violent protests and social polarisation. The Moral Foundations Theory (MFT) was developed to operationalise morality in a five-dimensional scale system. Recent developments of the theory called for the introduction of a new foundation, the Liberty Foundation. Being only recently added to the theory, there are no available linguistic resources to assess whether liberty is present in text corpora. Given its importance to current social issues such as the vaccination debate, we propose two data-driven approaches, deriving two candidate lexicons generated based on aligned documents from online news sources with different worldviews. After extensive experimentation, we contribute to the research community a novel lexicon that assesses the liberty moral foundation in the way individuals with contrasting viewpoints express themselves through written text. The LibertyMFD dictionary can be a valuable tool for policymakers to understand diverse viewpoints on controversial social issues such as vaccination, abortion, or even uprisings, as they happen and on a large scale.

CY Sep 2, 2022
"More Than Words": Linking Music Preferences and Moral Values Through Lyrics

Vjosa Preniqi, Kyriaki Kalimeri, Charalampos Saitis

This study explores the association between music preferences and moral values by applying text analysis techniques to lyrics. Harvesting data from a Facebook-hosted application, we align psychometric scores of 1,386 users to lyrics from the top 5 songs of their preferred music artists, as derived from Facebook Page Likes. We extract a set of lyrical features related to each song's overarching narrative, moral valence, sentiment, and emotion. A machine learning framework was designed to exploit regression approaches and evaluate the predictive power of lyrical features for inferring moral values. Results suggest that lyrics from top songs of artists people like inform their morality. Virtues of hierarchy and tradition achieve higher prediction scores ($.20 \leq r \leq .30$) than values of empathy and equality ($.08 \leq r \leq .11$), while basic demographic variables account for only a small part of the models' explainability. This shows the importance of music listening behaviours, as assessed via lyrical preferences, alone in capturing moral values. We discuss the technological and musicological implications and possible future improvements.

CL Sep 6, 2023
Leave no Place Behind: Improved Geolocation in Humanitarian Documents

Enrico M. Belliardo, Kyriaki Kalimeri, Yelena Mejova

Geographical location is a crucial element of humanitarian response, outlining vulnerable populations, ongoing events, and available resources. Latest developments in Natural Language Processing may help in extracting vital information from the deluge of reports and documents produced by the humanitarian sector. However, the performance and biases of existing state-of-the-art information extraction tools are unknown. In this work, we develop annotated resources to fine-tune the popular Named Entity Recognition (NER) tools spaCy and RoBERTa to perform geotagging of humanitarian texts. We then propose a geocoding method, FeatureRank, which links the candidate locations to the GeoNames database. We find that not only does the humanitarian-domain data improve the performance of the classifiers (up to F1 = 0.92), but it also alleviates some of the bias of the existing tools, which erroneously favor locations in Western countries. Thus, we conclude that more resources from non-Western documents are necessary to ensure that off-the-shelf NER systems are suitable for deployment in the humanitarian sector.

CL Jul 16, 2024
LML: A Novel Lexicon for the Moral Foundation of Liberty

Oscar Araque, Lorenzo Gatti, Sergio Consoli et al.

The moral value of liberty is a central concept in our inference system when it comes to taking a stance towards controversial social issues such as vaccine hesitancy, climate change, or the right to abortion. Here, we propose a novel Liberty lexicon evaluated on more than 3,000 manually annotated data points in both in-domain and out-of-domain scenarios. As a result of this evaluation, we produce a combined lexicon that constitutes the main outcome of this work. This final lexicon incorporates information from an ensemble of lexicons that have been generated using word embedding similarity (WE) and compositional semantics (CS). Our key contributions include enriching the liberty annotations, developing a robust liberty lexicon for broader application, and revealing the complexity of expressions related to liberty across different platforms. Through the evaluation, we show that the difficulty of the task calls for designing approaches that combine knowledge, in an effort to improve the representations of learning systems.

CL Dec 22, 2025
A Large-Language-Model Framework for Automated Humanitarian Situation Reporting

Ivan Decostanzi, Yelena Mejova, Kyriaki Kalimeri

Timely and accurate situational reports are essential for humanitarian decision-making, yet current workflows remain largely manual, resource-intensive, and inconsistent. We present a fully automated framework that uses large language models (LLMs) to transform heterogeneous humanitarian documents into structured and evidence-grounded reports. The system integrates semantic text clustering, automatic question generation, retrieval-augmented answer extraction with citations, multi-level summarization, and executive summary generation, supported by internal evaluation metrics that emulate expert reasoning. We evaluated the framework across 13 humanitarian events, including natural disasters and conflicts, using more than 1,100 documents from verified sources such as ReliefWeb. The generated questions achieved 84.7 percent relevance, 84.0 percent importance, and 76.4 percent urgency. The extracted answers reached 86.3 percent relevance, with citation precision and recall both exceeding 76 percent. Agreement between human and LLM-based evaluations surpassed an F1 score of 0.80. Comparative analysis shows that the proposed framework produces reports that are more structured, interpretable, and actionable than existing baselines. By combining LLM reasoning with transparent citation linking and multi-level evaluation, this study demonstrates that generative AI can autonomously produce accurate, verifiable, and operationally useful humanitarian situation reports.
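The citation precision and recall figures in this abstract can be read with the usual set-based definitions. The sketch below assumes those standard definitions; the paper's exact scoring protocol is not stated in the abstract, and the function name and document identifiers are hypothetical.

```python
def citation_precision_recall(cited, relevant):
    """Set-based precision and recall of cited sources.

    cited:    source ids the system attached to an answer
    relevant: source ids a reference annotation marks as supporting it
    """
    cited, relevant = set(cited), set(relevant)
    hits = len(cited & relevant)
    # Guard against empty sets to avoid division by zero.
    precision = hits / len(cited) if cited else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


if __name__ == "__main__":
    # Hypothetical ids: two of four citations are correct, one relevant source is missed.
    p, r = citation_precision_recall(["d1", "d2", "d3", "d4"], ["d1", "d2", "d5"])
    print(f"precision={p:.2f} recall={r:.2f}")
```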

CL Mar 12, 2024
MoralBERT: A Fine-Tuned Language Model for Capturing Moral Values in Social Discussions

Vjosa Preniqi, Iacopo Ghinassi, Julia Ive et al.

Moral values play a fundamental role in how we evaluate information, make decisions, and form judgements around important social issues. Controversial topics, including vaccination, abortion, racism, and sexual orientation, often elicit opinions and attitudes that are not solely based on evidence but rather reflect moral worldviews. Recent advances in Natural Language Processing (NLP) show that moral values can be gauged in human-generated textual content. Building on the Moral Foundations Theory (MFT), this paper introduces MoralBERT, a range of language representation models fine-tuned to capture moral sentiment in social discourse. We describe a framework for both aggregated and domain-adversarial training on multiple heterogeneous MFT human-annotated datasets sourced from Twitter (now X), Reddit, and Facebook that broaden textual content diversity in terms of social media audience interests, content presentation and style, and spreading patterns. We show that the proposed framework achieves an average F1 score that is between 11% and 32% higher than lexicon-based approaches, Word2Vec embeddings, and zero-shot classification with large language models such as GPT-4 for in-domain inference. Domain-adversarial training yields better out-of-domain predictions than aggregate training while achieving comparable performance to zero-shot learning. Our approach contributes to annotation-free and effective morality learning, and provides useful insights towards a more comprehensive understanding of moral narratives in controversial social debates using NLP.

SI Oct 24, 2024
Language-Agnostic Modeling of Source Reliability on Wikipedia

Jacopo D'Ignazi, Andreas Kaltenbrunner, Yelena Mejova et al.

Over the last few years, verifying the credibility of information sources has become a fundamental need to combat disinformation. Here, we present a language-agnostic model designed to assess the reliability of web domains as sources in references across multiple language editions of Wikipedia. Utilizing editing activity data, the model evaluates domain reliability within different articles of varying controversiality, such as Climate Change, COVID-19, History, Media, and Biology topics. Crafting features that express domain usage across articles, the model effectively predicts domain reliability, achieving an F1 Macro score of approximately 0.80 for English and other high-resource languages. For mid-resource languages, we achieve 0.65, while the performance of low-resource languages varies. In all cases, the time the domain remains present in the articles (which we dub permanence) is one of the most predictive features. We highlight the challenge of maintaining consistent model performance across languages of varying resource levels and demonstrate that adapting models from higher-resource languages can improve performance. We believe these findings can assist Wikipedia editors in their ongoing efforts to verify citations and may offer useful insights for other user-generated content communities.

CL Apr 17, 2019
MoralStrength: Exploiting a Moral Lexicon and Embedding Similarity for Moral Foundations Prediction

Oscar Araque, Lorenzo Gatti, Kyriaki Kalimeri

Moral rhetoric plays a fundamental role in how we perceive and interpret the information we receive, greatly influencing our decision-making process. Especially when it comes to controversial social and political issues, our opinions and attitudes are hardly ever based on evidence alone. The Moral Foundations Dictionary (MFD) was developed to operationalize moral values in text. In this study, we present MoralStrength, a lexicon of approximately 1,000 lemmas, obtained as an extension of the Moral Foundations Dictionary, based on WordNet synsets. Moreover, for each lemma it provides a crowdsourced numeric assessment of Moral Valence, indicating the strength with which a lemma expresses the specific value. We evaluated the predictive potential of this moral lexicon, defining three utilization approaches of increasing complexity, ranging from lemmas' statistical properties to a deep learning approach of word embeddings based on semantic similarity. Logistic regression models trained on the features extracted from MoralStrength significantly outperformed the current state-of-the-art, reaching an F1-score of 87.6% over the previous 62.4% (p-value<0.01), and an average F1-score of 86.25% over six different datasets. Such findings pave the way for further research, allowing for an in-depth understanding of moral narratives in text for a wide range of social issues.
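To make the simplest of the utilization approaches in this abstract concrete (statistical properties of the lemmas found in a text), here is a minimal sketch. It is not the MoralStrength implementation: the toy lexicon, its valence scores, and the function name `lexicon_features` are made up for illustration, and the input is assumed to be already lemmatized.

```python
from statistics import mean

# Toy lexicon with made-up valence scores (hypothetical, not MoralStrength values).
TOY_LEXICON = {"care": 8.0, "protect": 7.5, "harm": 2.0, "hurt": 1.5}


def lexicon_features(lemmas, lexicon):
    """Summarize the valence scores of the lemmas that appear in the lexicon."""
    scores = [lexicon[lemma] for lemma in lemmas if lemma in lexicon]
    if not scores:
        # No lexicon hit: return an explicit empty summary.
        return {"count": 0, "mean": None, "min": None, "max": None}
    return {
        "count": len(scores),
        "mean": mean(scores),
        "min": min(scores),
        "max": max(scores),
    }


if __name__ == "__main__":
    lemmas = ["we", "must", "protect", "people", "from", "harm"]
    print(lexicon_features(lemmas, TOY_LEXICON))
```

Features of this kind could then be fed to a classifier such as the logistic regression the abstract mentions.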

HC Mar 27, 2019
Effect of Values and Technology Use on Exercise: Implications for Personalized Behavior Change Interventions

Yelena Mejova, Kyriaki Kalimeri

Technology has recently been recruited in the war against the ongoing obesity crisis; however, the adoption of Health & Fitness applications for regular exercise is a struggle. In this study, we present a unique demographically representative dataset of 15k US residents that combines technology use logs with surveys on moral views, human values, and emotional contagion. Combining these data, we provide a holistic view of individuals to model their physical exercise behavior. First, we show which values determine the adoption of Health & Fitness mobile applications, finding that users who prioritize the value of purity and de-emphasize values of conformity, hedonism, and security are more likely to use such apps. Further, we achieve a weighted AUROC of .673 in predicting whether an individual exercises, and we also show that the application usage data allows for substantially better classification performance (.608) compared to using basic demographics (.513) or internet browsing data (.546). We also find a strong link of exercise to respondent socioeconomic status, as well as the value of happiness. Using these insights, we propose actionable design guidelines for persuasive technologies targeting health behavior modification.

CY Dec 5, 2017
Predicting Demographics, Moral Foundations, and Human Values from Digital Behaviors

Kyriaki Kalimeri, Mariano G. Beiro, Matteo Delfino et al.

Personal electronic devices including smartphones give access to behavioural signals that can be used to learn about the characteristics and preferences of individuals. In this study, we explore the connection between demographic and psychological attributes and the digital behavioural records, for a cohort of 7,633 people, closely representative of the US population with respect to gender, age, geographical distribution, education, and income. Along with the demographic data, we collected self-reported assessments on validated psychometric questionnaires for moral traits and basic human values and combined this information with passively collected multi-modal digital data from web browsing behaviour and smartphone usage. A machine learning framework was then designed to infer both the demographic and psychological attributes from the behavioural data. In a cross-validated setting, our models predicted demographic attributes with good accuracy as measured by the weighted AUROC score (Area Under the Receiver Operating Characteristic), but were less performant for the moral traits and human values. These results call for further investigation since they are still far from unveiling individuals' psychological fabric. This connection, along with the most predictive features that we provide for each attribute, might prove useful for designing personalised services, communication strategies, and interventions, and can be used to sketch a portrait of people with a similar worldview.