CY · Mar 27, 2023
(Un)fair devices: Moving beyond AI accuracy in personal sensing
Sofia Yfantidou, Marios Constantinides, Dimitris Spathis et al. · cambridge
Personal devices are omnipresent in our lives, seamlessly monitoring our activities, from smart rings tracking sleep patterns to smartwatches keeping an eye on missed heartbeats. The rich data streams from such devices fuel advanced Artificial Intelligence (AI) applications. Instead of solely relying on direct sensor measurements, these applications are increasingly leveraging Machine Learning (ML) model estimates to derive insights. But are these estimates biased or not? This literature review delivers compelling evidence about the impact of hidden biases that creep into ML models deployed on personal devices. We discuss critical bias issues drawn from prior work such as racial bias in pulse oximeters, weight bias in optical heart rate sensors, and sex bias in audio-based diagnostics. In response to these challenges, we advocate for a shift from prioritizing performance-oriented evaluations of personal devices to adopting assessments grounded in a human-centered approach. To facilitate this transition, we provide guidelines for the design, development, evaluation, and use of unbiased AI in personal devices, recognizing their potential impact on improving our health, lifestyle, and productivity -- more than any other technology.
HC · Feb 16, 2023
Human-Centered Responsible Artificial Intelligence: Current & Future Trends
Mohammad Tahaei, Marios Constantinides, Daniele Quercia et al.
In recent years, the CHI community has seen significant growth in research on Human-Centered Responsible Artificial Intelligence. While different research communities may use different terminology to discuss similar topics, all of this work is ultimately aimed at developing AI that benefits humanity while being grounded in human rights and ethics, and reducing the potential harms of AI. In this special interest group, we aim to bring together researchers from academia and industry interested in these topics to map current and future research trends to advance this important area of research by fostering collaboration and sharing ideas.
CY · Sep 22, 2023
FairComp: Workshop on Fairness and Robustness in Machine Learning for Ubiquitous Computing
Sofia Yfantidou, Dimitris Spathis, Marios Constantinides et al. · cambridge
How can we ensure that Ubiquitous Computing (UbiComp) research outcomes are both ethical and fair? While fairness in machine learning (ML) has gained traction in recent years, fairness in UbiComp remains unexplored. This workshop aims to discuss fairness in UbiComp research and its social, technical, and legal implications. From a social perspective, we will examine the relationship between fairness and UbiComp research and identify pathways to ensure that ubiquitous technologies do not cause harm or infringe on individual rights. From a technical perspective, we will initiate a discussion on data practices to develop bias mitigation approaches tailored to UbiComp research. From a legal perspective, we will examine how new policies shape our community's work and future research. We aim to foster a vibrant community centered around the topic of responsible UbiComp, while also charting a clear path for future research endeavours in this field.
HC · Feb 10, 2023
A Systematic Literature Review of Human-Centered, Ethical, and Responsible AI
Mohammad Tahaei, Marios Constantinides, Daniele Quercia et al.
As Artificial Intelligence (AI) continues to advance rapidly, it becomes increasingly important to consider AI's ethical and societal implications. In this paper, we present a bottom-up mapping of the current state of research at the intersection of Human-Centered AI, Ethical, and Responsible AI (HCER-AI) by thematically reviewing and analyzing 164 research papers from leading conferences in ethical, social, and human factors of AI: AIES, CHI, CSCW, and FAccT. The ongoing research in HCER-AI places emphasis on governance, fairness, and explainability. These conferences, however, concentrate on specific themes rather than encompassing all aspects. While AIES has fewer papers on HCER-AI, it emphasizes governance and rarely publishes papers about privacy, security, and human flourishing. FAccT publishes more on governance and lacks papers on privacy, security, and human flourishing. CHI and CSCW, as more established conferences, have a broader research portfolio. We find that the current emphasis on governance and fairness in AI research may not adequately address the potential unforeseen and unknown implications of AI. Therefore, we recommend that future research should expand its scope and diversify resources to prepare for these potential consequences. This could involve exploring additional areas such as privacy, security, human flourishing, and explainability.
HC · Jul 24, 2024
Co-designing an AI Impact Assessment Report Template with AI Practitioners and AI Compliance Experts
Edyta Bogucka, Marios Constantinides, Sanja Šćepanović et al.
In the evolving landscape of AI regulation, it is crucial for companies to conduct impact assessments and document their compliance through comprehensive reports. However, current reports lack grounding in regulations and often focus on specific aspects like privacy in relation to AI systems, without addressing the real-world uses of these systems. Moreover, there is no systematic effort to design and evaluate these reports with both AI practitioners and AI compliance experts. To address this gap, we conducted an iterative co-design process with 14 AI practitioners and 6 AI compliance experts and proposed a template for impact assessment reports grounded in the EU AI Act, NIST's AI Risk Management Framework, and ISO 42001 AI Management System. We evaluated the template by producing an impact assessment report for an AI-based meeting companion at a major tech company. A user study with 8 AI practitioners from the same company and 5 AI compliance experts from industry and academia revealed that our template effectively provides necessary information for impact assessments and documents the broad impacts of AI systems. Participants envisioned using the template not only at the pre-deployment stage for compliance but also as a tool to guide the design stage of AI uses.
69.7 · HC · Apr 3
AI Disclosure with DAISY
Yoana Ahmetoglu, Marios Constantinides, Anna Cox
The use of AI tools in research is becoming routine, alongside growing consensus that such use should be transparently disclosed. However, AI disclosure statements remain rare and inconsistent, with policies offering limited guidance and authors facing social, cognitive, and emotional barriers when reporting AI use. To explore how structured disclosure shapes what authors report and how they experience disclosure, we present DAISY (Disclosure of AI-uSe in Your Research), a form-based tool for generating AI disclosure statements. DAISY was developed from literature-derived requirements and co-design (N=11), and deployed in a user study with authors (N=31). DAISY-supported disclosures met more completeness criteria, offering clearer breakdowns of AI use across research and writing than unsupported disclosures. Surprisingly, despite concerns about how transparently disclosed AI use might be perceived, the use of DAISY did not reduce author comfort with the disclosure statements. We discuss design implications and a research agenda for AI disclosure as a sociotechnical practice.
LG · Jan 3, 2024
Evaluating Fairness in Self-supervised and Supervised Models for Sequential Data
Sofia Yfantidou, Dimitris Spathis, Marios Constantinides et al. · cambridge
Self-supervised learning (SSL) has become the de facto training paradigm of large models, where pre-training is followed by supervised fine-tuning using domain-specific data and labels. Hypothesizing that SSL models would learn more generic, hence less biased, representations, this study explores the impact of pre-training and fine-tuning strategies on fairness (i.e., performing equally on different demographic breakdowns). Motivated by human-centric applications on real-world timeseries data, we interpret inductive biases on the model, layer, and metric levels by systematically comparing SSL models to their supervised counterparts. Our findings demonstrate that SSL has the capacity to achieve performance on par with supervised methods while significantly enhancing fairness, exhibiting up to a 27% increase in fairness with a mere 1% loss in performance through self-supervision. Ultimately, this work underscores SSL's potential in human-centric computing, particularly high-stakes, data-scarce application domains like healthcare.
HC · Apr 8, 2025
The Hall of AI Fears and Hopes: Comparing the Views of AI Influencers and those of Members of the U.S. Public Through an Interactive Platform
Gustavo Moreira, Edyta Paulina Bogucka, Marios Constantinides et al.
AI development is shaped by academics and industry leaders (let us call them "influencers"), but it is unclear how their views align with those of the public. To address this gap, we developed an interactive platform that served as a data collection tool for exploring public views on AI, including their fears, hopes, and overall sense of hopefulness. We made the platform available to 330 participants representative of the U.S. population in terms of age, sex, ethnicity, and political leaning, and compared their views with those of 100 AI influencers identified by Time magazine. The public fears AI getting out of control, while influencers emphasize regulation, seemingly to deflect attention from their alleged focus on monetizing AI's potential. Interestingly, the views of AI influencers from underrepresented groups such as women and people of color often differ from the views of underrepresented groups in the public.
84.2 · HC · Apr 10
Confidence Without Competence in AI-Assisted Knowledge Work
Elena Eleftheriou, George Pallis, Marios Constantinides
Large Language Models (LLMs) are widely used by students, yet their tendency to provide fast and complete answers may discourage reflection and foster overconfidence. We examined how alternative LLM interaction designs support deeper thinking without excessively increasing cognitive burden. We conducted a two-phase mixed-methods study. In Phase 1, interviews with 16 Gen Z students informed the design of Deep3, a web-based system with three interaction modes: (a) future-self explanations, (b) contrastive learning, and (c) guided hints. In Phase 2, we evaluated Deep3 with 85 participants across two learning tasks. We found that a standard single-agent baseline produced high perceived understanding despite the lowest objective learning. In contrast, future-self explanations imposed higher cognitive workload yet yielded the closest alignment between perceived and actual understanding, while guided hints achieved the largest learning gains without a proportional increase in frustration. These findings show that effort, confidence, and learning systematically diverge in LLM-supported work.
HC · Feb 1
"If You're Very Clever, No One Knows You've Used It": The Social Dynamics of Developing Generative AI Literacy in the Workplace
Qing, Xia, Marios Constantinides et al.
Generative AI (GenAI) tools are rapidly transforming knowledge work, making AI literacy a critical priority for organizations. However, research on AI literacy lacks empirical insight into how knowledge workers' beliefs around GenAI literacy are shaped by the social dynamics of the workplace, and how workers learn to apply GenAI tools in these environments. To address this gap, we conducted in-depth interviews with 19 knowledge workers across multiple sectors to examine how they develop GenAI competencies in real-world professional contexts. We found that, while knowledge sharing from colleagues supported learning, the ability to remove cues indicating GenAI use was perceived as validation of domain expertise. These behaviours ultimately reduced opportunities for learning via knowledge sharing and undermined transparency. To advance workplace AI literacy, we argue for fostering open dialogue, increasing visibility of user-generated knowledge, and greater emphasis on the benefits of collaborative learning for navigating rapid technological developments.
CY · Aug 22, 2025
Should LLMs be WEIRD? Exploring WEIRDness and Human Rights in Large Language Models
Ke Zhou, Marios Constantinides, Daniele Quercia
Large language models (LLMs) are often trained on data that reflect WEIRD values: Western, Educated, Industrialized, Rich, and Democratic. This raises concerns about cultural bias and fairness. Using responses to the World Values Survey, we evaluated five widely used LLMs: GPT-3.5, GPT-4, Llama-3, BLOOM, and Qwen. We measured how closely these responses aligned with the values of the WEIRD countries and whether they conflicted with human rights principles. To reflect global diversity, we compared the results with the Universal Declaration of Human Rights and three regional charters from Asia, the Middle East, and Africa. Models with lower alignment to WEIRD values, such as BLOOM and Qwen, produced more culturally varied responses but were 2% to 4% more likely to generate outputs that violated human rights, especially regarding gender and equality. For example, some models agreed with the statements "a man who cannot father children is not a real man" and "a husband should always know where his wife is", reflecting harmful gender norms. These findings suggest that as cultural representation in LLMs increases, so does the risk of reproducing discriminatory beliefs. Approaches such as Constitutional AI, which could embed human rights principles into model behavior, may only partly help resolve this tension.
LG · Jun 4, 2024
Using Self-supervised Learning Can Improve Model Fairness
Sofia Yfantidou, Dimitris Spathis, Marios Constantinides et al.
Self-supervised learning (SSL) has become the de facto training paradigm of large models, where pre-training is followed by supervised fine-tuning using domain-specific data and labels. Despite demonstrating comparable performance with supervised methods, comprehensive efforts to assess SSL's impact on machine learning fairness (i.e., performing equally on different demographic breakdowns) are lacking. Hypothesizing that SSL models would learn more generic, hence less biased representations, this study explores the impact of pre-training and fine-tuning strategies on fairness. We introduce a fairness assessment framework for SSL, comprising five stages: defining dataset requirements, pre-training, fine-tuning with gradual unfreezing, assessing representation similarity conditioned on demographics, and establishing domain-specific evaluation processes. We evaluate our method's generalizability on three real-world human-centric datasets (i.e., MIMIC, MESA, and GLOBEM) by systematically comparing hundreds of SSL and fine-tuned models on various dimensions spanning from the intermediate representations to appropriate evaluation metrics. Our findings demonstrate that SSL can significantly improve model fairness while maintaining performance on par with supervised methods, exhibiting up to a 30% increase in fairness with minimal loss in performance through self-supervision. We posit that such differences can be attributed to representation dissimilarities found between the best- and the worst-performing demographics across models: up to 13x greater for protected attributes with larger performance discrepancies between segments.
HC · Jun 4, 2024
WEIRD ICWSM: How Western, Educated, Industrialized, Rich, and Democratic is Social Computing Research?
Ali Akbar Septiandri, Marios Constantinides, Daniele Quercia
Much of the research in social computing analyzes data from social media platforms, which may inherently carry biases. An overlooked source of such bias is the over-representation of WEIRD (Western, Educated, Industrialized, Rich, and Democratic) populations, which might not accurately mirror the global demographic diversity. We evaluated the dependence on WEIRD populations in research presented at the AAAI ICWSM conference, the only venue whose proceedings are fully dedicated to social computing research. We did so by analyzing 494 papers published from 2018 to 2022, which included full research papers, dataset papers and posters. After filtering out papers that analyze synthetic datasets or lack a clear country of origin, we were left with 420 papers. From these, 188 participants in a crowdsourcing study, followed by full manual validation, extracted the data used to compute WEIRD scores. This data was then used to adapt existing WEIRD metrics to be applicable to social media data. We found that 37% of these papers focused solely on data from Western countries. This percentage is significantly lower than the percentages observed in research from the CHI (76%) and FAccT (84%) conferences, suggesting a greater diversity of dataset origins within ICWSM. However, the studies at ICWSM still predominantly examine populations from countries that are more Educated, Industrialized, and Rich in comparison to those in FAccT, with a special note on the 'Democratic' variable reflecting political freedoms and rights. This highlights the utility of social media data in shedding light on findings from countries with restricted political freedoms. Based on these insights, we recommend extensions of current "paper checklists" to include considerations about the WEIRD bias and call for the community to broaden research inclusivity by encouraging the use of diverse datasets from underrepresented regions.
HC · Sep 27, 2021
Retrofitting Meetings for Psychological Safety
Marios Constantinides, Sagar Joglekar, Daniele Quercia
Meetings are the fuel of organizations' productivity. At times, however, they are perceived as wasteful vacuums that deplete employee morale and productivity. Current meeting tools, to a great extent, have simplified and augmented the ways meetings are conducted by enabling participants to "get things done" and experience a comfortable physical environment. However, an important yet less explored element of these tools' design space is that of psychological safety -- the extent to which participants feel listened to, or motivated to be part of a meeting. We argue that an interdisciplinary approach would benefit the creation of new tools designed for retrofitting meetings for psychological safety. This approach comes with not only research opportunities -- ranging from sensing to modeling to user interface design -- but also challenges -- ranging from privacy to workplace surveillance.
HC · Sep 13, 2021
ComFeel: Productivity is a Matter of the Senses Too
Marios Constantinides, Sanja Šćepanović, Daniele Quercia et al.
Indoor environmental quality has been found to impact employees' productivity in the long run, yet its meeting-level impact in the short term is unclear. We studied the relationship between the sensorial pleasantness of a meeting room and the meeting's productivity. By administering a 28-item questionnaire to 363 online participants, we found that three factors captured 62% of people's experience of meetings: (a) productivity; (b) psychological safety; and (c) room pleasantness. To measure room pleasantness, we developed and deployed ComFeel, an indoor environmental sensing infrastructure, which captures light, temperature, and gas resistance readings through miniaturized and unobtrusive devices we built and named 'Geckos'. Across 29 real-world meetings, using ComFeel, we collected 1373 minutes of readings. For each of these meetings, we also collected whether each participant felt the meeting to have been productive, the setting to be psychologically safe, and the meeting room to be pleasant. As one would expect, we found that, on average, the probability of a meeting being productive increased by 35% for each standard deviation increase in the psychological safety participants experienced. Importantly, that probability increased by as much as 25% for each increase in room pleasantness, confirming the significant short-term impact of the indoor environment on meetings' productivity.
HC · Jun 21, 2021
Anticipatory Detection of Compulsive Body-focused Repetitive Behaviors with Wearables
Benjamin Lucas Searle, Dimitris Spathis, Marios Constantinides et al.
Body-focused repetitive behaviors (BFRBs), like face-touching or skin-picking, are hand-driven behaviors which can damage one's appearance if not identified early and treated. Technology for automatic detection is still under-explored, with the few previous works limited to wearables with single modalities (e.g., motion). Here, we propose a multi-sensory approach combining motion, orientation, and heart rate sensors to detect BFRBs. We conducted a feasibility study in which participants (N=10) were exposed to BFRB-inducing tasks, and analyzed 380 mins of signals under an extensive evaluation of sensing modalities, cross-validation methods, and observation windows. Our models achieved an AUC > 0.90 in distinguishing BFRBs, which were more evident in observation windows 5 mins prior to the behavior as opposed to 1-min ones. In a follow-up qualitative survey, we found that not only does the timing of detection matter, but models also need to be context-aware when designing just-in-time interventions to prevent BFRBs.
HC · Jun 8, 2021
Cartographic Design of Cultural Maps
Edyta Paulina Bogucka, Marios Constantinides, Luca Maria Aiello et al.
Throughout history, maps have been used as a tool to explore cities. They visualize a city's urban fabric through its streets, buildings, and points of interest. Beyond serving navigation, street names also reflect a city's culture through its commemorative practices. Therefore, cultural maps that unveil socio-cultural characteristics encoded in street names could potentially raise citizens' historical awareness. But designing effective cultural maps is challenging, not only due to data scarcity but also due to the lack of effective approaches to engage citizens with data exploration. To address these challenges, we collected a dataset of 5,000 streets across the cities of Paris, Vienna, London, and New York, and built their cultural maps grounded in cartographic storytelling techniques. Through data exploration scenarios, we demonstrated how cultural maps engage users and allow them to discover distinct patterns in the ways these cities are gender-biased, celebrate various professions, and embrace foreign cultures.
HC · Jun 8, 2021
Streetonomics: Quantifying Culture Using Street Names
Melanie Bancilhon, Marios Constantinides, Edyta Paulina Bogucka et al.
Quantifying a society's value system is important because it suggests what people deeply care about -- it reflects who they actually are and, more importantly, who they would like to be. This cultural quantification has typically been done by studying literary production. However, a society's value system might well be implicitly quantified based on the decisions that people took in the past and that were mediated by what they cared about. It turns out that one class of these decisions is visible in ordinary settings: it is visible in street names. We studied the names of 4,932 honorific streets in the cities of Paris, Vienna, London and New York. We chose these four cities because they were important centers of cultural influence for the Western world in the 20th century. We found that street names greatly reflect the extent to which a society is gender biased, which professions are considered elite ones, and the extent to which a city is influenced by the rest of the world. This way of quantifying a society's value system promises to inform new methodologies in Digital Humanities; makes it possible for municipalities to reflect on their past to inform their future; and informs the design of everyday educational tools that promote historical awareness in a playful way.
HC · Oct 14, 2020
HeartBees: Visualizing Crowd Affects
Chao Ying Qin, Marios Constantinides, Luca Maria Aiello et al.
Affective sharing within groups strengthens coordination and empathy, leads to better health outcomes, and increases productivity and performance. Existing tools for affective sharing face one main challenge: creating a representation of collective emotional states that is relatable and universally accessible. To overcome this challenge, we propose HeartBees, a bio-feedback system for visualizing collective emotional states, which maps a multi-dimensional emotion model into a metaphorical visualization of flocks of birds. Grounded in Affective Computing literature and physiological sensing, we mapped physiological indicators that could be obtained from wearable devices into a multi-dimensional emotion model, which HeartBees, in turn, can use. We evaluated our nature-inspired interactive system with 353 online participants, whose responses showed good consensus in the way they subjectively perceived the visualizations. Last, we discuss practical applications of HeartBees.
HC · Oct 13, 2020
MeetCues: Supporting Online Meetings Experience
Bon Adriel Aseniero, Marios Constantinides, Sagar Joglekar et al.
The remote work ecosystem is transforming patterns of communication between teams and individuals located at a distance. In particular, the absence of certain subtle cues in current communication tools may hinder an online meeting's outcome by negatively impacting attendees' overall experience and, often, making them feel disconnected. The problem might be that current tools fall short in capturing these cues. To partly address this, we developed an online platform, MeetCues, with the aim of supporting online communication during meetings. MeetCues is a companion platform for a commercial communication tool, with interactive and visual UI features that support back-channels of communication. It allows attendees to be more engaged during a meeting, and to reflect in real time or post-meeting. We evaluated our platform in a diverse set of five real-world corporate meetings, and we found that not only were people more engaged and aware during their meetings, but they also felt more connected. These findings suggest promise in the design of new communication tools, and reinforce the role of InfoVis in augmenting and enriching online meetings.