Sagar Joglekar

CY
h-index29
11papers
239citations
Novelty39%
AI Score35

11 Papers

CLNov 16, 2022
A Graph-Based Context-Aware Model to Understand Online Conversations

Vibhor Agarwal, Anthony P. Young, Sagar Joglekar et al.

Online forums that allow for participatory engagement between users have been transformative for the public discussion of many important issues. However, such conversations can sometimes escalate into full-blown exchanges of hate and misinformation. Existing approaches in natural language processing (NLP), such as deep learning models for classification tasks, use as inputs only a single comment or a pair of comments depending upon whether the task concerns the inference of properties of the individual comments or the replies between pairs of comments, respectively. But in online conversations, comments and replies may be based on external context beyond the immediately relevant information that is input to the model. Therefore, being aware of the conversations' surrounding contexts should improve the model's performance for the inference task at hand. We propose GraphNLI, a novel graph-based deep learning architecture that uses graph walks to incorporate the wider context of a conversation in a principled manner. Specifically, a graph walk starts from a given comment and samples "nearby" comments in the same or parallel conversation threads, which results in additional embeddings that are aggregated together with the initial comment's embedding. We then use these enriched embeddings for downstream NLP prediction tasks that are important for online conversations. We evaluate GraphNLI on two such tasks - polarity prediction and misogynistic hate speech detection - and found that our model consistently outperforms all relevant baselines for both tasks. Specifically, GraphNLI with a biased root-seeking random walk performs with a macro-F1 score of 3 and 6 percentage points better than the best-performing BERT-based baselines for the polarity prediction and hate speech detection tasks, respectively.

NISep 12, 2019Code
Challenges in the Decentralised Web: The Mastodon Case

Aravindh Raman, Sagar Joglekar, Emiliano De Cristofaro et al.

The Decentralised Web (DW) has recently seen a renewed momentum, with a number of DW platforms like Mastodon, Peer-Tube, and Hubzilla gaining increasing traction. These offer alternatives to traditional social networks like Twitter, YouTube, and Facebook, by enabling the operation of web infrastructure and services without centralised ownership or control. Although their services differ greatly, modern DW platforms mostly rely on two key innovations: first, their open source software allows anybody to setup independent servers ("instances") that people can sign-up to and use within a local community; and second, they build on top of federation protocols so that instances can mesh together, in a peer-to-peer fashion, to offer a globally integrated platform. In this paper, we present a measurement-driven exploration of these two innovations, using a popular DW microblogging platform (Mastodon) as a case study. We focus on identifying key challenges that might disrupt continuing efforts to decentralise the web, and empirically highlight a number of properties that are creating natural pressures towards recentralisation. Finally, our measurements shed light on the behaviour of both administrators (i.e., people setting up instances) and regular users who sign-up to the platforms, also discussing a few techniques that may address some of the issues observed.

CYAug 18, 2025
Vitamin N: Benefits of Different Forms of Public Greenery for Urban Health

Sanja Šćepanović, Sagar Joglekar, Stephen Law et al.

Urban greenery is often linked to better health, yet findings from past research have been inconsistent. One reason is that official greenery metrics measure the amount or nearness of greenery but ignore how often people actually may potentially see or use it in daily life. To address this gap, we introduced a new classification that separates on-road greenery, which people see while walking through streets, from off-road greenery, which requires planned visits. We did so by combining aerial imagery of Greater London and greenery data from OpenStreetMap with quantified greenery from over 100,000 Google Street View images and accessibility estimates based on 160,000 road segments. We linked these measures to 7.45 billion medical prescriptions issued by the National Health Service and processed through our methodology. These prescriptions cover five conditions: diabetes, hypertension, asthma, depression, and anxiety, as well as opioid use. As hypothesized, we found that green on-road was more strongly linked to better health than four widely used official measures. For example, hypertension prescriptions dropped by 3.68% in wards with on-road greenery above the median citywide level compared to those below it. If all below-median wards reached the citywide median in on-road greenery, prescription costs could fall by up to £3.15 million each year. These results suggest that greenery seen in daily life may be more relevant than public yet secluded greenery, and that official metrics commonly used in the literature have important limitations.

CLFeb 16, 2022
GraphNLI: A Graph-based Natural Language Inference Model for Polarity Prediction in Online Debates

Vibhor Agarwal, Sagar Joglekar, Anthony P. Young et al.

Online forums that allow participatory engagement between users have been transformative for public discussion of important issues. However, debates on such forums can sometimes escalate into full blown exchanges of hate or misinformation. An important tool in understanding and tackling such problems is to be able to infer the argumentative relation of whether a reply is supporting or attacking the post it is replying to. This so called polarity prediction task is difficult because replies may be based on external context beyond a post and the reply whose polarity is being predicted. We propose GraphNLI, a novel graph-based deep learning architecture that uses graph walk techniques to capture the wider context of a discussion thread in a principled fashion. Specifically, we propose methods to perform root-seeking graph walks that start from a post and captures its surrounding context to generate additional embeddings for the post. We then use these embeddings to predict the polarity relation between a reply and the post it is replying to. We evaluate the performance of our models on a curated debate dataset from Kialo, an online debating platform. Our model outperforms relevant baselines, including S-BERT, with an overall accuracy of 83%.

HCSep 27, 2021
Retrofitting Meetings for Psychological Safety

Marios Constantinides, Sagar Joglekar, Daniele Quercia

Meetings are the fuel of organizations' productivity. At times, however, they are perceived as wasteful vaccums that deplete employee morale and productivity. Current meeting tools, to a great extent, have simplified and augmented the ways meetings are conducted by enabling participants to ``get things done'' and experience a comfortable physical environment. However, an important yet less explored element of these tools' design space is that of psychological safety -- the extent to which participants feel listened to, or motivated to be part of a meeting. We argue that an interdisciplinary approach would benefit the creation of new tools designed for retrofitting meetings for psychological safety. This approach comes with not only research opportunities -- ranging from sensing to modeling to user interface design -- but also challenges -- ranging from privacy to workplace surveillance.

CYMar 1, 2021
The Healthy States of America: Creating a Health Taxonomy with Social Media

Sanja Scepanovic, Luca Maria Aiello, Ke Zhou et al.

Since the uptake of social media, researchers have mined online discussions to track the outbreak and evolution of specific diseases or chronic conditions such as influenza or depression. To broaden the set of diseases under study, we developed a Deep Learning tool for Natural Language Processing that extracts mentions of virtually any medical condition or disease from unstructured social media text. With that tool at hand, we processed Reddit and Twitter posts, analyzed the clusters of the two resulting co-occurrence networks of conditions, and discovered that they correspond to well-defined categories of medical conditions. This resulted in the creation of the first comprehensive taxonomy of medical conditions automatically derived from online discussions. We validated the structure of our taxonomy against the official International Statistical Classification of Diseases and Related Health Problems (ICD-11), finding matches of our clusters with 20 official categories, out of 22. Based on the mentions of our taxonomy's sub-categories on Reddit posts geo-referenced in the U.S., we were then able to compute disease-specific health scores. As opposed to counts of disease mentions or counts with no knowledge of our taxonomy's structure, we found that our disease-specific health scores are causally linked with the officially reported prevalence of 18 conditions.

CVJan 28, 2021
Jane Jacobs in the Sky: Predicting Urban Vitality with Open Satellite Data

Sanja Šćepanović, Sagar Joglekar, Stephen Law et al.

The presence of people in an urban area throughout the day -- often called 'urban vitality' -- is one of the qualities world-class cities aspire to the most, yet it is one of the hardest to achieve. Back in the 1970s, Jane Jacobs theorized urban vitality and found that there are four conditions required for the promotion of life in cities: diversity of land use, small block sizes, the mix of economic activities, and concentration of people. To build proxies for those four conditions and ultimately test Jane Jacobs's theory at scale, researchers have had to collect both private and public data from a variety of sources, and that took decades. Here we propose the use of one single source of data, which happens to be publicly available: Sentinel-2 satellite imagery. In particular, since the first two conditions (diversity of land use and small block sizes) are visible to the naked eye from satellite imagery, we tested whether we could automatically extract them with a state-of-the-art deep-learning framework and whether, in the end, the extracted features could predict vitality. In six Italian cities for which we had call data records, we found that our framework is able to explain on average 55% of the variance in urban vitality extracted from those records.

HCOct 13, 2020
Humane Visual AI: Telling the Stories Behind a Medical Condition

Wonyoung So, Edyta P. Bogucka, Sanja Šćepanović et al.

A biological understanding is key for managing medical conditions, yet psychological and social aspects matter too. The main problem is that these two aspects are hard to quantify and inherently difficult to communicate. To quantify psychological aspects, this work mined around half a million Reddit posts in the sub-communities specialised in 14 medical conditions, and it did so with a new deep-learning framework. In so doing, it was able to associate mentions of medical conditions with those of emotions. To then quantify social aspects, this work designed a probabilistic approach that mines open prescription data from the National Health Service in England to compute the prevalence of drug prescriptions, and to relate such a prevalence to census data. To finally visually communicate each medical condition's biological, psychological, and social aspects through storytelling, we designed a narrative-style layered Martini Glass visualization. In a user study involving 52 participants, after interacting with our visualization, a considerable number of them changed their mind on previously held opinions: 10% gave more importance to the psychological aspects of medical conditions, and 27% were more favourable to the use of social media data in healthcare, suggesting the importance of persuasive elements in interactive visualizations.

HCOct 13, 2020
MeetCues: Supporting Online Meetings Experience

Bon Adriel Aseniero, Marios Constantinides, Sagar Joglekar et al.

The remote work ecosystem is transforming patterns of communication between teams and individuals located at distance. Particularly, the absence of certain subtle cues in current communication tools may hinder an online's meeting outcome by negatively impacting attendees' overall experience and, often, make them feeling disconnected. The problem here might be due to the fact that current tools fall short in capturing it. To partly address this, we developed an online platform-MeetCues-with the aim of supporting online communication during meetings. MeetCues is a companion platform for a commercial communication tool with interactive and visual UI features that support back-channels of communications. It allows attendees to be more engaged during a meeting, and reflect in real-time or post-meeting. We evaluated our platform in a diverse set of five, real-world corporate meetings, and we found that, not only people were more engaged and aware during their meetings, but they also felt more connected. These findings suggest promise in the design of new communications tools, and reinforce the role of InfoVis in augmenting and enriching online meetings.

SIApr 23, 2020
Characterising User Content on a Multi-lingual Social Network

Pushkal Agarwal, Kiran Garimella, Sagar Joglekar et al.

Social media has been on the vanguard of political information diffusion in the 21st century. Most studies that look into disinformation, political influence and fake-news focus on mainstream social media platforms. This has inevitably made English an important factor in our current understanding of political activity on social media. As a result, there has only been a limited number of studies into a large portion of the world, including the largest, multilingual and multi-cultural democracy: India. In this paper we present our characterisation of a multilingual social network in India called ShareChat. We collect an exhaustive dataset across 72 weeks before and during the Indian general elections of 2019, across 14 languages. We investigate the cross lingual dynamics by clustering visually similar images together, and exploring how they move across language barriers. We find that Telugu, Malayalam, Tamil and Kannada languages tend to be dominant in soliciting political images (often referred to as memes), and posts from Hindi have the largest cross-lingual diffusion across ShareChat (as well as images containing text in English). In the case of images containing text that cross language barriers, we see that language translation is used to widen the accessibility. That said, we find cases where the same image is associated with very different text (and therefore meanings). This initial characterisation paves the way for more advanced pipelines to understand the dynamics of fake and political content in a multi-lingual and non-textual setting.

CYFeb 3, 2020
Stop Tracking Me Bro! Differential Tracking Of User Demographics On Hyper-partisan Websites

Pushkal Agarwal, Sagar Joglekar, Panagiotis Papadopoulos et al.

Websites with hyper-partisan, left or right-leaning focus offer content that is typically biased towards the expectations of their target audience. Such content often polarizes users, who are repeatedly primed to specific (extreme) content, usually reflecting hard party lines on political and socio-economic topics. Though this polarization has been extensively studied with respect to content, it is still unknown how it associates with the online tracking experienced by browsing users, especially when they exhibit certain demographic characteristics. For example, it is unclear how such websites enable the ad-ecosystem to track users based on their gender or age. In this paper, we take a first step to shed light and measure such potential differences in tracking imposed on users when visiting specific party-line's websites. For this, we design and deploy a methodology to systematically probe such websites and measure differences in user tracking. This methodology allows us to create user personas with specific attributes like gender and age and automate their browsing behavior in a consistent and repeatable manner. Thus, we systematically study how personas are being tracked by these websites and their third parties, especially if they exhibit particular demographic properties. Overall, we test 9 personas on 556 hyper-partisan websites and find that right-leaning websites tend to track users more intensely than left-leaning, depending on user demographics, using both cookies and cookie synchronization methods and leading to more costly delivered ads.