LGJun 15, 2023Code
SSL4EO-L: Datasets and Foundation Models for Landsat ImageryAdam J. Stewart, Nils Lehmann, Isaac A. Corley et al.
The Landsat program is the longest-running Earth observation program in history, with 50+ years of data acquisition by 8 satellites. The multispectral imagery captured by sensors onboard these satellites is critical for a wide range of scientific fields. Despite the increasing popularity of deep learning and remote sensing, the majority of researchers still use decision trees and random forests for Landsat image analysis due to the prevalence of small labeled datasets and lack of foundation models. In this paper, we introduce SSL4EO-L, the first ever dataset designed for Self-Supervised Learning for Earth Observation for the Landsat family of satellites (including 3 sensors and 2 product levels) and the largest Landsat dataset in history (5M image patches). Additionally, we modernize and re-release the L7 Irish and L8 Biome cloud detection datasets, and introduce the first ML benchmark datasets for Landsats 4-5 TM and Landsat 7 ETM+ SR. Finally, we pre-train the first foundation models for Landsat imagery using SSL4EO-L and evaluate their performance on multiple semantic segmentation tasks. All datasets and model weights are available via the TorchGeo (https://github.com/microsoft/torchgeo) library, making reproducibility and experimentation easy, and enabling scientific advancements in the burgeoning field of remote sensing for a multitude of downstream applications.
AIFeb 24
From Logs to Language: Learning Optimal Verbalization for LLM-Based Recommendation in ProductionYucheng Shi, Ying Li, Yu Wang et al.
Large language models (LLMs) are promising backbones for generative recommender systems, yet a key challenge remains underexplored: verbalization, i.e., converting structured user interaction logs into effective natural language inputs. Existing methods rely on rigid templates that simply concatenate fields, yielding suboptimal representations for recommendation. We propose a data-centric framework that learns verbalization for LLM-based recommendation. Using reinforcement learning, a verbalization agent transforms raw interaction histories into optimized textual contexts, with recommendation accuracy as the training signal. This agent learns to filter noise, incorporate relevant metadata, and reorganize information to improve downstream predictions. Experiments on a large-scale industrial streaming dataset show that learned verbalization delivers up to 93% relative improvement in discovery item recommendation accuracy over template-based baselines. Further analysis reveals emergent strategies such as user interest summarization, noise removal, and syntax normalization, offering insights into effective context construction for LLM-based recommender systems.
HCDec 7, 2021
From Assistants to Friends: Investigating Emotional Intelligence of IPAs in Hindi and EnglishMallika Subramanian, Shradha Sehgal, Nimmi Rangaswamy
Intelligent Personal Assistants (IPAs) like Amazon Alexa, Apple Siri, and Google Assistant are increasingly becoming a part of our everyday. As IPAs become ubiquitous and their applications expand, users turn to them for not just routine tasks, but also intelligent conversations. In this study, we measure the emotional intelligence (EI) displayed by IPAs in the English and Hindi languages; to our knowledge, this is a pioneering effort in probing the emotional intelligence of IPAs in Indian languages. We pose utterances that convey the Sadness or Humor emotion and evaluate IPA responses. We build on previous research to propose a quantitative and qualitative evaluation scheme encompassing new criteria from social science perspectives (display of empathy, wit, understanding) and IPA-specific features (voice modulation, search redirects). We find EI displayed by Google Assistant in Hindi is comparable to EI displayed in English, with the assistant employing both voice modulation and emojis in text. However, we do find that IPAs are unable to understand and respond intelligently to all queries, sometimes even offering counter-productive and problematic responses. Our experiment offers evidence and directions to augment the potential for EI in IPAs.
SIJul 11, 2021
"A Virus Has No Religion": Analyzing Islamophobia on Twitter During the COVID-19 OutbreakMohit Chandra, Manvith Reddy, Shradha Sehgal et al.
The COVID-19 pandemic has disrupted people's lives driving them to act in fear, anxiety, and anger, leading to worldwide racist events in the physical world and online social networks. Though there are works focusing on Sinophobia during the COVID-19 pandemic, less attention has been given to the recent surge in Islamophobia. A large number of positive cases arising out of the religious Tablighi Jamaat gathering has driven people towards forming anti-Muslim communities around hashtags like #coronajihad, #tablighijamaatvirus on Twitter. In addition to the online spaces, the rise in Islamophobia has also resulted in increased hate crimes in the real world. Hence, an investigation is required to create interventions. To the best of our knowledge, we present the first large-scale quantitative study linking Islamophobia with COVID-19. In this paper, we present CoronaBias dataset which focuses on anti-Muslim hate spanning four months, with over 410,990 tweets from 244,229 unique users. We use this dataset to perform longitudinal analysis. We find the relation between the trend on Twitter with the offline events that happened over time, measure the qualitative changes in the context associated with the Muslim community, and perform macro and micro topic analysis to find prevalent topics. We also explore the nature of the content, focusing on the toxicity of the URLs shared within the tweets present in the CoronaBias dataset. Apart from the content-based analysis, we focus on user analysis, revealing that the portrayal of religion as a symbol of patriotism played a crucial role in deciding how the Muslim community was perceived during the pandemic. Through these experiments, we reveal the existence of anti-Muslim rhetoric around COVID-19 in the Indian sub-continent.