Ehsan Qasemi

CL
9papers
901citations
Novelty34%
AI Score26

9 Papers

CLSep 15, 2022Code
VIPHY: Probing "Visible" Physical Commonsense Knowledge

Shikhar Singh, Ehsan Qasemi, Muhao Chen

In recent years, vision-language models (VLMs) have shown remarkable performance on visual reasoning tasks (e.g. attributes, location). While such tasks measure the requisite knowledge to ground and reason over a given visual instance, they do not, however, measure the ability of VLMs to retain and generalize such knowledge. In this work, we evaluate their ability to acquire "visible" physical knowledge -- the information that is easily accessible from images of static scenes, particularly across the dimensions of object color, size and space. We build an automatic pipeline to derive a comprehensive knowledge resource for calibrating and probing these models. Our results indicate a severe gap between model and human performance across all three tasks. Furthermore, our caption pretrained baseline (CapBERT) significantly outperforms VLMs on both size and spatial tasks -- highlighting that despite sufficient access to ground language with visual modality, they struggle to retain such knowledge. The dataset and code are available at https://github.com/Axe--/ViPhy .

CVJul 18, 2023
Traffic-Domain Video Question Answering with Automatic Captioning

Ehsan Qasemi, Jonathan M. Francis, Alessandro Oltramari · cmu

Video Question Answering (VidQA) exhibits remarkable potential in facilitating advanced machine reasoning capabilities within the domains of Intelligent Traffic Monitoring and Intelligent Transportation Systems. Nevertheless, the integration of urban traffic scene knowledge into VidQA systems has received limited attention in previous research endeavors. In this work, we present a novel approach termed Traffic-domain Video Question Answering with Automatic Captioning (TRIVIA), which serves as a weak-supervision technique for infusing traffic-domain knowledge into large video-language models. Empirical findings obtained from the SUTD-TrafficQA task highlight the substantial enhancements achieved by TRIVIA, elevating the accuracy of representative video-language models by a remarkable 6.5 points (19.88%) compared to baseline settings. This pioneering methodology holds great promise for driving advancements in the field, inspiring researchers and practitioners alike to unlock the full potential of emerging video-language models in traffic-related applications.

CLJun 16, 2022
PInKS: Preconditioned Commonsense Inference with Minimal Supervision

Ehsan Qasemi, Piyush Khanna, Qiang Ning et al. · amazon-science

Reasoning with preconditions such as "glass can be used for drinking water unless the glass is shattered" remains an open problem for language models. The main challenge lies in the scarcity of preconditions data and the model's lack of support for such reasoning. We present PInKS, Preconditioned Commonsense Inference with WeaK Supervision, an improved model for reasoning with preconditions through minimum supervision. We show, both empirically and theoretically, that PInKS improves the results on benchmarks focused on reasoning with the preconditions of commonsense knowledge (up to 40% Macro-F1 scores). We further investigate PInKS through PAC-Bayesian informativeness analysis, precision measures, and ablation study.

CLOct 23, 2023
Affective and Dynamic Beam Search for Story Generation

Tenghao Huang, Ehsan Qasemi, Bangzheng Li et al.

Storytelling's captivating potential makes it a fascinating research area, with implications for entertainment, education, therapy, and cognitive studies. In this paper, we propose Affective Story Generator (AffGen) for generating interesting narratives. AffGen introduces "intriguing twists" in narratives by employing two novel techniques-Dynamic Beam Sizing and Affective Reranking. Dynamic Beam Sizing encourages less predictable, more captivating word choices using a contextual multi-arm bandit model. Affective Reranking prioritizes sentence candidates based on affect intensity. Our empirical evaluations, both automatic and human, demonstrate AffGen's superior performance over existing baselines in generating affectively charged and interesting narratives. Our ablation study and analysis provide insights into the strengths and weaknesses of AffGen.

AIAug 31, 2022
Intelligent Traffic Monitoring with Hybrid AI

Ehsan Qasemi, Alessandro Oltramari

Challenges in Intelligent Traffic Monitoring (ITMo) are exacerbated by the large quantity and modalities of data and the need for the utilization of state-of-the-art (SOTA) reasoners. We formulate the problem of ITMo and introduce HANS, a neuro-symbolic architecture for multi-modal context understanding, and its application to ITMo. HANS utilizes knowledge graph technology to serve as a backbone for SOTA reasoning in the traffic domain. Through case studies, we show how HANS addresses the challenges associated with traffic monitoring while being able to integrate with a wide range of reasoning methods

CLMay 22, 2023
Preconditioned Visual Language Inference with Weak Supervision

Ehsan Qasemi, Amani R. Maina-Kilaas, Devadutta Dash et al.

Humans can infer the affordance of objects by extracting related contextual preconditions for each scenario. For example, upon seeing an image of a broken cup, we can infer that this precondition prevents the cup from being used for drinking. Reasoning with preconditions of commonsense is studied in NLP where the model explicitly gets the contextual precondition. However, it is unclear if SOTA visual language models (VLMs) can extract such preconditions and infer the affordance of objects with them. In this work, we introduce the task of preconditioned visual language inference and rationalization (PVLIR). We propose a learning resource based on three strategies to retrieve weak supervision signals for the task and develop a human-verified test set for evaluation. Our results reveal the shortcomings of SOTA VLM models in the task and draw a road map to address the challenges ahead in improving them.

CLJan 19, 2022
Evaluating Machine Common Sense via Cloze Testing

Ehsan Qasemi, Lee Kezar, Jay Pujara et al.

Language models (LMs) show state of the art performance for common sense (CS) question answering, but whether this ability implies a human-level mastery of CS remains an open question. Understanding the limitations and strengths of LMs can help researchers improve these models, potentially by developing novel ways of integrating external CS knowledge. We devise a series of tests and measurements to systematically quantify their performance on different aspects of CS. We propose the use of cloze testing combined with word embeddings to measure the LM's robustness and confidence. Our results show than although language models tend to achieve human-like accuracy, their confidence is subpar. Future work can leverage this information to build more complex systems, such as an ensemble of symbolic and distributed knowledge.

CLApr 18, 2021
PaCo: Preconditions Attributed to Commonsense Knowledge

Ehsan Qasemi, Filip Ilievski, Muhao Chen et al.

Humans can seamlessly reason with circumstantial preconditions of commonsense knowledge. We understand that a glass is used for drinking water, unless the glass is broken or the water is toxic. Despite state-of-the-art (SOTA) language models' (LMs) impressive performance on inferring commonsense knowledge, it is unclear whether they understand the circumstantial preconditions. To address this gap, we propose a novel challenge of reasoning with circumstantial preconditions. We collect a dataset, called PaCo, consisting of 12.4 thousand preconditions of commonsense statements expressed in natural language. Based on this dataset, we create three canonical evaluation tasks and use them to examine the capability of existing LMs to understand situational preconditions. Our results reveal a 10-30% gap between machine and human performance on our tasks, which shows that reasoning with preconditions is an open challenge.

AIJun 10, 2020
Consolidating Commonsense Knowledge

Filip Ilievski, Pedro Szekely, Jingwei Cheng et al.

Commonsense reasoning is an important aspect of building robust AI systems and is receiving significant attention in the natural language understanding, computer vision, and knowledge graphs communities. At present, a number of valuable commonsense knowledge sources exist, with different foci, strengths, and weaknesses. In this paper, we list representative sources and their properties. Based on this survey, we propose principles and a representation model in order to consolidate them into a Common Sense Knowledge Graph (CSKG). We apply this approach to consolidate seven separate sources into a first integrated CSKG. We present statistics of CSKG, present initial investigations of its utility on four QA datasets, and list learned lessons.