Muhammad Shihab Rashid

CL
h-index: 38
4 papers
63 citations
Novelty: 39%
AI Score: 45

4 Papers

62.2 DB May 21
GS-QA: A Benchmark for Geospatial Question Answering

Majid Saeedan, Muhammad Shihab Rashid, Ahmed Eldawy et al.

Recent advances in Large Language Models (LLMs) have led to dramatic improvements in question answering (QA). To address the challenge of evaluating QA systems, standardized benchmarks have been introduced. This work focuses on the problem of geospatial QA, where a large collection of geospatial data is available in the form of a spatial database or other forms. Existing work on geospatial QA benchmarks has various limitations, including a small number of questions, limited spatial predicates, narrow output types, and no multi-source reasoning. We present GS-QA, an extensible geospatial QA benchmark with 2,800 question-answer pairs across 28 templates on top of OpenStreetMap and Wikipedia data, covering a wide range of spatial objects, predicates (including directional and towards filtering), and answer types (entity names, locations, distances, directions, counts, and aggregated areas/lengths). A key feature of GS-QA is that some questions require combining information from multiple sources, e.g., geospatial information from OSM and factual information from Wikipedia. GS-QA includes a comprehensive evaluation methodology that combines text-based QA measures with geospatial-specific measures such as distance error and angular error. We implemented nine LLM-based geospatial QA baselines using three LLMs (GPT-4o, Claude Sonnet 4.6, and Ministral-3) with combinations of direct prompting, retrieval-augmented generation, and text-to-SQL. Our results show that existing solutions perform reasonably well on simple spatial predicates with entity name outputs, but accuracy degrades significantly for questions involving complex spatial predicates, numeric output types, and multi-source reasoning, demonstrating that geospatial QA remains a challenging open problem warranting further research.
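The two geospatial-specific measures named in the abstract, distance error and angular error, can be illustrated with a minimal sketch. The abstract does not give GS-QA's exact definitions, so the following is an assumption: distance error is taken as the great-circle (haversine) distance between predicted and gold coordinates, and angular error as the smallest difference between two compass bearings. The function names and the Earth radius constant are chosen for illustration only.

```python
import math

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius; an assumption of this sketch


def distance_error_km(pred, gold):
    """Great-circle (haversine) distance between two (lat, lon) points in degrees."""
    lat1, lon1 = map(math.radians, pred)
    lat2, lon2 = map(math.radians, gold)
    dlat = lat2 - lat1
    dlon = lon2 - lon1
    a = (math.sin(dlat / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def angular_error_deg(pred_bearing, gold_bearing):
    """Smallest absolute difference between two compass bearings, in [0, 180]."""
    diff = abs(pred_bearing - gold_bearing) % 360.0
    return min(diff, 360.0 - diff)
```

The wrap-around in `angular_error_deg` matters for directional answers: bearings of 350 and 10 degrees differ by 20 degrees, not 340.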

CL Feb 16, 2024
PAT-Questions: A Self-Updating Benchmark for Present-Anchored Temporal Question-Answering

Jannat Ara Meem, Muhammad Shihab Rashid, Yue Dong et al.

Existing work on Temporal Question Answering (TQA) has predominantly focused on questions anchored to specific timestamps or events (e.g., "Who was the US president in 1970?"). Little work has studied questions whose temporal context is relative to the present time (e.g., "Who was the previous US president?"). We refer to this problem as Present-Anchored Temporal QA (PATQA). PATQA poses unique challenges: (1) large language models (LLMs) may have outdated knowledge, (2) complex temporal relationships (e.g., 'before', 'previous') are hard to reason about, (3) multi-hop reasoning may be required, and (4) the gold answers of benchmarks must be continuously updated. To address these challenges, we introduce the PAT-Questions benchmark, which includes single-hop and multi-hop temporal questions. The answers in PAT-Questions can be automatically refreshed by re-running SPARQL queries on a knowledge graph, if available. We evaluate several state-of-the-art LLMs and a SOTA temporal reasoning model (TEMPREASON-T5) on PAT-Questions through direct prompting and retrieval-augmented generation (RAG). The results highlight the limitations of existing solutions in PATQA and motivate the need for new methods to improve PATQA reasoning capabilities.
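Why present-anchored gold answers go stale can be shown with a small in-memory sketch. This is not the benchmark's SPARQL pipeline; it assumes a hypothetical timeline of (holder, start date) facts and a hypothetical `resolve` helper, and only illustrates that the answer to "current" or "previous" depends on the date at which the question is asked.

```python
from datetime import date


def resolve(timeline, today, offset=0):
    """Return the holder `offset` terms before the one in office on `today`.

    `timeline` is a list of (holder, start_date) pairs (hypothetical data).
    offset=0 answers "who is the current holder?", offset=1 "who was the
    previous holder?". Returns None if no such holder exists.
    """
    started = sorted((start, holder) for holder, start in timeline if start <= today)
    index = len(started) - 1 - offset
    if index < 0:
        return None
    return started[index][1]


# Placeholder facts, not real-world data.
timeline = [
    ("holder_a", date(2010, 1, 1)),
    ("holder_b", date(2015, 1, 1)),
    ("holder_c", date(2020, 1, 1)),
]
```

Asking for the previous holder (`offset=1`) on a date in 2018 yields `holder_a`, while the same question asked in 2021 yields `holder_b`, which is the kind of drift that a self-updating benchmark has to track.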

CL Feb 16, 2024
EcoRank: Budget-Constrained Text Re-ranking Using Large Language Models

Muhammad Shihab Rashid, Jannat Ara Meem, Yue Dong et al.

Large Language Models (LLMs) have achieved state-of-the-art performance in text re-ranking. This process includes queries and candidate passages in the prompts, utilizing pointwise, listwise, and pairwise prompting strategies. A limitation of these ranking strategies with LLMs is their cost: the process can become expensive due to API charges, which are based on the number of input and output tokens. We study how to maximize the re-ranking performance given a budget, by navigating the vast search spaces of prompt choices, LLM APIs, and budget splits. We propose a suite of budget-constrained methods to perform text re-ranking using a set of LLM APIs. Our most efficient method, called EcoRank, is a two-layered pipeline that jointly optimizes decisions regarding budget allocation across prompt strategies and LLM APIs. Our experimental results on four popular QA and passage reranking datasets show that EcoRank outperforms other budget-aware supervised and unsupervised baselines.
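The cost constraint described above can be made concrete with a rough sketch of token-based budget accounting. This is not the EcoRank pipeline; it assumes pointwise prompting, a fixed budget share for a more expensive API, and hypothetical per-token prices, and it only shows how a budget split translates into the number of passages each API can score.

```python
def affordable_prefix(token_counts, price_per_token, budget):
    """Count how many leading passages can be scored without exceeding `budget`.

    Returns (number_of_passages, amount_spent).
    """
    spent = 0.0
    n = 0
    for tokens in token_counts:
        cost = tokens * price_per_token
        if spent + cost > budget:
            break
        spent += cost
        n += 1
    return n, spent


def split_budget(token_counts, budget, share_expensive, price_expensive, price_cheap):
    """Score the first passages with the expensive API, then continue with the cheap one.

    `share_expensive` is the fraction of the budget reserved for the expensive
    API; whatever it leaves unspent rolls over to the cheap API.
    """
    n_exp, spent_exp = affordable_prefix(
        token_counts, price_expensive, budget * share_expensive)
    n_cheap, spent_cheap = affordable_prefix(
        token_counts[n_exp:], price_cheap, budget - spent_exp)
    return {"expensive": n_exp, "cheap": n_cheap, "spent": spent_exp + spent_cheap}
```

With ten 100-token prompts, a budget of 1000 cost units, half of it reserved for an API priced at 2 units per token and the rest for one priced at 1 unit per token, the split covers two passages with the expensive API and six with the cheap one.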

HC Nov 13, 2019
Emotion Recognition with Forearm-based Electromyography

Muhammad Shihab Rashid, Zubayet Zaman, Hasan Mahmud et al.

Electromyography (EMG) is an unexplored field of study as an alternative input modality for interacting with a computer. However, making computers understand human emotions is pivotal in human-computer interaction and in assistive technology. Current input devices are limited in their ability to express human emotions, while the applications of emotion-aware computing are vast. In this paper, we analyze EMG signals recorded from a low-cost MyoSensor and classify them into two classes: Relaxed and Angry. To perform this classification, we created a dataset collected from 10 users, extracted 8 significant features, and classified them using a Support Vector Machine. We show that forearm-based EMG signals can express emotions. Experimental results show an accuracy of 88.1% after 300 iterations. This opens significant opportunities in fields of computer science such as gaming and e-learning tools, where EMG signals can be used to detect human emotions and let the system provide feedback based on them. We discuss further applications of the method, which seeks to expand the range of human-computer interaction beyond the button box.
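The abstract does not name the 8 features that were extracted. As a hedged illustration of the feature-extraction step, the sketch below computes four time-domain features that are commonly used for EMG windows (mean absolute value, root mean square, waveform length, and zero crossings); these are an assumption and may differ from the authors' feature set.

```python
import math


def emg_features(window):
    """Compute common time-domain features for one window of EMG samples."""
    n = len(window)
    pairs = list(zip(window, window[1:]))  # consecutive sample pairs
    return {
        # Average rectified amplitude.
        "mav": sum(abs(x) for x in window) / n,
        # Root mean square, related to signal power.
        "rms": math.sqrt(sum(x * x for x in window) / n),
        # Cumulative absolute change between consecutive samples.
        "waveform_length": sum(abs(b - a) for a, b in pairs),
        # Number of sign changes between consecutive samples.
        "zero_crossings": sum(1 for a, b in pairs if a * b < 0),
    }
```

Each window's feature dictionary can be flattened into a vector and passed to a Support Vector Machine implementation, for example scikit-learn's `sklearn.svm.SVC`, to separate the Relaxed and Angry classes.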