Zeeshan Rasheed

LG
6papers
59citations
Novelty33%
AI Score39

6 Papers

68.0SEMay 19
CodePori: Large-Scale System for Autonomous Software Development Using Multi-Agent Technology

Zeeshan Rasheed, Muhammad Waseem, Kai-Kristian Kemell et al.

Context: LLM-based multi-agent systems enable automation and decision support in software development, yet existing studies rely on benchmark datasets offering only binary pass-or-fail results, limiting insight into real-world applicability. Objective: This study empirically investigates the potential and limitations of LLM-based agents in autonomous software development tasks. Method: A two-phase approach was employed: developing a multi-agent system, CodePori, for automated code generation, and conducting participant-based evaluation to assess practical performance. Results: Participant feedback reveals key strengths, challenges, and areas for improvement in LLM-based multi-agent systems, highlighting aspects missed by standard code-generation benchmarks. Conclusions: While LLM-based multi-agent systems show potential for large-scale software development, successful integration requires addressing challenges such as memory limitations, hallucinations, and code smells, alongside a practitioner-centric perspective.

LGSep 4, 2024
NUMOSIM: A Synthetic Mobility Dataset with Anomaly Detection Benchmarks

Chris Stanford, Suman Adari, Xishun Liao et al. · stanford

Collecting real-world mobility data is challenging. It is often fraught with privacy concerns, logistical difficulties, and inherent biases. Moreover, accurately annotating anomalies in large-scale data is nearly impossible, as it demands meticulous effort to distinguish subtle and complex patterns. These challenges significantly impede progress in geospatial anomaly detection research by restricting access to reliable data and complicating the rigorous evaluation, comparison, and benchmarking of methodologies. To address these limitations, we introduce a synthetic mobility dataset, NUMOSIM, that provides a controlled, ethical, and diverse environment for benchmarking anomaly detection techniques. NUMOSIM simulates a wide array of realistic mobility scenarios, encompassing both typical and anomalous behaviours, generated through advanced deep learning models trained on real mobility data. This approach allows NUMOSIM to accurately replicate the complexities of real-world movement patterns while strategically injecting anomalies to challenge and evaluate detection algorithms based on how effectively they capture the interplay between demographic, geospatial, and temporal factors. Our goal is to advance geospatial mobility analysis by offering a realistic benchmark for improving anomaly detection and mobility modeling techniques. To support this, we provide open access to the NUMOSIM dataset, along with comprehensive documentation, evaluation metrics, and benchmark results.

LGJun 29, 2022
Meta-Learning over Time for Destination Prediction Tasks

Mark Tenzer, Zeeshan Rasheed, Khurram Shafique et al.

A need to understand and predict vehicles' behavior underlies both public and private goals in the transportation domain, including urban planning and management, ride-sharing services, and intelligent transportation systems. Individuals' preferences and intended destinations vary throughout the day, week, and year: for example, bars are most popular in the evenings, and beaches are most popular in the summer. Despite this principle, we note that recent studies on a popular benchmark dataset from Porto, Portugal have found, at best, only marginal improvements in predictive performance from incorporating temporal information. We propose an approach based on hypernetworks, a variant of meta-learning ("learning to learn") in which a neural network learns to change its own weights in response to an input. In our case, the weights responsible for destination prediction vary with the metadata, in particular the time, of the input trajectory. The time-conditioned weights notably improve the model's error relative to ablation studies and comparable prior work, and we confirm our hypothesis that knowledge of time should improve prediction of a vehicle's intended destination.

62.2AIApr 17
Agentic Frameworks for Reasoning Tasks: An Empirical Study

Zeeshan Rasheed, Abdul Malik Sami, Muhammad Waseem et al.

Recent advances in agentic frameworks have enabled AI agents to perform complex reasoning and decision-making. However, evidence comparing their reasoning performance, efficiency, and practical suitability remains limited. To address this gap, we empirically evaluate 22 widely used agentic frameworks across three reasoning benchmarks: BBH, GSM8K, and ARC. The frameworks were selected from 1,200 GitHub repositories collected between January 2023 and July 2025 and organized into a taxonomy based on architectural design. We evaluated them under a unified setting, measuring reasoning accuracy, execution time, computational cost, and cross-benchmark consistency. Our results show that 19 of the 22 frameworks completed all three benchmarks. Among these, 12 showed stable performance, with mean accuracy of 74.6-75.9%, execution time of 4-6 seconds per task, and cost of 0.14-0.18 cents per task. Poorer results were mainly caused by orchestration problems rather than reasoning limits. For example, Camel failed to complete BBH after 11 days because of uncontrolled context growth, while Upsonic consumed USD 1,434 in one day because repeated extraction failures triggered costly retries. AutoGen and Mastra also exhausted API quotas through iterative interactions that increased prompt length without improving results. We also found a sharp drop in mathematical reasoning. Mean accuracy on GSM8K was 44.35%, compared with 89.80% on BBH and 89.56% on ARC. Overall, this study provides the first large-scale empirical comparison of agentic frameworks for reasoning-intensive software engineering tasks and shows that framework selection should prioritize orchestration quality, especially memory control, failure handling, and cost management.

LGJun 30, 2022
Learning Citywide Patterns of Life from Trajectory Monitoring

Mark Tenzer, Zeeshan Rasheed, Khurram Shafique

The recent proliferation of real-world human mobility datasets has catalyzed geospatial and transportation research in trajectory prediction, demand forecasting, travel time estimation, and anomaly detection. However, these datasets also enable, more broadly, a descriptive analysis of intricate systems of human mobility. We formally define patterns of life analysis as a natural, explainable extension of online unsupervised anomaly detection, where we not only monitor a data stream for anomalies but also explicitly extract normal patterns over time. To learn patterns of life, we adapt Grow When Required (GWR) episodic memory from research in computational biology and neurorobotics to a new domain of geospatial analysis. This biologically-inspired neural network, related to self-organizing maps (SOM), constructs a set of "memories" or prototype traffic patterns incrementally as it iterates over the GPS stream. It then compares each new observation to its prior experiences, inducing an online, unsupervised clustering and anomaly detection on the data. We mine patterns-of-interest from the Porto taxi dataset, including both major public holidays and newly-discovered transportation anomalies, such as festivals and concerts which, to our knowledge, have not been previously acknowledged or reported in prior work. We anticipate that the capability to incrementally learn normal and abnormal road transportation behavior will be useful in many domains, including smart cities, autonomous vehicles, and urban planning and management.

CRAug 28, 2020
Toward A Network-Assisted Approach for Effective Ransomware Detection

Tianrou Xia, Yuanyi Sun, Sencun Zhu et al.

Ransomware is a kind of malware using cryptographic mechanisms to prevent victims from normal use of their computers. As a result, victims lose the access to their files and desktops unless they pay the ransom to the attackers. By the end of 2019, ransomware attack had caused more than 10 billion dollars of financial loss to enterprises and individuals. In this work, we propose Network-Assisted Approach (NAA), which contains effective local detection and network-level detection mechanisms, to help users determine whether a machine has been infected by ransomware. To evaluate its performance, we built 100 containers in Docker to simulate network scenarios. A hybrid ransomware sample which is close to real-world ransomware is deployed on stimulative infected machines. The experiment results show that our network-level detection mechanisms are separately applicable to WAN and LAN environments for ransomware detection.