Narges Rashvand

CV
h-index16
9papers
73citations
Novelty41%
AI Score49

9 Papers

LGMar 27, 2023
Real-Time Bus Arrival Prediction: A Deep Learning Approach for Enhanced Urban Mobility

Narges Rashvand, Sanaz Sadat Hosseini, Mona Azarbayjani et al.

In urban settings, bus transit stands as a significant mode of public transportation, yet faces hurdles in delivering accurate and reliable arrival times. This discrepancy often culminates in delays and a decline in ridership, particularly in areas with a heavy reliance on bus transit. A prevalent challenge is the mismatch between actual bus arrival times and their scheduled counterparts, leading to disruptions in fixed schedules. Our study, utilizing New York City bus data, reveals an average delay of approximately eight minutes between scheduled and actual bus arrival times. This research introduces an innovative, AI-based, data-driven methodology for predicting bus arrival times at various transit points (stations), offering a collective prediction for all bus lines within large metropolitan areas. Through the deployment of a fully connected neural network, our method elevates the accuracy and efficiency of public bus transit systems. Our comprehensive evaluation encompasses over 200 bus lines and 2 million data points, showcasing an error margin of under 40 seconds for arrival time estimates. Additionally, the inference time for each data point in the validation set is recorded at below 0.006 ms, demonstrating the potential of our Neural-Net-based approach in substantially enhancing the punctuality and reliability of bus transit systems.

CVJan 11, 2025Code
Exploring Pose-Based Anomaly Detection for Retail Security: A Real-World Shoplifting Dataset and Benchmark

Narges Rashvand, Ghazal Alinezhad Noghre, Armin Danesh Pazho et al.

Shoplifting poses a significant challenge for retailers, resulting in billions of dollars in annual losses. Traditional security measures often fall short, highlighting the need for intelligent solutions capable of detecting shoplifting behaviors in real time. This paper frames shoplifting detection as an anomaly detection problem, focusing on the identification of deviations from typical shopping patterns. We introduce PoseLift, a privacy-preserving dataset specifically designed for shoplifting detection, addressing challenges such as data scarcity, privacy concerns, and model biases. PoseLift is built in collaboration with a retail store and contains anonymized human pose data from real-world scenarios. By preserving essential behavioral information while anonymizing identities, PoseLift balances privacy and utility. We benchmark state-of-the-art pose-based anomaly detection models on this dataset, evaluating performance using a comprehensive set of metrics. Our results demonstrate that pose-based approaches achieve high detection accuracy while effectively addressing privacy and bias concerns inherent in traditional methods. As one of the first datasets capturing real-world shoplifting behaviors, PoseLift offers researchers a valuable tool to advance computer vision ethically and will be publicly available to foster innovation and collaboration. The dataset is available at https://github.com/TeCSAR-UNCC/PoseLift.

CVApr 28, 2025Code
Shopformer: Transformer-Based Framework for Detecting Shoplifting via Human Pose

Narges Rashvand, Ghazal Alinezhad Noghre, Armin Danesh Pazho et al.

Shoplifting remains a costly issue for the retail sector, but traditional surveillance systems, which are mostly based on human monitoring, are still largely ineffective, with only about 2% of shoplifters being arrested. Existing AI-based approaches rely on pixel-level video analysis which raises privacy concerns, is sensitive to environmental variations, and demands significant computational resources. To address these limitations, we introduce Shopformer, a novel transformer-based model that detects shoplifting by analyzing pose sequences rather than raw video. We propose a custom tokenization strategy that converts pose sequences into compact embeddings for efficient transformer processing. To the best of our knowledge, this is the first pose-sequence-based transformer model for shoplifting detection. Evaluated on real-world pose data, our method outperforms state-of-the-art anomaly detection models, offering a privacy-preserving, and scalable solution for real-time retail surveillance. The code base for this work is available at https://github.com/TeCSAR-UNCC/Shopformer.

45.4CVApr 10Code
From Frames to Events: Rethinking Evaluation in Human-Centric Video Anomaly Detection

Narges Rashvand, Shanle Yao, Armin Danesh Pazho et al.

Pose-based Video Anomaly Detection (VAD) has gained significant attention for its privacy-preserving nature and robustness to environmental variations. However, traditional frame-level evaluations treat video as a collection of isolated frames, fundamentally misaligned with how anomalies manifest and are acted upon in the real world. In operational surveillance systems, what matters is not the flagging of individual frames, but the reliable detection, localization, and reporting of a coherent anomalous event, a contiguous temporal episode with an identifiable onset and duration. Frame-level metrics are blind to this distinction, and as a result, they systematically overestimate model performance for any deployment that requires actionable, event-level alerts. In this work, we propose a shift toward an event-centric perspective in VAD. We first audit widely used VAD benchmarks, including SHT[19], CHAD[6], NWPUC[4], and HuVAD[25], to characterize their event structure. We then introduce two strategies for temporal event localization: a score-refinement pipeline with hierarchical Gaussian smoothing and adaptive binarization, and an end-to-end Dual-Branch Model that directly generates event-level detections. Finally, we establish the first event-based evaluation standard for VAD by adapting Temporal Action Localization metrics, including tIoU-based event matching and multi-threshold F1 evaluation. Our results quantify a substantial performance gap: while all SoTA models achieve frame-level AUC-ROC exceeding 52% on the NWPUC[4], their event-level localization precision falls below 10% even at a minimal tIoU=0.2, with an average event-level F1 of only 0.11 across all thresholds. The code base for this work is available at https://github.com/TeCSAR-UNCC/EventCentric-VAD.

CVSep 23, 2025Code
Adversarially-Refined VQ-GAN with Dense Motion Tokenization for Spatio-Temporal Heatmaps

Gabriel Maldonado, Narges Rashvand, Armin Danesh Pazho et al.

Continuous human motion understanding remains a core challenge in computer vision due to its high dimensionality and inherent redundancy. Efficient compression and representation are crucial for analyzing complex motion dynamics. In this work, we introduce an adversarially-refined VQ-GAN framework with dense motion tokenization for compressing spatio-temporal heatmaps while preserving the fine-grained traces of human motion. Our approach combines dense motion tokenization with adversarial refinement, which eliminates reconstruction artifacts like motion smearing and temporal misalignment observed in non-adversarial baselines. Our experiments on the CMU Panoptic dataset provide conclusive evidence of our method's superiority, outperforming the dVAE baseline by 9.31% SSIM and reducing temporal instability by 37.1%. Furthermore, our dense tokenization strategy enables a novel analysis of motion complexity, revealing that 2D motion can be optimally represented with a compact 128-token vocabulary, while 3D motion's complexity demands a much larger 1024-token codebook for faithful reconstruction. These results establish practical deployment feasibility across diverse motion analysis applications. The code base for this work is available at https://github.com/TeCSAR-UNCC/Pose-Quantization.

SPMar 8, 2024
Enhancing Automatic Modulation Recognition for IoT Applications Using Transformers

Narges Rashvand, Kenneth Witham, Gabriel Maldonado et al.

Automatic modulation recognition (AMR) is vital for accurately identifying modulation types within incoming signals, a critical task for optimizing operations within edge devices in IoT ecosystems. This paper presents an innovative approach that leverages Transformer networks, initially designed for natural language processing, to address the challenges of efficient AMR. Our transformer network architecture is designed with the mindset of real-time edge computing on IoT devices. Four tokenization techniques are proposed and explored for creating proper embeddings of RF signals, specifically focusing on overcoming the limitations related to the model size often encountered in IoT scenarios. Extensive experiments reveal that our proposed method outperformed advanced deep learning techniques, achieving the highest recognition accuracy. Notably, our model achieves an accuracy of 65.75 on the RML2016 and 65.80 on the CSPB.ML.2018+ dataset.

LGJan 17, 2025
Real-Time Bus Departure Prediction Using Neural Networks for Smart IoT Public Bus Transit

Narges Rashvand, Sanaz Sadat Hosseini, Mona Azarbayjani et al.

Bus transit plays a vital role in urban public transportation but often struggles to provide accurate and reliable departure times. This leads to delays, passenger dissatisfaction, and decreased ridership, particularly in transit-dependent areas. A major challenge lies in the discrepancy between actual and scheduled bus departure times, which disrupts timetables and impacts overall operational efficiency. To address these challenges, this paper presents a neural network-based approach for real-time bus departure time prediction tailored for smart IoT public transit applications. We leverage AI-driven models to enhance the accuracy of bus schedules by preprocessing data, engineering relevant features, and implementing a fully connected neural network that utilizes historical departure data to predict departure times at subsequent stops. In our case study analyzing bus data from Boston, we observed an average deviation of nearly 4 minutes from scheduled times. However, our model, evaluated across 151 bus routes, demonstrates a significant improvement, predicting departure time deviations with an accuracy of under 80 seconds. This advancement not only improves the reliability of bus transit schedules but also plays a crucial role in enabling smart bus systems and IoT applications within public transit networks. By providing more accurate real-time predictions, our approach can facilitate the integration of IoT devices, such as smart bus stops and passenger information systems, that rely on precise data for optimal performance.

AIMar 5
From Offline to Periodic Adaptation for Pose-Based Shoplifting Detection in Real-world Retail Security

Shanle Yao, Narges Rashvand, Armin Danesh Pazho et al.

Shoplifting is a growing operational and economic challenge for retailers, with incidents rising and losses increasing despite extensive video surveillance. Continuous human monitoring is infeasible, motivating automated, privacy-preserving, and resource-aware detection solutions. In this paper, we cast shoplifting detection as a pose-based, unsupervised video anomaly detection problem and introduce a periodic adaptation framework designed for on-site Internet of Things (IoT) deployment. Our approach enables edge devices in smart retail environments to adapt from streaming, unlabeled data, supporting scalable and low-latency anomaly detection across distributed camera networks. To support reproducibility, we introduce RetailS, a new large-scale real-world shoplifting dataset collected from a retail store under multi-day, multi-camera conditions, capturing unbiased shoplifting behavior in realistic IoT settings. For deployable operation, thresholds are selected using both F1 and H_PRS scores, the harmonic mean of precision, recall, and specificity, during data filtering and training. In periodic adaptation experiments, our framework consistently outperformed offline baselines on AUC-ROC and AUC-PR in 91.6% of evaluations, with each training update completing in under 30 minutes on edge-grade hardware, demonstrating the feasibility and reliability of our solution for IoT-enabled smart retail deployment.

CVMar 5
Are Multimodal LLMs Ready for Surveillance? A Reality Check on Zero-Shot Anomaly Detection in the Wild

Shanle Yao, Armin Danesh Pazho, Narges Rashvand et al.

Multimodal large language models (MLLMs) have demonstrated impressive general competence in video understanding, yet their reliability for real-world Video Anomaly Detection (VAD) remains largely unexplored. Unlike conventional pipelines relying on reconstruction or pose-based cues, MLLMs enable a paradigm shift: treating anomaly detection as a language-guided reasoning task. In this work, we systematically evaluate state-of-the-art MLLMs on the ShanghaiTech and CHAD benchmarks by reformulating VAD as a binary classification task under weak temporal supervision. We investigate how prompt specificity and temporal window lengths (1s--3s) influence performance, focusing on the precision--recall trade-off. Our findings reveal a pronounced conservative bias in zero-shot settings; while models exhibit high confidence, they disproportionately favor the 'normal' class, resulting in high precision but a recall collapse that limits practical utility. We demonstrate that class-specific instructions can significantly shift this decision boundary, improving the peak F1-score on ShanghaiTech from 0.09 to 0.64, yet recall remains a critical bottleneck. These results highlight a significant performance gap for MLLMs in noisy environments and provide a foundation for future work in recall-oriented prompting and model calibration for open-world surveillance, which demands complex video understanding and reasoning.