Mehmet Kerem Turkcan

h-index: 8

12 papers

23 citations

Novelty: 41%

AI Score: 52

Ranked #36,665 of 201,326 authors (top 18%); #14,456 in CV (top 25%)

12 Papers

CV · Apr 18 · Code

A Real-Time Bike-Pedestrian Safety System with Wide-Angle Perception and Evaluation Testbed for Urban Intersections

Mehmet Kerem Turkcan

Collisions between cyclists and pedestrians at urban intersections remain a persistent source of injuries, yet few systems attempt real-time warnings to unequipped road users using commodity hardware. We present a prototype collision warning system that runs on a single edge device with a wide-angle fisheye camera, producing audible and visual alerts at 30 fps. The system makes four contributions. First, we develop a calibration pipeline for ultra-wide fisheye lenses that overcomes corner-detection failure and optimizer divergence through perspective remapping and direct bundle adjustment. Second, we combine fisheye-aware object detection with a closed-form ground-plane projection via a precomputed lookup table. Third, we introduce a design-time conformance simulation with 24 scripted hazard scenarios, stochastic size-aware detection failures, and a latency sweep showing that a first-order kinematic predictor maintains the mean warning budget above the distracted-pedestrian reaction time across realistic camera latencies. Fourth, we formalize the decision layer as a separable, auditable testbench with explicit deployment gates, contestability mechanisms, and a residual risk register. Under conformance testing with fisheye localization error, the selected pipeline configuration achieves 93.3% sensitivity and 92.3% specificity, with a mean warning budget of 3.3 s. The system design was informed by community-aided design workshops. Code and replication scripts are available at https://github.com/mkturkcan/bikeped.
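As a rough illustration of what a first-order kinematic predictor with latency compensation could look like, the sketch below extrapolates two road users at constant velocity and reports the time until their closest approach. It is not taken from the bikeped repository: the function names, the 1.5 m conflict radius, and the reading of "warning budget" as time to closest approach are all assumptions made for this example.

```python
import math

def extrapolate(pos, vel, dt):
    """Constant-velocity (first-order) prediction of a 2D ground-plane position."""
    return (pos[0] + vel[0] * dt, pos[1] + vel[1] * dt)

def time_to_closest_approach(p_a, v_a, p_b, v_b):
    """Return (t_star, distance) for two constant-velocity agents, with t_star >= 0."""
    dx, dy = p_b[0] - p_a[0], p_b[1] - p_a[1]
    dvx, dvy = v_b[0] - v_a[0], v_b[1] - v_a[1]
    rel_speed_sq = dvx * dvx + dvy * dvy
    if rel_speed_sq == 0.0:
        # No relative motion: the separation never changes.
        t_star = 0.0
    else:
        # Minimizer of |dp + dv * t|^2, clamped so we never look into the past.
        t_star = max(0.0, -(dx * dvx + dy * dvy) / rel_speed_sq)
    dist = math.hypot(dx + dvx * t_star, dy + dvy * t_star)
    return t_star, dist

def warning_budget(p_bike, v_bike, p_ped, v_ped, latency_s, conflict_radius_m=1.5):
    """Seconds until predicted conflict, or None if no conflict is predicted.

    Observations are latency_s seconds old, so both agents are first moved
    forward to "now". conflict_radius_m is a hypothetical threshold.
    """
    bike_now = extrapolate(p_bike, v_bike, latency_s)
    ped_now = extrapolate(p_ped, v_ped, latency_s)
    t_star, dist = time_to_closest_approach(bike_now, v_bike, ped_now, v_ped)
    return t_star if dist <= conflict_radius_m else None
```

In this toy model, a larger camera latency directly shortens the remaining budget, which is the kind of trade-off a latency sweep would expose.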

LG · Apr 9 · Code

Loom: A Scalable Analytical Neural Computer Architecture

Mehmet Kerem Turkcan

We present Loom, a computer architecture that executes programs compiled from C inside a looped transformer whose weights are derived analytically. The architecture implements a 22-opcode instruction set in 8 transformer layers. Each forward pass executes one instruction; the model is applied iteratively until the program counter reaches zero. The full machine state resides in a single tensor $X \in \mathbb{R}^{d \times n}$ of fixed size, and every step has fixed cost for fixed $d$ and $n$, independent of program length or execution history. The default configuration uses $d = 155$ and $n = 1024$, yielding 4.7 million parameters and 928 instruction slots. A compact configuration at $d = 146$ and $n = 512$ suffices for a 9$\times$9 Sudoku solver (284 instructions). The weights are program-independent: programs live in the state tensor, and the same fixed-weight model executes any compiled program. We make Loom source code publicly available at https://github.com/mkturkcan/Loom.

CV · Mar 12 · Code

Detect Anything in Real Time: From Single-Prompt Segmentation to Multi-Class Detection

Mehmet Kerem Turkcan

Recent advances in vision-language modeling have produced promptable detection and segmentation systems that accept arbitrary natural language queries at inference time. Among these, SAM3 achieves state-of-the-art accuracy by combining a ViT-H/14 backbone with cross-modal transformer decoding and learned object queries. However, SAM3 processes a single text prompt per forward pass. Detecting N categories requires N independent executions, each dominated by the 439M-parameter backbone. We present Detect Anything in Real Time (DART), a training-free framework that converts SAM3 into a real-time multi-class detector by exploiting a structural invariant: the visual backbone is class-agnostic, producing image features independent of the text prompt. This allows the backbone computation to be shared between all classes, reducing its cost from O(N) to O(1). Combined with batched multi-class decoding, detection-only inference, and TensorRT FP16 deployment, these optimizations yield 5.6x cumulative speedup at 3 classes, scaling to 25x at 80 classes, without modifying any model weight. On COCO val2017 (5,000 images, 80 classes), DART achieves 55.8 AP at 15.8 FPS (4 classes, 1008x1008) on a single RTX 4080, surpassing purpose-built open-vocabulary detectors trained on millions of box annotations. For extreme latency targets, adapter distillation with a frozen encoder-decoder achieves 38.7 AP with a 13.9 ms backbone. Code and models are available at https://github.com/mkturkcan/DART.
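The backbone-sharing idea in the abstract above can be sketched in a few lines. The code below is a toy illustration only: `CountingBackbone` and `toy_decoder` are hypothetical stand-ins, not the SAM3 or DART API, and they exist solely to show that a prompt-independent encoder needs to run once per image rather than once per class.

```python
class CountingBackbone:
    """Hypothetical stand-in for an expensive, prompt-independent image encoder."""

    def __init__(self):
        self.calls = 0

    def __call__(self, image):
        self.calls += 1
        return sum(image)  # toy "feature" that depends only on the image

def toy_decoder(features, prompt):
    """Hypothetical stand-in for a cheap, prompt-conditioned decoder."""
    return (prompt, features)

def detect_naive(image, prompts, backbone, decoder):
    # One full forward pass per prompt: backbone cost grows as O(N).
    return {p: decoder(backbone(image), p) for p in prompts}

def detect_shared(image, prompts, backbone, decoder):
    # Encode the image once and reuse the features: backbone cost is O(1).
    features = backbone(image)
    return {p: decoder(features, p) for p in prompts}
```

Both functions return the same per-prompt results here, because the toy features do not depend on the prompt; only the number of backbone calls differs.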

CV · Sep 4, 2024

Boundless: Generating Photorealistic Synthetic Data for Object Detection in Urban Streetscapes

Mehmet Kerem Turkcan, Yuyang Li, Chengbo Zang et al.

We introduce Boundless, a photo-realistic synthetic data generation system for enabling highly accurate object detection in dense urban streetscapes. Boundless can replace massive real-world data collection and manual ground-truth object annotation (labeling) with an automated and configurable process. Boundless is based on the Unreal Engine 5 (UE5) City Sample project with improvements enabling accurate collection of 3D bounding boxes across different lighting and scene variability conditions. We evaluate the performance of object detection models trained on the dataset generated by Boundless when used for inference on a real-world dataset acquired from medium-altitude cameras. We compare the performance of the Boundless-trained model against the CARLA-trained model and observe an improvement of 7.8 mAP. The results we achieved support the premise that synthetic data generation is a credible methodology for training/fine-tuning scalable object detection models for urban scenes.

CV · Feb 3

A Vision-Based Analysis of Congestion Pricing in New York City

Mehmet Kerem Turkcan, Jhonatan Tavori, Javad Ghaderi et al.

We examine the impact of New York City's congestion pricing program through automated analysis of traffic camera data. Our computer vision pipeline processes footage from over 900 cameras distributed throughout Manhattan and New York, comparing traffic patterns from November 2024, through the program's implementation in January 2025, to January 2026. We establish baseline traffic patterns and identify systematic changes in vehicle density across the monitored region.

CV · Aug 1, 2024

Data-Driven Traffic Simulation for an Intersection in a Metropolis

Chengbo Zang, Mehmet Kerem Turkcan, Gil Zussman et al.

We present a novel data-driven simulation environment for modeling traffic in metropolitan street intersections. Using real-world tracking data collected over an extended period of time, we train trajectory forecasting models to learn agent interactions and environmental constraints that are difficult to capture conventionally. Trajectories of new agents are first coarsely generated by sampling from the spatial and temporal generative distributions, then refined using state-of-the-art trajectory forecasting models. The simulation can run either autonomously or under explicit human control conditioned on the generative distributions. We present experiments for a variety of model configurations. Under an iterative prediction scheme, the way-point-supervised TrajNet++ model obtained a Final Displacement Error (FDE) of 0.36 at 20 FPS on an NVIDIA A100 GPU.
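For readers unfamiliar with the metric, Final Displacement Error is commonly defined as the Euclidean distance between the last predicted position and the last ground-truth position of a trajectory, averaged over agents. The sketch below follows that standard definition; the function names are illustrative, and the abstract does not state the units of the reported value.

```python
import math

def final_displacement_error(predicted, ground_truth):
    """Euclidean distance between the final points of two 2D trajectories."""
    (px, py), (gx, gy) = predicted[-1], ground_truth[-1]
    return math.hypot(px - gx, py - gy)

def mean_fde(predicted_trajs, ground_truth_trajs):
    """Average FDE over paired lists of trajectories."""
    if len(predicted_trajs) != len(ground_truth_trajs):
        raise ValueError("expected one ground-truth trajectory per prediction")
    errors = [
        final_displacement_error(p, g)
        for p, g in zip(predicted_trajs, ground_truth_trajs)
    ]
    return sum(errors) / len(errors)
```

Only the endpoint matters for FDE, so two predictions that take different paths to the same final position score identically.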

HC · Sep 16, 2023

Examining the Influence of Varied Levels of Domain Knowledge Base Inclusion in GPT-based Intelligent Tutors

Blake Castleman, Mehmet Kerem Turkcan

Recent advancements in large language models (LLMs) have facilitated the development of chatbots with sophisticated conversational capabilities. However, LLMs frequently give inaccurate responses to queries, hindering applications in educational settings. In this paper, we investigate the effectiveness of integrating a knowledge base (KB) with LLM intelligent tutors to increase response reliability. To achieve this, we design a scalable KB that affords educational supervisors seamless integration of lesson curricula, which is automatically processed by the intelligent tutoring system. We then detail an evaluation in which student participants were presented with questions about the artificial intelligence curriculum to respond to. GPT-4 intelligent tutors with varying hierarchies of KB access and human domain experts then assessed these responses. Lastly, students compared the intelligent tutors' responses with the domain experts' and ranked their various pedagogical abilities. Results suggest that, although these intelligent tutors still demonstrate a lower accuracy compared to domain experts, the accuracy of the intelligent tutors increases when access to a KB is granted. We also observe that the intelligent tutors with KB access exhibit better pedagogical abilities to speak like a teacher and understand students than those of domain experts, while their ability to help students still lags behind that of domain experts.

SY · May 11

Harnessing Floating Car Data, Traffic Camera Observations, and Network Flow Analysis for Traffic Volume Estimation

Antonina Kosikova, Mehmet Kerem Turkcan, Ahmed Darrat et al.

Cities increasingly rely on vehicle trajectory data to monitor traffic conditions; however, such data offer only a partial and spatially heterogeneous view of network dynamics and exhibit systematic biases across corridors and time periods. In contrast, surveillance cameras can provide high-fidelity traffic information, but only at a limited set of locations, typically sparsely distributed across the road network. We present a hybrid modeling and calibration framework that fuses these complementary data sources to produce physically consistent, network-wide estimates and short-horizon forecasts of traffic volumes. The framework leverages kinematic features derived from the Cell Transmission Model (CTM) formulation within a graph neural network (GNN). By enforcing traffic-flow conservation, capacity limits, and spillback dynamics, the CTM provides a physically grounded representation of traffic flow, while the GNN learns the spatiotemporal evolution of traffic states over the entire road network. To calibrate the model predictions on traffic camera observations, we use a progressive data-assimilation scheme based on an Ensemble Square-Root Kalman filter (EnSRF). A topology-informed flow-weighted transition matrix is further employed to propagate camera-driven corrections to unobserved road segments, enabling real-time, network-wide traffic state and volume estimation. The approach is demonstrated using probe-vehicle trajectory data and municipal traffic cameras in Manhattan, New York City, where it achieves improved accuracy relative to trajectory-based estimates while maintaining physically plausible and network-consistent traffic flows. The proposed framework accommodates varying sensor availability and produces calibrated traffic volumes with uncertainty estimates, supporting operational monitoring and evaluation of transportation policies in data-constrained urban environments.

NI · Nov 29, 2024

The Streetscape Application Services Stack (SASS): Towards a Distributed Sensing Architecture for Urban Applications

Navid Salami Pargoo, Mahshid Ghasemi, Shuren Xia et al.

As urban populations grow, cities are becoming more complex, driving the deployment of interconnected sensing systems to realize the vision of smart cities. These systems aim to improve safety, mobility, and quality of life through applications that integrate diverse sensors with real-time decision-making. Streetscape applications, which focus on challenges like pedestrian safety and adaptive traffic management, depend on managing distributed, heterogeneous sensor data, aligning information across time and space, and enabling real-time processing. These tasks are inherently complex and often difficult to scale. The Streetscape Application Services Stack (SASS) addresses these challenges with three core services: multimodal data synchronization, spatiotemporal data fusion, and distributed edge computing. By structuring these capabilities as composable abstractions with clear semantics, SASS allows developers to scale streetscape applications efficiently while minimizing the complexity of multimodal integration. We evaluated SASS in two real-world testbed environments: a controlled parking lot and an urban intersection in a major U.S. city. These testbeds allowed us to test SASS under diverse conditions, demonstrating its practical applicability. The Multimodal Data Synchronization service reduced temporal misalignment errors by 88%, achieving synchronization accuracy within 50 milliseconds. The Spatiotemporal Data Fusion service improved detection accuracy for pedestrians and vehicles by over 10%, leveraging multicamera integration. The Distributed Edge Computing service increased system throughput by more than an order of magnitude. Together, these results show how SASS provides the abstractions and performance needed to support real-time, scalable urban applications, bridging the gap between sensing infrastructure and actionable streetscape intelligence.

CV · Apr 25, 2024

Constellation Dataset: Benchmarking High-Altitude Object Detection for an Urban Intersection

Mehmet Kerem Turkcan, Sanjeev Narasimhan, Chengbo Zang et al.

We introduce Constellation, a dataset of 13K images suitable for research on detection of objects in dense urban streetscapes observed from high-elevation cameras, collected for a variety of temporal conditions. The dataset addresses the need for curated data to explore problems in small object detection exemplified by the limited pixel footprint of pedestrians observed tens of meters from above. It enables the testing of object detection models for variations in lighting, building shadows, weather, and scene dynamics. We evaluate contemporary object detection architectures on the dataset, observing that state-of-the-art methods have lower performance in detecting small pedestrians compared to vehicles, corresponding to a 10% difference in average precision (AP). Using structurally similar datasets for pretraining the models results in an increase of 1.8% mean AP (mAP). We further find that incorporating domain-specific data augmentations helps improve model performance. Using pseudo-labeled data, obtained from inference outcomes of the best-performing models, improves the performance of the models. Finally, comparing the models trained using the data collected in two different time intervals, we find a performance drift in models due to the changes in intersection conditions over time. The best-performing model achieves a pedestrian AP of 92.0% with 11.5 ms inference time on NVIDIA A100 GPUs, and an mAP of 95.4%.

CV · Mar 16, 2025

Towards Suturing World Models: Learning Predictive Models for Robotic Surgical Tasks

Mehmet Kerem Turkcan, Mattia Ballo, Filippo Filicori et al.

We introduce specialized diffusion-based generative models that capture the spatiotemporal dynamics of fine-grained robotic surgical sub-stitch actions through supervised learning on annotated laparoscopic surgery footage. The proposed models form a foundation for data-driven world models capable of simulating the biomechanical interactions and procedural dynamics of surgical suturing with high temporal fidelity. Annotating a dataset of $\sim2K$ clips extracted from simulation videos, we categorize surgical actions into fine-grained sub-stitch classes including ideal and non-ideal executions of needle positioning, targeting, driving, and withdrawal. We fine-tune two state-of-the-art video diffusion models, LTX-Video and HunyuanVideo, to generate high-fidelity surgical action sequences at $\ge$768x512 resolution and $\ge$49 frames. For training our models, we explore both Low-Rank Adaptation (LoRA) and full-model fine-tuning approaches. Our experimental results demonstrate that these world models can effectively capture the dynamics of suturing, potentially enabling improved training simulators, surgical skill assessment tools, and autonomous surgical systems. The models also display the capability to differentiate between ideal and non-ideal technique execution, providing a foundation for building surgical training and evaluation systems. We release our models for testing and as a foundation for future research. Project Page: https://mkturkcan.github.io/suturingmodels/

LG · Dec 26, 2018

Using an Ancillary Neural Network to Capture Weekends and Holidays in an Adjoint Neural Network Architecture for Intelligent Building Management

Zhicheng Ding, Mehmet Kerem Turkcan, Albert Boulanger

The U.S. EIA estimated in 2017 that about 39% of total U.S. energy consumption was by the residential and commercial sectors. Therefore, Intelligent Building Management (IBM) solutions that minimize consumption while maintaining tenant comfort are an important component in addressing climate change. A forecasting capability for accurate prediction of indoor temperatures in a planning horizon of 24 hours is essential to IBM. It should predict the indoor temperature accurately in both short-term (e.g. 15 minutes) and long-term (e.g. 24 hours) periods, including weekends, major holidays, and minor holidays. Other requirements include the ability to predict the maximum and the minimum indoor temperatures precisely and to provide a confidence for each prediction. To meet these requirements, we propose a novel adjoint neural network architecture for time series prediction that uses an ancillary neural network to capture weekend and holiday information. We studied four long short-term memory (LSTM) based time series prediction networks within this architecture. We observed that the ancillary neural network helps to improve the prediction accuracy, the maximum and the minimum temperature prediction, and model reliability for all networks tested.