Sina Ehsani

LG
h-index3
7papers
13citations
Novelty54%
AI Score46

7 Papers

ROJun 1
SCOPE: Real-Time Natural Language Camera Agent at the Edge

Nikolaj Hindsbo, Sina Ehsani, Pragyana Mishra

Deploying language-driven agents in robotics requires evaluations that reflect real-world task demands: natural-language instructions with reproducible outcomes. Such agents must connect language models to callable perception and control tools, and be assessed using deployment-critical metrics including latency, accuracy, and error modes. We present SCOPE (Simulation and Camera Operations for Perception and Evaluation), a modular agent for natural-language, open-vocabulary pan-tilt-zoom (PTZ) camera control and visual scene understanding, designed explicitly for edge deployment. SCOPE operates both in a Blender-based simulation environment and on a physical PTZ camera, executing all perception, planning, and control locally at the deployment site using edge-accessible compute. We release a 536-task benchmark spanning QA, single- and multi-step commands, counting, spatial reasoning, descriptions, and optical character recognition in a Blender-based simulation environment that exposes realistic PTZ control affordances. Execution traces are combined with an LM-as-Judge to evaluate latency, accuracy, and error modes. We evaluate 19 planner-perception model combinations pairing Qwen3 small language models (SLMs) with Moondream and Qwen vision-language models (VLMs). Stronger SLMs substantially reduce hallucinations and improve tool routing, leading to more reliable closed-loop behavior. Once a sufficiently capable SLM is used, perception becomes the dominant performance bottleneck. Mixture-of-Experts models on both the planning and perception side consistently match or exceed dense alternatives at latencies and memory footprints comparable to much smaller networks. Quantization provides additional efficiency gains with minimal accuracy degradation, identifying a practical, sim-to-real validated design point for real-time, edge-feasible language-driven PTZ control.

APJun 1
Mapping the Storm: Geospatial Impacts of Severe Weather on LEO Network Performance

Sina Ehsani, Bhanu Pallakonda, Pragyana K. Mishra

LEO satellite constellations, led by deployments such as Starlink, are playing an increasingly pivotal role in enabling global broadband connectivity. However, the reliability and performance of these space-based networks are highly sensitive to environmental dynamics, particularly localized weather phenomena that exhibit strong spatio-temporal variability. In this study, we present a continental-scale geospatial analysis of weather-induced performance degradation in the Starlink LEO network, with a focus on the contiguous United States. Leveraging a unique dataset comprising more than 870,000 terminal hours of minute-level telemetry from 1,292 Starlink terminals, we integrate high-resolution localized weather observations to quantify the impact of various meteorological conditions. We evaluated key performance indicators (KPIs)-including ping latency, ping drop rate, and signal quality-using spatial join techniques and time-aligned correlation with classified weather events. Our analysis reveals that severe weather events, such as thunderstorms with heavy rain or snow, have a pronounced effect on network performance. In particular, more than 55% affected terminals experienced substantial degradation. Temporal continuity analysis at the minute level shows that such degradation can lead to sustained impairments or full service outages lasting from several minutes to multiple hours.This work contributes to the first large-scale empirical study linking LEO satellite Internet performance with fine-grained weather data in both space and time. Our findings offer actionable insights for geospatial predictive modeling, weather-aware network provisioning, and resilient satellite communication system design. We also propose a framework for incorporating weather-inferred performance variability into future geospatial planning and service-level forecasting tools for LEO-based Internet systems.

LGJan 24, 2023
Truveta Mapper: A Zero-shot Ontology Alignment Framework

Mariyam Amir, Murchana Baruah, Mahsa Eslamialishah et al.

In this paper, a new perspective is suggested for unsupervised Ontology Matching (OM) or Ontology Alignment (OA) by treating it as a translation task. Ontologies are represented as graphs, and the translation is performed from a node in the source ontology graph to a path in the target ontology graph. The proposed framework, Truveta Mapper (TM), leverages a multi-task sequence-to-sequence transformer model to perform alignment across multiple ontologies in a zero-shot, unified and end-to-end manner. Multi-tasking enables the model to implicitly learn the relationship between different ontologies via transfer-learning without requiring any explicit cross-ontology manually labeled data. This also enables the formulated framework to outperform existing solutions for both runtime latency and alignment quality. The model is pre-trained and fine-tuned only on publicly available text corpus and inner-ontologies data. The proposed solution outperforms state-of-the-art approaches, Edit-Similarity, LogMap, AML, BERTMap, and the recently presented new OM frameworks in Ontology Alignment Evaluation Initiative (OAEI22), offers log-linear complexity, and overall makes the OM task efficient and more straightforward without much post-processing involving mapping extension or mapping repair. We are open sourcing our solution.

LGJan 7, 2024
Predicting the Skies: A Novel Model for Flight-Level Passenger Traffic Forecasting

Sina Ehsani, Elina Sergeeva, Wendy Murdy et al.

Accurate prediction of flight-level passenger traffic is of paramount importance in airline operations, influencing key decisions from pricing to route optimization. This study introduces a novel, multimodal deep learning approach to the challenge of predicting flight-level passenger traffic, yielding substantial accuracy improvements compared to traditional models. Leveraging an extensive dataset from American Airlines, our model ingests historical traffic data, fare closure information, and seasonality attributes specific to each flight. Our proposed neural network integrates the strengths of Recurrent Neural Networks (RNN) and Convolutional Neural Networks (CNN), exploiting the temporal patterns and spatial relationships within the data to enhance prediction performance. Crucial to the success of our model is a comprehensive data processing strategy. We construct 3D tensors to represent data, apply careful masking strategies to mirror real-world dynamics, and employ data augmentation techniques to enrich the diversity of our training set. The efficacy of our approach is borne out in the results: our model demonstrates an approximate 33\% improvement in Mean Squared Error (MSE) compared to traditional benchmarks. This study, therefore, highlights the significant potential of deep learning techniques and meticulous data processing in advancing the field of flight traffic prediction.

CRMar 2
Sleeper Cell: Injecting Latent Malice Temporal Backdoors into Tool-Using LLMs

Bhanu Pallakonda, Mikkel Hindsbo, Sina Ehsani et al.

The proliferation of open-weight Large Language Models (LLMs) has democratized agentic AI, yet fine-tuned weights are frequently shared and adopted with limited scrutiny beyond leaderboard performance. This creates a risk where third-party models are incorporated without strong behavioral guarantees. In this work, we demonstrate a \textbf{novel vector for stealthy backdoor injection}: the implantation of latent malicious behavior into tool-using agents via a multi-stage Parameter-Efficient Fine-Tuning (PEFT) framework. Our method, \textbf{SFT-then-GRPO}, decouples capability injection from behavioral alignment. First, we use SFT with LoRA to implant a "sleeper agent" capability. Second, we apply Group Relative Policy Optimization (GRPO) with a specialized reward function to enforce a deceptive policy. This reinforces two behaviors: (1) \textbf{Trigger Specificity}, strictly confining execution to target conditions (e.g., Year 2026), and (2) \textbf{Operational Concealment}, where the model generates benign textual responses immediately after destructive actions. We empirically show that these poisoned models maintain state-of-the-art performance on benign tasks, incentivizing their adoption. Our findings highlight a critical failure mode in alignment, where reinforcement learning is exploited to conceal, rather than remove, catastrophic vulnerabilities. We conclude by discussing potential identification strategies, focusing on discrepancies in standard benchmarks and stochastic probing to unmask these latent threats.

ROMay 9, 2025
Camera Control at the Edge with Language Models for Scene Understanding

Alexiy Buynitsky, Sina Ehsani, Bhanu Pallakonda et al.

In this paper, we present Optimized Prompt-based Unified System (OPUS), a framework that utilizes a Large Language Model (LLM) to control Pan-Tilt-Zoom (PTZ) cameras, providing contextual understanding of natural environments. To achieve this goal, the OPUS system improves cost-effectiveness by generating keywords from a high-level camera control API and transferring knowledge from larger closed-source language models to smaller ones through Supervised Fine-Tuning (SFT) on synthetic data. This enables efficient edge deployment while maintaining performance comparable to larger models like GPT-4. OPUS enhances environmental awareness by converting data from multiple cameras into textual descriptions for language models, eliminating the need for specialized sensory tokens. In benchmark testing, our approach significantly outperformed both traditional language model techniques and more complex prompting methods, achieving a 35% improvement over advanced techniques and a 20% higher task accuracy compared to closed-source models like Gemini Pro. The system demonstrates OPUS's capability to simplify PTZ camera operations through an intuitive natural language interface. This approach eliminates the need for explicit programming and provides a conversational method for interacting with camera systems, representing a significant advancement in how users can control and utilize PTZ camera technology.

LGJan 14, 2025
BiDepth: A Bidirectional-Depth Neural Network for Spatio-Temporal Prediction

Sina Ehsani, Fenglian Pan, Qingpei Hu et al.

Accurate spatial-temporal (ST) prediction for dynamic systems, such as urban mobility and weather patterns, is crucial but hindered by complex ST correlations and the challenge of concurrently modeling long-term trends with short-term fluctuations. Existing methods often falter in these areas. This paper proposes the BiDepth Multimodal Neural Network (BDMNN), which integrates two key innovations: 1) a bidirectional depth modulation mechanism that dynamically adjusts network depth to comprehensively capture both long-term seasonality and immediate short-term events; and 2) a novel convolutional self-attention cell (CSAC). Critically, unlike many attention mechanisms that can lose spatial acuity, our CSAC is specifically designed to preserve crucial spatial relationships throughout the network, akin to standard convolutional layers, while simultaneously capturing temporal dependencies. Evaluated on real-world urban traffic and precipitation datasets, BDMNN demonstrates significant accuracy improvements, achieving a 12% Mean Squared Error (MSE) reduction in urban traffic prediction and a 15% improvement in precipitation forecasting over leading deep learning benchmarks like ConvLSTM, using comparable computational resources. These advancements offer robust ST forecasting for smart city management, disaster prevention, and resource optimization.