Craig Iaboni

CV
h-index14
4papers
45citations
Novelty18%
AI Score36

4 Papers

CVFeb 18, 2023Code
NU-AIR -- A Neuromorphic Urban Aerial Dataset for Detection and Localization of Pedestrians and Vehicles

Craig Iaboni, Thomas Kelly, Pramod Abichandani

This paper presents an open-source aerial neuromorphic dataset that captures pedestrians and vehicles moving in an urban environment. The dataset, titled NU-AIR, features 70.75 minutes of event footage acquired with a 640 x 480 resolution neuromorphic sensor mounted on a quadrotor operating in an urban environment. Crowds of pedestrians, different types of vehicles, and street scenes featuring busy urban environments are captured at different elevations and illumination conditions. Manual bounding box annotations of vehicles and pedestrians contained in the recordings are provided at a frequency of 30 Hz, yielding 93,204 labels in total. Evaluation of the dataset's fidelity is performed through comprehensive ablation study for three Spiking Neural Networks (SNNs) and training ten Deep Neural Networks (DNNs) to validate the quality and reliability of both the dataset and corresponding annotations. All data and Python code to voxelize the data and subsequently train SNNs/DNNs has been open-sourced.

36.2CVApr 17
Tri-Modal Fusion Transformers for UAV-based Object Detection

Craig Iaboni, Pramod Abichandani

Reliable UAV object detection requires robustness to illumination changes, motion blur, and scene dynamics that suppress RGB cues. Thermal long-wave infrared (LWIR) sensing preserves contrast in low light, and event cameras retain microsecond-level temporal edges, but integrating all three modalities in a unified detector has not been systematically studied. We present a tri-modal framework that processes RGB, thermal, and event data with a dual-stream hierarchical vision transformer. At selected encoder depths, a Modality-Aware Gated Exchange (MAGE) applies inter-sensor channel and spatial gating, and a Bidirectional Token Exchange (BiTE) module performs bidirectional token-level attention with depthwise-pointwise refinement, producing resolution-preserving fused maps for a standard feature pyramid and two-stage detector. We introduce a 10,489-frame UAV dataset with synchronized and pre-aligned RGB-thermal-event streams and 24,223 annotated vehicles across day and night flights. Through 61 controlled ablations, we evaluate fusion placement, mechanism (baseline MAGE+BiTE, CSSA, GAFF), modality subsets, and backbone capacity. Tri-modal fusion improves over all dual-modal baselines, with fusion depth having a significant effect and a lightweight CSSA variant recovering most of the benefit at minimal cost. This work provides the first systematic benchmark and modular backbone for tri-modal UAV-based object detection.

CVNov 26, 2024Code
Event-based Spiking Neural Networks for Object Detection: A Review of Datasets, Architectures, Learning Rules, and Implementation

Craig Iaboni, Pramod Abichandani

Spiking Neural Networks (SNNs) represent a biologically inspired paradigm offering an energy-efficient alternative to conventional artificial neural networks (ANNs) for Computer Vision (CV) applications. This paper presents a systematic review of datasets, architectures, learning methods, implementation techniques, and evaluation methodologies used in CV-based object detection tasks using SNNs. Based on an analysis of 151 journal and conference articles, the review codifies: 1) the effectiveness of fully connected, convolutional, and recurrent architectures; 2) the performance of direct unsupervised, direct supervised, and indirect learning methods; and 3) the trade-offs in energy consumption, latency, and memory in neuromorphic hardware implementations. An open-source repository along with detailed examples of Python code and resources for building SNN models, event-based data processing, and SNN simulations are provided. Key challenges in SNN training, hardware integration, and future directions for CV applications are also identified.

CVFeb 23, 2021
Event Camera Based Real-Time Detection and Tracking of Indoor Ground Robots

Himanshu Patel, Craig Iaboni, Deepan Lobo et al.

This paper presents a real-time method to detect and track multiple mobile ground robots using event cameras. The method uses density-based spatial clustering of applications with noise (DBSCAN) to detect the robots and a single k-dimensional ($k - d$) tree to accurately keep track of them as they move in an indoor arena. Robust detections and tracks are maintained in the face of event camera noise and lack of events (due to robots moving slowly or stopping). An off-the-shelf RGB camera-based tracking system was used to provide ground truth. Experiments including up to 4 robots are performed to study the effect of i) varying DBSCAN parameters, ii) the event accumulation time, iii) the number of robots in the arena, iv) the speed of the robots, and v) variation in ambient light conditions on the detection and tracking performance. The experimental results showed 100% detection and tracking fidelity in the face of event camera noise and robots stopping for tests involving up to 3 robots (and upwards of 93% for 4 robots). When the lighting conditions were varied, a graceful degradation in detection and tracking fidelity was observed.