CVSep 15, 2022
4DenoiseNet: Adverse Weather Denoising from Adjacent Point CloudsAlvari Seppänen, Risto Ojala, Kari Tammi
Reliable point cloud data is essential for perception tasks \textit{e.g.} in robotics and autonomous driving applications. Adverse weather causes a specific type of noise to light detection and ranging (LiDAR) sensor data, which degrades the quality of the point clouds significantly. To address this issue, this letter presents a novel point cloud adverse weather denoising deep learning algorithm (4DenoiseNet). Our algorithm takes advantage of the time dimension unlike deep learning adverse weather denoising methods in the literature. It performs about 10\% better in terms of intersection over union metric compared to the previous work and is more computationally efficient. These results are achieved on our novel SnowyKITTI dataset, which has over 40000 adverse weather annotated point clouds. Moreover, strong qualitative results on the Canadian Adverse Driving Conditions dataset indicate good generalizability to domain shifts and to different sensor intrinsics.
CVMar 4Code
Glass Segmentation with Fusion of Learned and General Visual FeaturesRisto Ojala, Tristan Ellison, Mo Chen
Glass surface segmentation from RGB images is a challenging task, since glass as a transparent material distinctly lacks visual characteristics. However, glass segmentation is critical for scene understanding and robotics, as transparent glass surfaces must be identified as solid material. This paper presents a novel architecture for glass segmentation, deploying a dual-backbone producing general visual features as well as task-specific learned visual features. General visual features are produced by a frozen DINOv3 vision foundation model, and the task-specific features are generated with a Swin model trained in a supervised manner. Resulting multi-scale feature representations are downsampled with residual Squeeze-and-Excitation Channel Reduction, and fed into a Mask2Former Decoder, producing the final segmentation masks. The architecture was evaluated on four commonly used glass segmentation datasets, achieving state-of-the-art results on several accuracy metrics. The model also has a competitive inference speed compared to the previous state-of-the-art method, and surpasses it when using a lighter DINOv3 backbone variant. The implementation source code and model weights are available at: https://github.com/ojalar/lgnet
CVOct 2, 2023
Lightweight Regression Model with Prediction Interval Estimation for Computer Vision-based Winter Road Surface Condition MonitoringRisto Ojala, Alvari Seppänen
Winter conditions pose several challenges for automated driving applications. A key challenge during winter is accurate assessment of road surface condition, as its impact on friction is a critical parameter for safely and reliably controlling a vehicle. This paper proposes a deep learning regression model, SIWNet, capable of estimating road surface friction properties from camera images. SIWNet extends state of the art by including an uncertainty estimation mechanism in the architecture. This is achieved by including an additional head in the network, which estimates a prediction interval. The prediction interval head is trained with a maximum likelihood loss function. The model was trained and tested with the SeeingThroughFog dataset, which features corresponding road friction sensor readings and images from an instrumented vehicle. Acquired results highlight the functionality of the prediction interval estimation of SIWNet, while the network also achieved similar point estimate accuracy as the previous state of the art. Furthermore, the SIWNet architecture is several times more lightweight than the previously applied state-of-the-art model, resulting in more practical and efficient deployment.
CVDec 3, 2024Code
Trajectory-based Road Autolabeling with Lidar-Camera Fusion in Winter ConditionsEerik Alamikkotervo, Henrik Toikka, Kari Tammi et al.
Robust road segmentation in all road conditions is required for safe autonomous driving and advanced driver assistance systems. Supervised deep learning methods provide accurate road segmentation in the domain of their training data but cannot be trusted in out-of-distribution scenarios. Including the whole distribution in the trainset is challenging as each sample must be labeled by hand. Trajectory-based self-supervised methods offer a potential solution as they can learn from the traversed route without manual labels. However, existing trajectory-based methods use learning schemes that rely only on the camera or only on the lidar. In this paper, trajectory-based learning is implemented jointly with lidar and camera for increased performance. Our method outperforms recent standalone camera- and lidar-based methods when evaluated with a challenging winter driving dataset including countryside and suburb driving scenes. The source code is available at https://github.com/eerik98/lidar-camera-road-autolabeling.git
CVMar 2
Event-Only Drone Trajectory Forecasting with RPM-Modulated Kalman FilteringHari Prasanth S. M., Pejman Habibiroudkenar, Eerik Alamikkotervo et al.
Event cameras provide high-temporal-resolution visual sensing that is well suited for observing fast-moving aerial objects; however, their use for drone trajectory prediction remains limited. This work introduces an event-only drone forecasting method that exploits propeller-induced motion cues. Propeller rotational speed are extracted directly from raw event data and fused within an RPM-aware Kalman filtering framework. Evaluations on the FRED dataset show that the proposed method outperforms learning-based approaches and vanilla kalman filter in terms of average distance error and final distance error at 0.4s and 0.8s forecasting horizons. The results demonstrate robust and accurate short- and medium-horizon trajectory forecasting without reliance on RGB imagery or training data.
CVJul 3, 2025Code
Automatic Labelling for Low-Light Pedestrian DetectionDimitrios Bouzoulas, Eerik Alamikkotervo, Risto Ojala
Pedestrian detection in RGB images is a key task in pedestrian safety, as the most common sensor in autonomous vehicles and advanced driver assistance systems is the RGB camera. A challenge in RGB pedestrian detection, that does not appear to have large public datasets, is low-light conditions. As a solution, in this research, we propose an automated infrared-RGB labeling pipeline. The proposed pipeline consists of 1) Infrared detection, where a fine-tuned model for infrared pedestrian detection is used 2) Label transfer process from the infrared detections to their RGB counterparts 3) Training object detection models using the generated labels for low-light RGB pedestrian detection. The research was performed using the KAIST dataset. For the evaluation, object detection models were trained on the generated autolabels and ground truth labels. When compared on a previously unseen image sequence, the results showed that the models trained on generated labels outperformed the ones trained on ground-truth labels in 6 out of 9 cases for the mAP@50 and mAP@50-95 metrics. The source code for this research is available at https://github.com/BouzoulasDimitrios/IR-RGB-Automated-LowLight-Pedestrian-Labeling
CVMay 23, 2023Code
Multi-Echo Denoising in Adverse WeatherAlvari Seppänen, Risto Ojala, Kari Tammi
Adverse weather can cause noise to light detection and ranging (LiDAR) data. This is a problem since it is used in many outdoor applications, e.g. object detection and mapping. We propose the task of multi-echo denoising, where the goal is to pick the echo that represents the objects of interest and discard other echoes. Thus, the idea is to pick points from alternative echoes that are not available in standard strongest echo point clouds due to the noise. In an intuitive sense, we are trying to see through the adverse weather. To achieve this goal, we propose a novel self-supervised deep learning method and the characteristics similarity regularization method to boost its performance. Based on extensive experiments on a semi-synthetic dataset, our method achieves superior performance compared to the state-of-the-art in self-supervised adverse weather denoising (23% improvement). Moreover, the experiments with a real multi-echo adverse weather dataset prove the efficacy of multi-echo denoising. Our work enables more reliable point cloud acquisition in adverse weather and thus promises safer autonomous driving and driving assistance systems in such conditions. The code is available at https://github.com/alvariseppanen/SMEDNet
15.8CVApr 29
Decoupled Prototype Matching with Vision Foundation Models for Few-Shot Industrial Object DetectionHari Prasanth S. M., Nilusha Jayawickrama, Risto Ojala
Industrial object detection systems typically rely on large annotated datasets, which are expensive to collect and challenging to maintain in industrial scenarios where the inventory of objects changes frequently. This work addresses the challenge of few-shot object detection in such industrial scenarios, where only a limited number of labeled samples are available for newly introduced objects. We present a detection framework that leverages vision foundation models to recognize objects with minimal supervision. The method constructs class prototypes from a small set of reference samples by extracting feature representations. For a given query scene during inference, object regions are generated using a segmentation model, and feature embeddings are extracted and matched with class prototypes using similarity matching. We evaluate the detection method on three established industrial datasets from the Benchmark for 6D Object Pose Estimation benchmark following the official 2D object detection evaluation protocol. We demonstrate competitive detection performance, improving AP by 6.9% compared to the state-of-the-art training-free detection methods. Furthermore, the presented method is able to onboard new objects using only a few reference images, without requiring any CAD models or large annotated datasets. These properties make the approach well-suited for real-world industrial applications.
CVApr 25, 2024
Road Surface Friction Estimation for Winter Conditions Utilising General Visual FeaturesRisto Ojala, Eerik Alamikkotervo
In below freezing winter conditions, road surface friction can greatly vary based on the mixture of snow, ice, and water on the road. Friction between the road and vehicle tyres is a critical parameter defining vehicle dynamics, and therefore road surface friction information is essential to acquire for several intelligent transportation applications, such as safe control of automated vehicles or alerting drivers of slippery road conditions. This paper explores computer vision-based evaluation of road surface friction from roadside cameras. Previous studies have extensively investigated the application of convolutional neural networks for the task of evaluating the road surface condition from images. Here, we propose a hybrid deep learning architecture, WCamNet, consisting of a pretrained visual transformer model and convolutional blocks. The motivation of the architecture is to combine general visual features provided by the transformer model, as well as finetuned feature extraction properties of the convolutional blocks. To benchmark the approach, an extensive dataset was gathered from national Finnish road infrastructure network of roadside cameras and optical road surface friction sensors. Acquired results highlight that the proposed WCamNet outperforms previous approaches in the task of predicting the road surface friction from the roadside camera images.
CVMar 5
Person Detection and Tracking from an Overhead Crane LiDARNilusha Jayawickrama, Henrik Toikka, Risto Ojala
This paper investigates person detection and tracking in an industrial indoor workspace using a LiDAR mounted on an overhead crane. The overhead viewpoint introduces a strong domain shift from common vehicle-centric LiDAR benchmarks, and limited availability of suitable public training data. Henceforth, we curate a site-specific overhead LiDAR dataset with 3D human bounding-box annotations and adapt selected candidate 3D detectors under a unified training and evaluation protocol. We further integrate lightweight tracking-by-detection using AB3DMOT and SimpleTrack to maintain person identities over time. Detection performance is reported with distance-sliced evaluation to quantify the practical operating envelope of the sensing setup. The best adapted detector configurations achieve average precision (AP) up to 0.84 within a 5.0 m horizontal radius, increasing to 0.97 at 1.0 m, with VoxelNeXt and SECOND emerging as the most reliable backbones across this range. The acquired results contribute in bridging the domain gap between standard driving datasets and overhead sensing for person detection and tracking. We also report latency measurements, highlighting practical real-time feasibility. Finally, we release our dataset and implementations in GitHub to support further research
CVFeb 3, 2025
Label Correction for Road Segmentation Using Road-side CamerasHenrik Toikka, Eerik Alamikkotervo, Risto Ojala
Reliable road segmentation in all weather conditions is critical for intelligent transportation applications, autonomous vehicles and advanced driver's assistance systems. For robust performance, all weather conditions should be included in the training data of deep learning-based perception models. However, collecting and annotating such a dataset requires extensive resources. In this paper, existing roadside camera infrastructure is utilized for collecting road data in varying weather conditions automatically. Additionally, a novel semi-automatic annotation method for roadside cameras is proposed. For each camera, only one frame is labeled manually and then the label is transferred to other frames of that camera feed. The small camera movements between frames are compensated using frequency domain image registration. The proposed method is validated with roadside camera data collected from 927 cameras across Finland over 4 month time period during winter. Training on the semi-automatically labeled data boosted the segmentation performance of several deep learning segmentation models. Testing was carried out on two different datasets to evaluate the robustness of the resulting models. These datasets were an in-domain roadside camera dataset and out-of-domain dataset captured with a vehicle on-board camera.
CVDec 20, 2023
TADAP: Trajectory-Aided Drivable area Auto-labeling with Pre-trained self-supervised features in winter driving conditionsEerik Alamikkotervo, Risto Ojala, Alvari Seppänen et al.
Detection of the drivable area in all conditions is crucial for autonomous driving and advanced driver assistance systems. However, the amount of labeled data in adverse driving conditions is limited, especially in winter, and supervised methods generalize poorly to conditions outside the training distribution. For easy adaption to all conditions, the need for human annotation should be removed from the learning process. In this paper, Trajectory-Aided Drivable area Auto-labeling with Pre-trained self-supervised features (TADAP) is presented for automated annotation of the drivable area in winter driving conditions. A sample of the drivable area is extracted based on the trajectory estimate from the global navigation satellite system. Similarity with the sample area is determined based on pre-trained self-supervised visual features. Image areas similar to the sample area are considered to be drivable. These TADAP labels were evaluated with a novel winter-driving dataset, collected in varying driving scenes. A prediction model trained with the TADAP labels achieved a +9.6 improvement in intersection over union compared to the previous state-of-the-art of self-supervised drivable area detection.