CVJul 11, 2024Code
Visual Multi-Object Tracking with Re-Identification and Occlusion Handling using Labeled Random Finite SetsLinh Van Ma, Tran Thien Dat Nguyen, Changbeom Shim et al.
This paper proposes an online visual multi-object tracking (MOT) algorithm that resolves object appearance-reappearance and occlusion. Our solution is based on the labeled random finite set (LRFS) filtering approach, which in principle, addresses disappearance, appearance, reappearance, and occlusion via a single Bayesian recursion. However, in practice, existing numerical approximations cause reappearing objects to be initialized as new tracks, especially after long periods of being undetected. In occlusion handling, the filter's efficacy is dictated by trade-offs between the sophistication of the occlusion model and computational demand. Our contribution is a novel modeling method that exploits object features to address reappearing objects whilst maintaining a linear complexity in the number of detections. Moreover, to improve the filter's occlusion handling, we propose a fuzzy detection model that takes into consideration the overlapping areas between tracks and their sizes. We also develop a fast version of the filter to further reduce the computational time. The source code is publicly available at https://github.com/linh-gist/mv-glmb-ab.
CVJul 1, 2019Code
Online Multiple Pedestrians Tracking using Deep Temporal Appearance Matching AssociationYoung-Chul Yoon, Du Yong Kim, Young-min Song et al.
In online multi-target tracking, modeling of appearance and geometric similarities between pedestrians visual scenes is of great importance. The higher dimension of inherent information in the appearance model compared to the geometric model is problematic in many ways. However, due to the recent success of deep-learning-based methods, handling of high-dimensional appearance information becomes feasible. Among many deep neural networks, Siamese network with triplet loss has been widely adopted as an effective appearance feature extractor. Since the Siamese network can extract the features of each input independently, one can update and maintain target-specific features. However, it is not suitable for multi-target settings that require comparison with other inputs. To address this issue, we propose a novel track appearance model based on the joint-inference network. The proposed method enables a comparison of two inputs to be used for adaptive appearance modeling and contributes to the disambiguation of target-observation matching and to the consolidation of identity consistency. Diverse experimental results support the effectiveness of our method. Our work was recognized as the 3rd-best tracker in BMTT MOTChallenge 2019, held at CVPR2019. The code is available at https://github.com/yyc9268/Deep-TAMA.
CVOct 22, 2021
GCCN: Global Context Convolutional NetworkAli Hamdi, Flora Salim, Du Yong Kim
In this paper, we propose Global Context Convolutional Network (GCCN) for visual recognition. GCCN computes global features representing contextual information across image patches. These global contextual features are defined as local maxima pixels with high visual sharpness in each patch. These features are then concatenated and utilised to augment the convolutional features. The learnt feature vector is normalised using the global context features using Frobenius norm. This straightforward approach achieves high accuracy in compassion to the state-of-the-art methods with 94.6% and 95.41% on CIFAR-10 and STL-10 datasets, respectively. To explore potential impact of GCCN on other visual representation tasks, we implemented GCCN as a based model to few-shot image classification. We learn metric distances between the augmented feature vectors and their prototypes representations, similar to Prototypical and Matching Networks. GCCN outperforms state-of-the-art few-shot learning methods achieving 99.9%, 84.8% and 80.74% on Omniglot, MiniImageNet and CUB-200, respectively. GCCN has significantly improved on the accuracy of state-of-the-art prototypical and matching networks by up to 30% in different few-shot learning scenarios.
CVOct 22, 2021
Signature-Graph NetworksAli Hamdi, Flora Salim, Du Yong Kim et al.
We propose a novel approach for visual representation learning called Signature-Graph Neural Networks (SGN). SGN learns latent global structures that augment the feature representation of Convolutional Neural Networks (CNN). SGN constructs unique undirected graphs for each image based on the CNN feature maps. The feature maps are partitioned into a set of equal and non-overlapping patches. The graph nodes are located on high-contrast sharp convolution features with the local maxima or minima in these patches. The node embeddings are aggregated through novel Signature-Graphs based on horizontal and vertical edge connections. The representation vectors are then computed based on the spectral Laplacian eigenvalues of the graphs. SGN outperforms existing methods of recent graph convolutional networks, generative adversarial networks, and auto-encoders with image classification accuracy of 99.65% on ASIRRA, 99.91% on MNIST, 98.55% on Fashion-MNIST, 96.18% on CIFAR-10, 84.71% on CIFAR-100, 94.36% on STL10, and 95.86% on SVHN datasets. We also introduce a novel implementation of the state-of-the-art multi-head attention (MHA) on top of the proposed SGN. Adding SGN to MHA improved the image classification accuracy from 86.92% to 94.36% on the STL10 dataset
DCMar 11, 2021
Drone-as-a-Service Composition Under UncertaintyAli Hamdi, Flora D. Salim, Du Yong Kim et al.
We propose an uncertainty-aware service approach to provide drone-based delivery services called Drone-as-a-Service (DaaS) effectively. Specifically, we propose a service model of DaaS based on the dynamic spatiotemporal features of drones and their in-flight contexts. The proposed DaaS service approach consists of three components: scheduling, route-planning, and composition. First, we develop a DaaS scheduling model to generate DaaS itineraries through a Skyway network. Second, we propose an uncertainty-aware DaaS route-planning algorithm that selects the optimal Skyways under weather uncertainties. Third, we develop two DaaS composition techniques to select an optimal DaaS composition at each station of the planned route. A spatiotemporal DaaS composer first selects the optimal DaaSs based on their spatiotemporal availability and drone capabilities. A predictive DaaS composer then utilises the outcome of the first composer to enable fast and accurate DaaS composition using several Machine Learning classification methods. We train the classifiers using a new set of spatiotemporal features which are in addition to other DaaS QoS properties. Our experiments results show the effectiveness and efficiency of the proposed approach.
CVJul 30, 2020
flexgrid2vec: Learning Efficient Visual Representations VectorsAli Hamdi, Du Yong Kim, Flora D. Salim
We propose flexgrid2vec, a novel approach for image representation learning. Existing visual representation methods suffer from several issues, including the need for highly intensive computation, the risk of losing in-depth structural information and the specificity of the method to certain shapes or objects. flexgrid2vec converts an image to a low-dimensional feature vector. We represent each image with a graph of flexible, unique node locations and edge distances. flexgrid2vec is a multi-channel GCN that learns features of the most representative image patches. We have investigated both spectral and non-spectral implementations of the GCN node-embedding. Specifically, we have implemented flexgrid2vec based on different node-aggregation methods, such as vector summation, concatenation and normalisation with eigenvector centrality. We compare the performance of flexgrid2vec with a set of state-of-the-art visual representation learning models on binary and multi-class image classification tasks. Although we utilise imbalanced, low-size and low-resolution datasets, flexgrid2vec shows stable and outstanding results against well-known base classifiers. flexgrid2vec achieves 96.23% on CIFAR-10, 83.05% on CIFAR-100, 94.50% on STL-10, 98.8% on ASIRRA and 89.69% on the COCO dataset.
CVMay 2, 2020
DroTrack: High-speed Drone-based Object Tracking Under UncertaintyAli Hamdi, Flora Salim, Du Yong Kim
We present DroTrack, a high-speed visual single-object tracking framework for drone-captured video sequences. Most of the existing object tracking methods are designed to tackle well-known challenges, such as occlusion and cluttered backgrounds. The complex motion of drones, i.e., multiple degrees of freedom in three-dimensional space, causes high uncertainty. The uncertainty problem leads to inaccurate location predictions and fuzziness in scale estimations. DroTrack solves such issues by discovering the dependency between object representation and motion geometry. We implement an effective object segmentation based on Fuzzy C Means (FCM). We incorporate the spatial information into the membership function to cluster the most discriminative segments. We then enhance the object segmentation by using a pre-trained Convolution Neural Network (CNN) model. DroTrack also leverages the geometrical angular motion to estimate a reliable object scale. We discuss the experimental results and performance evaluation using two datasets of 51,462 drone-captured frames. The combination of the FCM segmentation and the angular scaling increased DroTrack precision by up to $9\%$ and decreased the centre location error by $162$ pixels on average. DroTrack outperforms all the high-speed trackers and achieves comparable results in comparison to deep learning trackers. DroTrack offers high frame rates up to 1000 frame per second (fps) with the best location precision, more than a set of state-of-the-art real-time trackers.
CVJan 13, 2020
A Bayesian Filter for Multi-view 3D Multi-object Tracking with Occlusion HandlingJonah Ong, Ba Tuong Vo, Ba Ngu Vo et al.
This paper proposes an online multi-camera multi-object tracker that only requires monocular detector training, independent of the multi-camera configurations, allowing seamless extension/deletion of cameras without retraining effort. The proposed algorithm has a linear complexity in the total number of detections across the cameras, and hence scales gracefully with the number of cameras. It operates in the 3D world frame, and provides 3D trajectory estimates of the objects. The key innovation is a high fidelity yet tractable 3D occlusion model, amenable to optimal Bayesian multi-view multi-object filtering, which seamlessly integrates, into a single Bayesian recursion, the sub-tasks of track management, state estimation, clutter rejection, and occlusion/misdetection handling. The proposed algorithm is evaluated on the latest WILDTRACKS dataset, and demonstrated to work in very crowded scenes on a new dataset.
CVJan 9, 2017
Visual Multiple-Object Tracking for Unknown Clutter RateDu Yong Kim
In multi-object tracking applications, model parameter tuning is a prerequisite for reliable performance. In particular, it is difficult to know statistics of false measurements due to various sensing conditions and changes in the field of views. In this paper we are interested in designing a multi-object tracking algorithm that handles unknown false measurement rate. Recently proposed robust multi-Bernoulli filter is employed for clutter estimation while generalized labeled multi-Bernoulli filter is considered for target tracking. Performance evaluation with real videos demonstrates the effectiveness of the tracking algorithm for real-world scenarios.
CVNov 18, 2016
Online Visual Multi-Object Tracking via Labeled Random Finite Set FilteringDu Yong Kim, Ba-Ngu Vo, Ba-Tuong Vo
This paper proposes an online visual multi-object tracking algorithm using a top-down Bayesian formulation that seamlessly integrates state estimation, track management, clutter rejection, occlusion and mis-detection handling into a single recursion. This is achieved by modeling the multi-object state as labeled random finite set and using the Bayes recursion to propagate the multi-object filtering density forward in time. The proposed filter updates tracks with detections but switches to image data when mis-detection occurs, thereby exploiting the efficiency of detection data and the accuracy of image data. Furthermore the labeled random finite set framework enables the incorporation of prior knowledge that mis-detections of long tracks which occur in the middle of the scene are likely to be due to occlusions. Such prior knowledge can be exploited to improve occlusion handling, especially long occlusions that can lead to premature track termination in on-line multi-object tracking. Tracking performance are compared to state-of-the-art algorithms on well-known benchmark video datasets.