Branislav Kusý

h-index35

18papers

272citations

Novelty36%

AI Score30

Ranked #138,762 of 194,257 authors (top 71%)#45,673 in CV (top 77%)

18 Papers

4.6LGAug 1, 2022

A Real-time Edge-AI System for Reef Surveys

Yang Li, Jiajun Liu, Brano Kusy et al.

Crown-of-Thorn Starfish (COTS) outbreaks are a major cause of coral loss on the Great Barrier Reef (GBR) and substantial surveillance and control programs are ongoing to manage COTS populations to ecologically sustainable levels. In this paper, we present a comprehensive real-time machine learning-based underwater data collection and curation system on edge devices for COTS monitoring. In particular, we leverage the power of deep learning-based object detection techniques, and propose a resource-efficient COTS detector that performs detection inferences on the edge device to assist marine experts with COTS identification during the data collection phase. The preliminary results show that several strategies for improving computational efficiency (e.g., batch-wise processing, frame skipping, model input size) can be combined to run the proposed detection model on edge hardware with low resource consumption and low information loss.

6.8CVMar 2, 2023Code

Image Labels Are All You Need for Coarse Seagrass Segmentation

Scarlett Raine, Ross Marchant, Brano Kusy et al.

Seagrass meadows serve as critical carbon sinks, but estimating the amount of carbon they store requires knowledge of the seagrass species present. Underwater and surface vehicles equipped with machine learning algorithms can help to accurately estimate the composition and extent of seagrass meadows at scale. However, previous approaches for seagrass detection and classification have required supervision from patch-level labels. In this paper, we reframe seagrass classification as a weakly supervised coarse segmentation problem where image-level labels are used during training (25 times fewer labels compared to patch-level labeling) and patch-level outputs are obtained at inference time. To this end, we introduce SeaFeats, an architecture that uses unsupervised contrastive pre-training and feature similarity, and SeaCLIP, a model that showcases the effectiveness of large language models as a supervisory signal in domain-specific applications. We demonstrate that an ensemble of SeaFeats and SeaCLIP leads to highly robust performance. Our method outperforms previous approaches that require patch-level labels on the multi-species 'DeepSeagrass' dataset by 6.8% (absolute) for the class-weighted F1 score, and by 12.1% (absolute) for the seagrass presence/absence F1 score on the 'Global Wetlands' dataset. We also present two case studies for real-world deployment: outlier detection on the Global Wetlands dataset, and application of our method on imagery collected by the FloatyBoat autonomous surface vehicle.

7.3CVSep 13, 2022

A Capsule Network for Hierarchical Multi-Label Image Classification

Khondaker Tasrif Noor, Antonio Robles-Kelly, Brano Kusy

Image classification is one of the most important areas in computer vision. Hierarchical multi-label classification applies when a multi-class image classification problem is arranged into smaller ones based upon a hierarchy or taxonomy. Thus, hierarchical classification modes generally provide multiple class predictions on each instance, whereby these are expected to reflect the structure of image classes as related to one another. In this paper, we propose a multi-label capsule network (ML-CapsNet) for hierarchical classification. Our ML-CapsNet predicts multiple image classes based on a hierarchical class-label tree structure. To this end, we present a loss function that takes into account the multi-label predictions of the network. As a result, the training approach for our ML-CapsNet uses a coarse to fine paradigm while maintaining consistency with the structure in the classification levels in the label-hierarchy. We also perform experiments using widely available datasets and compare the model with alternatives elsewhere in the literature. In our experiments, our ML-CapsNet yields a margin of improvement with respect to these alternative methods.

3.9CVAug 22, 2023Code

Object Detection Difficulty: Suppressing Over-aggregation for Faster and Better Video Object Detection

Bingqing Zhang, Sen Wang, Yifan Liu et al.

Current video object detection (VOD) models often encounter issues with over-aggregation due to redundant aggregation strategies, which perform feature aggregation on every frame. This results in suboptimal performance and increased computational complexity. In this work, we propose an image-level Object Detection Difficulty (ODD) metric to quantify the difficulty of detecting objects in a given image. The derived ODD scores can be used in the VOD process to mitigate over-aggregation. Specifically, we train an ODD predictor as an auxiliary head of a still-image object detector to compute the ODD score for each image based on the discrepancies between detection results and ground-truth bounding boxes. The ODD score enhances the VOD system in two ways: 1) it enables the VOD system to select superior global reference frames, thereby improving overall accuracy; and 2) it serves as an indicator in the newly designed ODD Scheduler to eliminate the aggregation of frames that are easy to detect, thus accelerating the VOD process. Comprehensive experiments demonstrate that, when utilized for selecting global reference frames, ODD-VOD consistently enhances the accuracy of Global-frame-based VOD models. When employed for acceleration, ODD-VOD consistently improves the frames per second (FPS) by an average of 73.3% across 8 different VOD models without sacrificing accuracy. When combined, ODD-VOD attains state-of-the-art performance when competing with many VOD methods in both accuracy and speed. Our work represents a significant advancement towards making VOD more practical for real-world applications.

3.9CVOct 26, 2023Code

Understanding the Effects of Projectors in Knowledge Distillation

Yudong Chen, Sen Wang, Jiajun Liu et al.

Conventionally, during the knowledge distillation process (e.g. feature distillation), an additional projector is often required to perform feature transformation due to the dimension mismatch between the teacher and the student networks. Interestingly, we discovered that even if the student and the teacher have the same feature dimensions, adding a projector still helps to improve the distillation performance. In addition, projectors even improve logit distillation if we add them to the architecture too. Inspired by these surprising findings and the general lack of understanding of the projectors in the knowledge distillation process from existing literature, this paper investigates the implicit role that projectors play but so far have been overlooked. Our empirical study shows that the student with a projector (1) obtains a better trade-off between the training accuracy and the testing accuracy compared to the student without a projector when it has the same feature dimensions as the teacher, (2) better preserves its similarity to the teacher beyond shallow and numeric resemblance, from the view of Centered Kernel Alignment (CKA), and (3) avoids being over-confident as the teacher does at the testing phase. Motivated by the positive effects of projectors, we propose a projector ensemble-based feature distillation method to further improve distillation performance. Despite the simplicity of the proposed strategy, empirical results from the evaluation of classification tasks on benchmark datasets demonstrate the superior classification performance of our method on a broad range of teacher-student pairs and verify from the aspects of CKA and model calibration that the student's features are of improved quality with the projector ensemble design.

1.4CVMar 9, 2021Code

DeepSeagrass Dataset

Scarlett Raine, Ross Marchant, Peyman Moghadam et al.

We introduce a dataset of seagrass images collected by a biologist snorkelling in Moreton Bay, Queensland, Australia, as described in our publication: arXiv:2009.09924. The images are labelled at the image-level by collecting images of the same morphotype in a folder hierarchy. We also release pre-trained models and training codes for detection and classification of seagrass species at the patch level at https://github.com/csiro-robotics/deepseagrass.

4.2CVSep 18, 2020Code

Multi-species Seagrass Detection and Classification from Underwater Images

Scarlett Raine, Ross Marchant, Peyman Moghadam et al.

Underwater surveys conducted using divers or robots equipped with customized camera payloads can generate a large number of images. Manual review of these images to extract ecological data is prohibitive in terms of time and cost, thus providing strong incentive to automate this process using machine learning solutions. In this paper, we introduce a multi-species detector and classifier for seagrasses based on a deep convolutional neural network (achieved an overall accuracy of 92.4%). We also introduce a simple method to semi-automatically label image patches and therefore minimize manual labelling requirement. We describe and release publicly the dataset collected in this study as well as the code and pre-trained models to replicate our experiments at: https://github.com/csiro-robotics/deepseagrass

8.7CVApr 15, 2024Code

Human-in-the-Loop Segmentation of Multi-species Coral Imagery

Scarlett Raine, Ross Marchant, Brano Kusy et al.

Marine surveys by robotic underwater and surface vehicles result in substantial quantities of coral reef imagery, however labeling these images is expensive and time-consuming for domain experts. Point label propagation is a technique that uses existing images labeled with sparse points to create augmented ground truth data, which can be used to train a semantic segmentation model. In this work, we show that recent advances in large foundation models facilitate the creation of augmented ground truth masks using only features extracted by the denoised version of the DINOv2 foundation model and K-Nearest Neighbors (KNN), without any pre-training. For images with extremely sparse labels, we present a labeling method based on human-in-the-loop principles, which greatly enhances annotation efficiency: in the case that there are 5 point labels per image, our human-in-the-loop method outperforms the prior state-of-the-art by 14.2% for pixel accuracy and 19.7% for mIoU; and by 8.9% and 18.3% if there are 10 point labels. When human-in-the-loop labeling is not available, using the denoised DINOv2 features with a KNN still improves on the prior state-of-the-art by 2.7% for pixel accuracy and 5.8% for mIoU (5 grid points). On the semantic segmentation task, we outperform the prior state-of-the-art by 8.8% for pixel accuracy and by 13.5% for mIoU when only 5 point labels are used for point label propagation. Additionally, we perform a comprehensive study into the impacts of the point label placement style and the number of points on the point label propagation quality, and make several recommendations for improving the efficiency of labeling images with points.

7.9LGNov 20, 2024

LightLLM: A Versatile Large Language Model for Predictive Light Sensing

Jiawei Hu, Hong Jia, Mahbub Hassan et al.

We propose LightLLM, a model that fine tunes pre-trained large language models (LLMs) for light-based sensing tasks. It integrates a sensor data encoder to extract key features, a contextual prompt to provide environmental information, and a fusion layer to combine these inputs into a unified representation. This combined input is then processed by the pre-trained LLM, which remains frozen while being fine-tuned through the addition of lightweight, trainable components, allowing the model to adapt to new tasks without altering its original parameters. This approach enables flexible adaptation of LLM to specialized light sensing tasks with minimal computational overhead and retraining effort. We have implemented LightLLM for three light sensing tasks: light-based localization, outdoor solar forecasting, and indoor solar estimation. Using real-world experimental datasets, we demonstrate that LightLLM significantly outperforms state-of-the-art methods, achieving 4.4x improvement in localization accuracy and 3.4x improvement in indoor solar estimation when tested in previously unseen environments. We further demonstrate that LightLLM outperforms ChatGPT-4 with direct prompting, highlighting the advantages of LightLLM's specialized architecture for sensor data fusion with textual prompts.

5.0CVMay 26, 2023

CVB: A Video Dataset of Cattle Visual Behaviors

Ali Zia, Renuka Sharma, Reza Arablouei et al.

Existing image/video datasets for cattle behavior recognition are mostly small, lack well-defined labels, or are collected in unrealistic controlled environments. This limits the utility of machine learning (ML) models learned from them. Therefore, we introduce a new dataset, called Cattle Visual Behaviors (CVB), that consists of 502 video clips, each fifteen seconds long, captured in natural lighting conditions, and annotated with eleven visually perceptible behaviors of grazing cattle. We use the Computer Vision Annotation Tool (CVAT) to collect our annotations. To make the procedure more efficient, we perform an initial detection and tracking of cattle in the videos using appropriate pre-trained models. The results are corrected by domain experts along with cattle behavior labeling in CVAT. The pre-hoc detection and tracking step significantly reduces the manual annotation time and effort. Moreover, we convert CVB to the atomic visual action (AVA) format and train and evaluate the popular SlowFast action recognition model on it. The associated preliminary results confirm that we can localize the cattle and recognize their frequently occurring behaviors with confidence. By creating and sharing CVB, our aim is to develop improved models capable of recognizing all important behaviors accurately and to assist other researchers and practitioners in developing and evaluating new ML models for cattle behavior classification using video data.

8.1CVFeb 27, 2022Code

Point Label Aware Superpixels for Multi-species Segmentation of Underwater Imagery

Scarlett Raine, Ross Marchant, Brano Kusy et al.

Monitoring coral reefs using underwater vehicles increases the range of marine surveys and availability of historical ecological data by collecting significant quantities of images. Analysis of this imagery can be automated using a model trained to perform semantic segmentation, however it is too costly and time-consuming to densely label images for training supervised models. In this letter, we leverage photo-quadrat imagery labeled by ecologists with sparse point labels. We propose a point label aware method for propagating labels within superpixel regions to obtain augmented ground truth for training a semantic segmentation model. Our point label aware superpixel method utilizes the sparse point labels, and clusters pixels using learned features to accurately generate single-species segments in cluttered, complex coral images. Our method outperforms prior methods on the UCSD Mosaics dataset by 3.62% for pixel accuracy and 8.35% for mean IoU for the label propagation task, while reducing computation time reported by previous approaches by 76%. We train a DeepLabv3+ architecture and outperform state-of-the-art for semantic segmentation by 2.91% for pixel accuracy and 9.65% for mean IoU on the UCSD Mosaics dataset and by 4.19% for pixel accuracy and 14.32% mean IoU for the Eilat dataset.

6.5CVNov 29, 2021

The CSIRO Crown-of-Thorn Starfish Detection Dataset

Jiajun Liu, Brano Kusy, Ross Marchant et al.

Crown-of-Thorn Starfish (COTS) outbreaks are a major cause of coral loss on the Great Barrier Reef (GBR) and substantial surveillance and control programs are underway in an attempt to manage COTS populations to ecologically sustainable levels. We release a large-scale, annotated underwater image dataset from a COTS outbreak area on the GBR, to encourage research on Machine Learning and AI-driven technologies to improve the detection, monitoring, and management of COTS populations at reef scale. The dataset is released and hosted in a Kaggle competition that challenges the international Machine Learning community with the task of COTS detection from these underwater images.

3.1LGOct 17, 2021

Exploring Deep Neural Networks on Edge TPU

Seyedehfaezeh Hosseininoorbin, Siamak Layeghy, Brano Kusy et al.

This paper explores the performance of Google's Edge TPU on feed forward neural networks. We consider Edge TPU as a hardware platform and explore different architectures of deep neural network classifiers, which traditionally has been a challenge to run on resource constrained edge devices. Based on the use of a joint-time-frequency data representation, also known as spectrogram, we explore the trade-off between classification performance and the energy consumed for inference. The energy efficiency of Edge TPU is compared with that of widely-used embedded CPU ARM Cortex-A53. Our results quantify the impact of neural network architectural specifications on the Edge TPU's performance, guiding decisions on the TPU's optimal operating point, where it can provide high classification accuracy with minimal energy consumption. Also, our evaluations highlight the crossover in performance between the Edge TPU and Cortex-A53, depending on the neural network specifications. Based on our analysis, we provide a decision chart to guide decisions on platform selection based on the model parameters and context.

7.2LGNov 6, 2020

Deep Learning-based Cattle Activity Classification Using Joint Time-frequency Data Representation

Seyedeh Faezeh Hosseini Noorbin, Siamak Layeghy, Brano Kusy et al.

Automated cattle activity classification allows herders to continuously monitor the health and well-being of livestock, resulting in increased quality and quantity of beef and dairy products. In this paper, a sequential deep neural network is used to develop a behavioural model and to classify cattle behaviour and activities. The key focus of this paper is the exploration of a joint time-frequency domain representation of the sensor data, which is provided as the input to the neural network classifier. Our exploration is based on a real-world data set with over 3 million samples, collected from sensors with a tri-axial accelerometer, magnetometer and gyroscope, attached to collar tags of 10 dairy cows and collected over a one month period. The key results of this paper is that the joint time-frequency data representation, even when used in conjunction with a relatively basic neural network classifier, can outperform the best cattle activity classifiers reported in the literature. With a more systematic exploration of neural network classifier architectures and hyper-parameters, there is potential for even further improvements. Finally, we demonstrate that the time-frequency domain data representation allows us to efficiently trade-off a large reduction of model size and computational complexity for a very minor reduction in classification accuracy. This shows the potential for our classification approach to run on resource-constrained embedded and IoT devices.

3.0CVAug 1, 2016

Fast and robust pushbroom hyperspectral imaging via DMD-based scanning

Reza Arablouei, Ethan Goan, Stephen Gensemer et al.

We describe a new pushbroom hyperspectral imaging device that has no macro moving part. The main components of the proposed hyperspectral imager are a digital micromirror device (DMD), a CMOS image sensor with no filter as the spectral sensor, a CMOS color (RGB) image sensor as the auxiliary image sensor, and a diffraction grating. Using the image sensor pair, the device can simultaneously capture hyperspectral data as well as RGB images of the scene. The RGB images captured by the auxiliary image sensor can facilitate geometric co-registration of the hyperspectral image slices captured by the spectral sensor. In addition, the information discernible from the RGB images can lead to capturing the spectral data of only the regions of interest within the scene. The proposed hyperspectral imaging architecture is cost-effective, fast, and robust. It also enables a trade-off between resolution and speed. We have built an initial prototype based on the proposed design. The prototype can capture a hyperspectral image datacube with a spatial resolution of 192x192 pixels and a spectral resolution of 500 bands in less than thirty seconds.

2.1ROMar 4, 2015

Autonomous surveillance for biosecurity

Raja Jurdak, Alberto Elfes, Branislav Kusy et al.

The global movement of people and goods has increased the risk of biosecurity threats and their potential to incur large economic, social, and environmental costs. Conventional manual biosecurity surveillance methods are limited by their scalability in space and time. This article focuses on autonomous surveillance systems, comprising sensor networks, robots, and intelligent algorithms, and their applicability to biosecurity threats. We discuss the spatial and temporal attributes of autonomous surveillance technologies and map them to three broad categories of biosecurity threat: (i) vector-borne diseases; (ii) plant pests; and (iii) aquatic pests. Our discussion reveals a broad range of opportunities to serve biosecurity needs through autonomous surveillance.

1.1LGFeb 18, 2015

Temporal Embedding in Convolutional Neural Networks for Robust Learning of Abstract Snippets

Jiajun Liu, Kun Zhao, Brano Kusy et al.

The prediction of periodical time-series remains challenging due to various types of data distortions and misalignments. Here, we propose a novel model called Temporal embedding-enhanced convolutional neural Network (TeNet) to learn repeatedly-occurring-yet-hidden structural elements in periodical time-series, called abstract snippets, for predicting future changes. Our model uses convolutional neural networks and embeds a time-series with its potential neighbors in the temporal domain for aligning it to the dominant patterns in the dataset. The model is robust to distortions and misalignments in the temporal domain and demonstrates strong prediction power for periodical time-series. We conduct extensive experiments and discover that the proposed model shows significant and consistent advantages over existing methods on a variety of data modalities ranging from human mobility to household power consumption records. Empirical results indicate that the model is robust to various factors such as number of samples, variance of data, numerical ranges of data etc. The experiments also verify that the intuition behind the model can be generalized to multiple data types and applications and promises significant improvement in prediction performances across the datasets studied.

1.4LGSep 5, 2014

Novel Methods for Activity Classification and Occupany Prediction Enabling Fine-grained HVAC Control

Rajib Rana, Brano Kusy, Josh Wall et al.

Much of the energy consumption in buildings is due to HVAC systems, which has motivated several recent studies on making these systems more energy- efficient. Occupancy and activity are two important aspects, which need to be correctly estimated for optimal HVAC control. However, state-of-the-art methods to estimate occupancy and classify activity require infrastructure and/or wearable sensors which suffers from lower acceptability due to higher cost. Encouragingly, with the advancement of the smartphones, these are becoming more achievable. Most of the existing occupancy estimation tech- niques have the underlying assumption that the phone is always carried by its user. However, phones are often left at desk while attending meeting or other events, which generates estimation error for the existing phone based occupancy algorithms. Similarly, in the recent days the emerging theory of Sparse Random Classifier (SRC) has been applied for activity classification on smartphone, however, there are rooms to improve the on-phone process- ing. We propose a novel sensor fusion method which offers almost 100% accuracy for occupancy estimation. We also propose an activity classifica- tion algorithm, which offers similar accuracy as of the state-of-the-art SRC algorithms while offering 50% reduction in processing.