SEMar 6, 2023
Requirements Engineering Framework for Human-centered Artificial Intelligence Software SystemsKhlood Ahmad, Mohamed Abdelrazek, Chetan Arora et al.
[Context] Artificial intelligence (AI) components used in building software solutions have substantially increased in recent years. However, many of these solutions focus on technical aspects and ignore critical human-centered aspects. [Objective] Including human-centered aspects during requirements engineering (RE) when building AI-based software can help achieve more responsible, unbiased, and inclusive AI-based software solutions. [Method] In this paper, we present a new framework developed based on human-centered AI guidelines and a user survey to aid in collecting requirements for human-centered AI-based software. We provide a catalog to elicit these requirements and a conceptual model to present them visually. [Results] The framework is applied to a case study to elicit and model requirements for enhancing the quality of 360 degree~videos intended for virtual reality (VR) users. [Conclusion] We found that our proposed approach helped the project team fully understand the human-centered needs of the project to deliver. Furthermore, the framework helped to understand what requirements need to be captured at the initial stages against later stages in the engineering process of AI-based software.
CVAug 13, 2025Code
TOTNet: Occlusion-Aware Temporal Tracking for Robust Ball Detection in Sports VideosHao Xu, Arbind Agrahari Baniya, Sam Wells et al.
Robust ball tracking under occlusion remains a key challenge in sports video analysis, affecting tasks like event detection and officiating. We present TOTNet, a Temporal Occlusion Tracking Network that leverages 3D convolutions, visibility-weighted loss, and occlusion augmentation to improve performance under partial and full occlusions. Developed in collaboration with Paralympics Australia, TOTNet is designed for real-world sports analytics. We introduce TTA, a new occlusion-rich table tennis dataset collected from professional-level Paralympic matches, comprising 9,159 samples with 1,996 occlusion cases. Evaluated on four datasets across tennis, badminton, and table tennis, TOTNet significantly outperforms prior state-of-the-art methods, reducing RMSE from 37.30 to 7.19 and improving accuracy on fully occluded frames from 0.63 to 0.80. These results demonstrate TOTNets effectiveness for offline sports analytics in fast-paced scenarios. Code and data access:\href{https://github.com/AugustRushG/TOTNet}{AugustRushG/TOTNet}.
MMJun 3, 2025
Omnidirectional Video Super-Resolution using Deep LearningArbind Agrahari Baniya, Tsz-Kwan Lee, Peter W. Eklund et al.
Omnidirectional Videos (or 360° videos) are widely used in Virtual Reality (VR) to facilitate immersive and interactive viewing experiences. However, the limited spatial resolution in 360° videos does not allow for each degree of view to be represented with adequate pixels, limiting the visual quality offered in the immersive experience. Deep learning Video Super-Resolution (VSR) techniques used for conventional videos could provide a promising software-based solution; however, these techniques do not tackle the distortion present in equirectangular projections of 360° video signals. An additional obstacle is the limited availability of 360° video datasets for study. To address these issues, this paper creates a novel 360° Video Dataset (360VDS) with a study of the extensibility of conventional VSR models to 360° videos. This paper further proposes a novel deep learning model for 360° Video Super-Resolution (360° VSR), called Spherical Signal Super-resolution with a Proportioned Optimisation (S3PO). S3PO adopts recurrent modelling with an attention mechanism, unbound from conventional VSR techniques like alignment. With a purpose-built feature extractor and a novel loss function addressing spherical distortion, S3PO outperforms most state-of-the-art conventional VSR models and 360°~specific super-resolution models on 360° video datasets. A step-wise ablation study is presented to understand and demonstrate the impact of the chosen architectural sub-components, targeted training and optimisation.
IVJun 3, 2025
A Survey of Deep Learning Video Super-ResolutionArbind Agrahari Baniya, Tsz-Kwan Lee, Peter Eklund et al.
Video super-resolution (VSR) is a prominent research topic in low-level computer vision, where deep learning technologies have played a significant role. The rapid progress in deep learning and its applications in VSR has led to a proliferation of tools and techniques in the literature. However, the usage of these methods is often not adequately explained, and decisions are primarily driven by quantitative improvements. Given the significance of VSR's potential influence across multiple domains, it is imperative to conduct a comprehensive analysis of the elements and deep learning methodologies employed in VSR research. This methodical analysis will facilitate the informed development of models tailored to specific application needs. In this paper, we present an overarching overview of deep learning-based video super-resolution models, investigating each component and discussing its implications. Furthermore, we provide a synopsis of key components and technologies employed by state-of-the-art and earlier VSR models. By elucidating the underlying methodologies and categorising them systematically, we identified trends, requirements, and challenges in the domain. As a first-of-its-kind survey of deep learning-based VSR models, this work also establishes a multi-level taxonomy to guide current and future VSR research, enhancing the maturation and interpretation of VSR practices for various practical applications.
CVMay 6, 2025
Deep Learning for Sports Video Event Detection: Tasks, Datasets, Methods, and ChallengesHao Xu, Arbind Agrahari Baniya, Sam Well et al.
Video event detection has become a cornerstone of modern sports analytics, powering automated performance evaluation, content generation, and tactical decision-making. Recent advances in deep learning have driven progress in related tasks such as Temporal Action Localization (TAL), which detects extended action segments; Action Spotting (AS), which identifies a representative timestamp; and Precise Event Spotting (PES), which pinpoints the exact frame of an event. Although closely connected, their subtle differences often blur the boundaries between them, leading to confusion in both research and practical applications. Furthermore, prior surveys either address generic video event detection or broader sports video tasks, but largely overlook the unique temporal granularity and domain-specific challenges of event spotting. In addition, most existing sports video surveys focus on elite-level competitions while neglecting the wider community of everyday practitioners. This survey addresses these gaps by: (i) clearly delineating TAL, AS, and PES and their respective use cases; (ii) introducing a structured taxonomy of state of the art approaches including temporal modeling strategies, multimodal frameworks, and data-efficient pipelines tailored for AS and PES; and (iii) critically assessing benchmark datasets and evaluation protocols, highlighting limitations such as reliance on broadcast quality footage and metrics that over reward permissive multilabel predictions. By synthesizing current research and exposing open challenges, this work provides a comprehensive foundation for developing temporally precise, generalizable, and practically deployable sports event detection systems for both the research and industry communities.
CVJul 8, 2025
Hyperspectral Anomaly Detection Methods: A Survey and Comparative StudyAayushma Pant, Arbind Agrahari Baniya, Tsz-Kwan Lee et al.
Hyperspectral images are high-dimensional datasets comprising hundreds of contiguous spectral bands, enabling detailed analysis of materials and surfaces. Hyperspectral anomaly detection (HAD) refers to the technique of identifying and locating anomalous targets in such data without prior information about a hyperspectral scene or target spectrum. This technology has seen rapid advancements in recent years, with applications in agriculture, defence, military surveillance, and environmental monitoring. Despite this significant progress, existing HAD methods continue to face challenges such as high computational complexity, sensitivity to noise, and limited generalisation across diverse datasets. This study presents a comprehensive comparison of various HAD techniques, categorising them into statistical models, representation-based methods, classical machine learning approaches, and deep learning models. We evaluated these methods across 17 benchmarking datasets using different performance metrics, such as ROC, AUC, and separability map to analyse detection accuracy, computational efficiency, their strengths, limitations, and directions for future research. Our findings highlight that deep learning models achieved the highest detection accuracy, while statistical models demonstrated exceptional speed across all datasets. This survey aims to provide valuable insights for researchers and practitioners working to advance the field of hyperspectral anomaly detection methods.
LGJan 21, 2024
Enabling clustering algorithms to detect clusters of varying densities through scale-invariant data preprocessingSunil Aryal, Jonathan R. Wells, Arbind Agrahari Baniya et al.
In this paper, we show that preprocessing data using a variant of rank transformation called 'Average Rank over an Ensemble of Sub-samples (ARES)' makes clustering algorithms robust to data representation and enable them to detect varying density clusters. Our empirical results, obtained using three most widely used clustering algorithms-namely KMeans, DBSCAN, and DP (Density Peak)-across a wide range of real-world datasets, show that clustering after ARES transformation produces better and more consistent results.
LGNov 8, 2021
A Novel Data Pre-processing Technique: Making Data Mining Robust to Different Units and Scales of MeasurementArbind Agrahari Baniya, Sunil Aryal, Santosh KC
Many existing data mining algorithms use feature values directly in their model, making them sensitive to units/scales used to measure/represent data. Pre-processing of data based on rank transformation has been suggested as a potential solution to overcome this issue. However, the resulting data after pre-processing with rank transformation is uniformly distributed, which may not be very useful in many data mining applications. In this paper, we present a better and effective alternative based on ranks over multiple sub-samples of data. We call the proposed pre-processing technique as ARES | Average Rank over an Ensemble of Sub-samples. Our empirical results of widely used data mining algorithms for classification and anomaly detection in a wide range of data sets suggest that ARES results in more consistent task specific? outcome across various algorithms and data sets. In addition to this, it results in better or competitive outcome most of the time compared to the most widely used min-max normalisation and the traditional rank transformation.
LGSep 27, 2019
Improved histogram-based anomaly detector with the extended principal component featuresSunil Aryal, Arbind Agrahari Baniya, KC Santosh
In this era of big data, databases are growing rapidly in terms of the number of records. Fast automatic detection of anomalous records in these massive databases is a challenging task. Traditional distance based anomaly detectors are not applicable in these massive datasets. Recently, a simple but extremely fast anomaly detector using one-dimensional histograms has been introduced. The anomaly score of a data instance is computed as the product of the probability mass of histograms in each dimensions where it falls into. It is shown to produce competitive results compared to many state-of-the-art methods in many datasets. Because it assumes data features are independent of each other, it results in poor detection accuracy when there is correlation between features. To address this issue, we propose to increase the feature size by adding more features based on principal components. Our results show that using the original input features together with principal components improves the detection accuracy of histogram-based anomaly detector significantly without compromising much in terms of run-time.