CVMay 8Code
Neurosymbolic Framework for Concept-Driven Logical Reasoning in Skeleton-Based Human Action RecognitionTalha Ilyas, Deval Mehta, Zongyuan Ge
Skeleton-based human activity recognition has achieved strong empirical performance, yet most existing models remain black boxes and difficult to interpret. In this work, we introduce a neurosymbolic formulation of skeleton-based HAR that reframes action recognition as concept-driven first-order logical reasoning over motion primitives. Our framework bridges representation learning and symbolic inference by grounding first-order logic predicates in learnable spatial and temporal motion concepts. Specifically, we employ a standard spatio-temporal skeleton encoder to extract latent motion representations, which are then mapped to interpretable concept predicates via a spatio-temporal concept decoder that explicitly separates pose-centric and dynamics-centric abstractions. These concept predicates are composed through differentiable first-order logic layers, enabling the model to learn human-readable logical rules that govern action semantics. To impose semantic structure on the learned concepts, we align skeleton representations with LLM-derived descriptions of atomic motion primitives, establishing a shared conceptual space for perception and reasoning. Extensive experiments on NTU RGB+D 60/120 and NW-UCLA demonstrate that our approach achieves competitive recognition performance while providing explicit, interpretable explanations grounded in logical structure. Our results highlight neurosymbolic reasoning as an effective paradigm for interpretable spatio-temporal action understanding. Code: https://github.com/Mr-TalhaIlyas/REASON
CVOct 23, 2025
Towards Objective Obstetric Ultrasound Assessment: Contrastive Representation Learning for Fetal Movement DetectionTalha Ilyas, Duong Nhu, Allison Thomas et al.
Accurate fetal movement (FM) detection is essential for assessing prenatal health, as abnormal movement patterns can indicate underlying complications such as placental dysfunction or fetal distress. Traditional methods, including maternal perception and cardiotocography (CTG), suffer from subjectivity and limited accuracy. To address these challenges, we propose Contrastive Ultrasound Video Representation Learning (CURL), a novel self-supervised learning framework for FM detection from extended fetal ultrasound video recordings. Our approach leverages a dual-contrastive loss, incorporating both spatial and temporal contrastive learning, to learn robust motion representations. Additionally, we introduce a task-specific sampling strategy, ensuring the effective separation of movement and non-movement segments during self-supervised training, while enabling flexible inference on arbitrarily long ultrasound recordings through a probabilistic fine-tuning approach. Evaluated on an in-house dataset of 92 subjects, each with 30-minute ultrasound sessions, CURL achieves a sensitivity of 78.01% and an AUROC of 81.60%, demonstrating its potential for reliable and objective FM analysis. These results highlight the potential of self-supervised contrastive learning for fetal movement analysis, paving the way for improved prenatal monitoring and clinical decision-making.
CVMay 17, 2023
CWD30: A Comprehensive and Holistic Dataset for Crop Weed Recognition in Precision AgricultureTalha Ilyas, Dewa Made Sri Arsa, Khubaib Ahmad et al.
The growing demand for precision agriculture necessitates efficient and accurate crop-weed recognition and classification systems. Current datasets often lack the sample size, diversity, and hierarchical structure needed to develop robust deep learning models for discriminating crops and weeds in agricultural fields. Moreover, the similar external structure and phenomics of crops and weeds complicate recognition tasks. To address these issues, we present the CWD30 dataset, a large-scale, diverse, holistic, and hierarchical dataset tailored for crop-weed recognition tasks in precision agriculture. CWD30 comprises over 219,770 high-resolution images of 20 weed species and 10 crop species, encompassing various growth stages, multiple viewing angles, and environmental conditions. The images were collected from diverse agricultural fields across different geographic locations and seasons, ensuring a representative dataset. The dataset's hierarchical taxonomy enables fine-grained classification and facilitates the development of more accurate, robust, and generalizable deep learning models. We conduct extensive baseline experiments to validate the efficacy of the CWD30 dataset. Our experiments reveal that the dataset poses significant challenges due to intra-class variations, inter-class similarities, and data imbalance. Additionally, we demonstrate that minor training modifications like using CWD30 pretrained backbones can significantly enhance model performance and reduce convergence time, saving training resources on several downstream tasks. These challenges provide valuable insights and opportunities for future research in crop-weed recognition. We believe that the CWD30 dataset will serve as a benchmark for evaluating crop-weed recognition algorithms, promoting advancements in precision agriculture, and fostering collaboration among researchers in the field.