CV · Mar 4 · Code
CLIP-Guided Multi-Task Regression for Multi-View Plant Phenotyping
Simon Warmers, Muhammad Zawish, Fayaz Ali Dharejo et al.
Modeling plant growth dynamics plays a central role in modern agricultural research. However, learning robust predictors from multi-view plant imagery remains challenging due to strong viewpoint redundancy and viewpoint-dependent appearance changes. We propose a level-aware vision-language framework that jointly predicts plant age and leaf count using a single multi-task model built on CLIP embeddings. Our method aggregates rotational views into angle-invariant representations and conditions visual features on lightweight text priors encoding viewpoint level, giving stable predictions under incomplete or unordered inputs. On the GroMo25 benchmark, our approach reduces mean age MAE from 7.74 to 3.91 and mean leaf-count MAE from 5.52 to 3.08 compared to the GroMo baseline, corresponding to improvements of 49.5% and 44.2%, respectively. The unified formulation simplifies the pipeline by replacing the conventional dual-model setup while improving robustness to missing views. The models and code are available at: https://github.com/SimonWarmers/CLIP-MVP
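The reported improvements follow directly from the MAE figures in the abstract. A minimal check, using only the four numbers quoted above (the helper name `relative_reduction` is ours, not from the paper or its repository):

```python
def relative_reduction(baseline: float, ours: float) -> float:
    """Percentage by which `ours` is lower than `baseline`."""
    return 100.0 * (baseline - ours) / baseline


# MAE values quoted in the abstract (GroMo baseline vs. proposed model).
age_gain = relative_reduction(7.74, 3.91)
leaf_gain = relative_reduction(5.52, 3.08)

print(f"age MAE reduction:        {age_gain:.1f}%")
print(f"leaf-count MAE reduction: {leaf_gain:.1f}%")
```

Rounded to one decimal place, both values match the 49.5% and 44.2% stated in the abstract.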
AI · Aug 23, 2022
AI and 6G into the Metaverse: Fundamentals, Challenges and Future Research Trends
Muhammad Zawish, Fayaz Ali Dharejo, Sunder Ali Khowaja et al.
Since Facebook was renamed Meta, attention, debate, and exploration have intensified around what the Metaverse is, how it works, and the possible ways to exploit it. The Metaverse is anticipated to be a continuum of rapidly emerging technologies, use cases, capabilities, and experiences that together make up the next evolution of the Internet. Several researchers have already surveyed the literature on artificial intelligence (AI) and wireless communications in realizing the Metaverse. However, due to the rapid emergence and continuous evolution of technologies, there is a need for a comprehensive and in-depth survey of the role of AI, 6G, and the nexus of both in realizing the immersive experiences of the Metaverse. Therefore, in this survey, we first introduce the background and ongoing progress in augmented reality (AR), virtual reality (VR), mixed reality (MR) and spatial computing, followed by the technical aspects of AI and 6G. Then, we survey the role of AI in the Metaverse by reviewing the state of the art in deep learning, computer vision, and Edge AI to extract the requirements of 6G in the Metaverse. Next, we investigate the promising services of B5G/6G towards the Metaverse, followed by identifying the role of AI in 6G networks and of 6G networks for AI in support of Metaverse applications, and the need for sustainability in the Metaverse. Finally, we list the existing and potential applications, use cases, and projects to highlight the importance of progress in the Metaverse. Moreover, to provide potential research directions to researchers, we underline the challenges, research gaps, and lessons learned from the literature review of the aforementioned technologies.
AI · Mar 12, 2022
Towards On-Device AI and Blockchain for 6G enabled Agricultural Supply-chain Management
Muhammad Zawish, Nouman Ashraf, Rafay Iqbal Ansari et al.
6G envisions artificial intelligence (AI) powered solutions to enhance the quality-of-service (QoS) in the network and ensure optimal utilization of resources. In this work, we propose an architecture based on the combination of unmanned aerial vehicles (UAVs), AI and blockchain for agricultural supply-chain management, with the purpose of ensuring traceability and transparency and of tracking inventories and contracts. We propose a solution to facilitate on-device AI by generating a roadmap of models with various resource-accuracy trade-offs. A fully convolutional neural network (FCN) model is used for biomass estimation from images captured by the UAV. Instead of a single compressed FCN model for deployment on the UAV, we motivate the idea of iterative pruning to provide multiple task-specific models of varying complexity and accuracy. To alleviate the impact of flight failure in a 6G enabled dynamic UAV network, the proposed model selection strategy will assist UAVs in updating the model based on the runtime resource requirements.
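One way to picture runtime model selection over such a roadmap is sketched below. This is an illustration only: the `ModelVariant` fields, the selection rule (most accurate model that fits the memory budget), and every number are assumptions, not values or logic taken from the paper.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class ModelVariant:
    """One entry of a hypothetical roadmap of pruned models."""
    name: str
    memory_mb: float   # illustrative memory footprint
    accuracy: float    # illustrative task accuracy in [0, 1]


def select_model(roadmap: Sequence[ModelVariant],
                 available_memory_mb: float) -> Optional[ModelVariant]:
    """Return the most accurate variant that fits the budget, or None."""
    feasible = [m for m in roadmap if m.memory_mb <= available_memory_mb]
    return max(feasible, key=lambda m: m.accuracy) if feasible else None


# Made-up roadmap: each pruning iteration trades accuracy for footprint.
roadmap = [
    ModelVariant("fcn-full", memory_mb=120.0, accuracy=0.91),
    ModelVariant("fcn-pruned-1", memory_mb=70.0, accuracy=0.89),
    ModelVariant("fcn-pruned-2", memory_mb=35.0, accuracy=0.84),
]

choice = select_model(roadmap, available_memory_mb=80.0)
print(choice.name if choice else "no model fits the budget")
```

With an 80 MB budget the full model is excluded and the first pruned variant is chosen; a real deployment would likely weigh energy and latency as well.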
LG · Aug 26, 2022
Complexity-Driven CNN Compression for Resource-constrained Edge AI
Muhammad Zawish, Steven Davy, Lizy Abraham
Recent advances in Artificial Intelligence (AI) on the Internet of Things (IoT)-enabled network edge have realized edge intelligence in several applications such as smart agriculture, smart hospitals, and smart factories by enabling low latency and computational efficiency. However, deploying state-of-the-art Convolutional Neural Networks (CNNs) such as VGG-16 and ResNets on resource-constrained edge devices is practically infeasible due to their large number of parameters and floating-point operations (FLOPs). Thus, network pruning as a type of model compression is gaining attention for accelerating CNNs on low-power devices. State-of-the-art pruning approaches, whether structured or unstructured, do not consider the differing nature of the complexities exhibited by convolutional layers, and they follow a training-pruning-retraining pipeline, which results in additional computational overhead. In this work, we propose a novel and computationally efficient pruning pipeline that exploits the inherent layer-level complexities of CNNs. Unlike typical methods, our proposed complexity-driven algorithm selects a particular layer for filter pruning based on its contribution to overall network complexity. We follow a procedure that directly trains the pruned model and avoids the computationally complex ranking and fine-tuning steps. Moreover, we define three modes of pruning, namely parameter-aware (PA), FLOPs-aware (FA), and memory-aware (MA), to introduce versatile compression of CNNs. Our results show the competitive performance of our approach in terms of accuracy and acceleration. Lastly, we present a trade-off between different resources and accuracy, which can help developers make the right decisions in resource-constrained IoT environments.
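The idea of picking a layer by its contribution to network complexity can be sketched with standard per-layer cost formulas. The abstract does not give the exact criteria, so the metrics below (weight count for PA, multiply-accumulates for FA, output activation count for MA) and the `ConvLayer` and `pick_layer` names are assumptions for illustration. The three layer shapes are VGG-16-style examples.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class ConvLayer:
    name: str
    in_ch: int
    out_ch: int
    kernel: int   # square kernel size
    out_hw: int   # square output feature-map size

    @property
    def params(self) -> int:
        # Weights only (bias ignored): k * k * C_in * C_out
        return self.kernel * self.kernel * self.in_ch * self.out_ch

    @property
    def flops(self) -> int:
        # Multiply-accumulates: one kernel application per output position
        return self.params * self.out_hw * self.out_hw

    @property
    def activations(self) -> int:
        # Number of output activations, a proxy for runtime memory
        return self.out_ch * self.out_hw * self.out_hw


def pick_layer(layers: Sequence[ConvLayer], mode: str) -> ConvLayer:
    """Return the layer contributing most to the chosen complexity measure."""
    metric = {
        "PA": lambda layer: layer.params,
        "FA": lambda layer: layer.flops,
        "MA": lambda layer: layer.activations,
    }[mode]
    return max(layers, key=metric)


layers = [
    ConvLayer("conv1_2", in_ch=64, out_ch=64, kernel=3, out_hw=224),
    ConvLayer("conv4_1", in_ch=256, out_ch=512, kernel=3, out_hw=28),
    ConvLayer("conv5_3", in_ch=512, out_ch=512, kernel=3, out_hw=14),
]

for mode in ("PA", "FA", "MA"):
    print(mode, "->", pick_layer(layers, mode).name)
```

In this toy setup the parameter-aware mode targets a late layer, while the FLOPs-aware and memory-aware modes target an early one. That contrast is why a single pruning criterion cannot serve every resource constraint.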
AI · Oct 24, 2024
Tailored-LLaMA: Optimizing Few-Shot Learning in Pruned LLaMA Models with Task-Specific Prompts
Danyal Aftab, Steven Davy
Large language models demonstrate impressive proficiency in language understanding and generation. Nonetheless, training these models from scratch, even the least complex billion-parameter variant, demands significant computational resources, rendering it economically impractical for many organizations. With large language models functioning as general-purpose task solvers, this paper investigates their task-specific fine-tuning. We employ task-specific datasets and prompts to fine-tune two pruned LLaMA models with 5 billion and 4 billion parameters. This process utilizes the pre-trained weights and focuses on a subset of weights using the LoRA method. One challenge in fine-tuning the LLaMA model is crafting a precise prompt tailored to the specific task. To address this, we propose a novel approach to fine-tune the LLaMA model under two primary constraints: task specificity and prompt effectiveness. Our approach, Tailored-LLaMA, initially employs structural pruning to reduce the model size from 7B to 5B and 4B parameters. Subsequently, it applies a carefully designed prompt specific to the task and utilizes the LoRA method to accelerate the fine-tuning process. Moreover, fine-tuning a model pruned by 50% for less than one hour restores the mean accuracy of classification tasks to 95.68% at a 20% compression ratio and to 86.54% at a 50% compression ratio through few-shot learning with 50 shots. Our validation of Tailored-LLaMA on these two pruned variants demonstrates that even when compressed to 50%, the models maintain over 65% of the baseline model accuracy in few-shot classification and generation tasks. These findings highlight the efficacy of our tailored approach in maintaining high performance with significantly reduced model sizes.
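LoRA speeds up fine-tuning because it trains only a low-rank update to each adapted weight matrix. A rough count for one square projection is sketched below. The hidden size 4096 is that of the unpruned LLaMA-7B and the rank of 8 is an assumed setting; the abstract gives neither the rank nor the pruned layer shapes.

```python
def lora_trainable_params(d_out: int, d_in: int, rank: int) -> int:
    """Parameters in the LoRA factors B (d_out x rank) and A (rank x d_in)."""
    return rank * (d_out + d_in)


hidden = 4096                  # LLaMA-7B hidden size (unpruned model)
full = hidden * hidden         # weights in one square projection matrix
lora = lora_trainable_params(hidden, hidden, rank=8)  # rank is an assumption

print(f"full matrix:  {full:,} weights")
print(f"LoRA factors: {lora:,} weights ({100 * lora / full:.2f}% of full)")
```

Under these assumptions the adapter trains well under 1% of the weights of the matrix it modifies, while the original weights stay frozen.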
CV · Apr 17, 2024
Energy-Efficient Uncertainty-Aware Biomass Composition Prediction at the Edge
Muhammad Zawish, Paul Albert, Flavio Esposito et al.
Clover fixes nitrogen from the atmosphere into the ground, making grass-clover mixtures highly desirable for reducing external nitrogen fertilization. Herbage containing clover additionally promotes higher food intake, resulting in higher milk production. Herbage probing, however, remains largely unused as it requires a time-intensive manual laboratory analysis. Without this information, farmers are unable to perform localized clover sowing or take targeted fertilization decisions. Deep learning algorithms have been proposed to estimate the dry biomass composition from images of the grass taken directly in the fields. The energy-intensive nature of deep learning, however, limits deployment on practical edge devices such as smartphones. This paper proposes to fill this gap by applying filter pruning to reduce the energy requirement of existing deep learning solutions. We report that although pruned networks are accurate on controlled, high-quality images of the grass, they struggle to generalize to real-world smartphone images that are blurry or taken from challenging angles. We address this challenge by training filter-pruned models using a variance attenuation loss so they can predict the uncertainty of their predictions. When the uncertainty exceeds a threshold, we re-infer using a more accurate unpruned model. This hybrid approach allows us to reduce energy consumption while retaining high accuracy. We evaluate our algorithm on two datasets, GrassClover and Irish clover, using an NVIDIA Jetson Nano edge device. We find that we reduce energy consumption with respect to state-of-the-art solutions by 50% on average with only a 4% accuracy loss.
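The uncertainty-gated fallback can be sketched as follows. The loss shown is the common heteroscedastic form, 0.5 * exp(-s) * (y - mu)^2 + 0.5 * s with s the predicted log-variance, and the paper's exact formulation may differ. The stub models, the threshold value, and the function names are hypothetical.

```python
import math
from typing import Callable, Tuple


def variance_attenuation_loss(target: float, mean: float, log_var: float) -> float:
    """Squared error down-weighted by predicted variance, plus a variance penalty."""
    return 0.5 * math.exp(-log_var) * (target - mean) ** 2 + 0.5 * log_var


def hybrid_predict(image: str,
                   pruned_model: Callable[[str], Tuple[float, float]],
                   full_model: Callable[[str], float],
                   max_log_var: float) -> Tuple[float, str]:
    """Use the pruned model unless its predicted uncertainty is too high."""
    mean, log_var = pruned_model(image)
    if log_var > max_log_var:
        # Uncertain prediction: pay the energy cost of the unpruned model.
        return full_model(image), "full"
    return mean, "pruned"


# Stub models standing in for real networks (illustrative outputs only).
def pruned_model(image: str) -> Tuple[float, float]:
    return (0.40, -2.0) if image == "sharp" else (0.10, 1.5)


def full_model(image: str) -> float:
    return 0.35


for image in ("sharp", "blurry"):
    value, source = hybrid_predict(image, pruned_model, full_model, max_log_var=0.0)
    print(f"{image}: {value:.2f} from the {source} model")
```

The energy saving then depends on how often the threshold routes an image to the unpruned model, so the threshold is the knob that trades energy for accuracy.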
CV · Oct 11, 2024
SpikeBottleNet: Spike-Driven Feature Compression Architecture for Edge-Cloud Co-Inference
Maruf Hassan, Steven Davy
Edge-cloud co-inference enables efficient deep neural network (DNN) deployment by splitting the architecture between an edge device and a cloud server, which is crucial for resource-constrained edge devices. This approach requires balancing on-device computation and communication costs, often achieved through compressed intermediate feature transmission. Conventional DNN architectures require continuous data processing and floating-point activations, leading to considerable energy consumption and increased feature sizes, thus raising transmission costs. This challenge motivates exploring binary, event-driven activations using spiking neural networks (SNNs), known for their extreme energy efficiency. In this research, we propose SpikeBottleNet, a novel architecture for edge-cloud co-inference systems that integrates a spiking neuron model to significantly reduce energy consumption on edge devices. A key innovation of our study is an intermediate feature compression technique tailored for SNNs for efficient feature transmission. This technique leverages a split computing approach to strategically place encoder-decoder bottleneck units within complex deep architectures like ResNet and MobileNet. Experimental results demonstrate that SpikeBottleNet achieves up to 256x bit compression in the final convolutional layer of ResNet, with minimal accuracy loss (0.16%). Additionally, our approach enhances edge device energy efficiency by up to 144x compared to the baseline BottleNet, making it ideal for resource-limited edge devices.
CV · Sep 23, 2025
NeuCODEX: Edge-Cloud Co-Inference with Spike-Driven Compression and Dynamic Early-Exit
Maruf Hassan, Steven Davy, Muhammad Zawish et al.
Spiking Neural Networks (SNNs) offer significant potential for enabling energy-efficient intelligence at the edge. However, performing full SNN inference at the edge can be challenging due to the latency and energy constraints arising from fixed and high timestep overheads. Edge-cloud co-inference systems present a promising solution, but their deployment is often hindered by high latency and feature transmission costs. To address these issues, we introduce NeuCODEX, a neuromorphic co-inference architecture that jointly optimizes both spatial and temporal redundancy. NeuCODEX incorporates a learned spike-driven compression module to reduce data transmission and employs a dynamic early-exit mechanism to adaptively terminate inference based on output confidence. We evaluated NeuCODEX on both static images (CIFAR10 and Caltech) and neuromorphic event streams (CIFAR10-DVS and N-Caltech). To demonstrate practicality, we prototyped NeuCODEX on ResNet-18 and VGG-16 backbones in a real edge-to-cloud testbed. Our proposed system reduces data transfer by up to 2048x and edge energy consumption by over 90%, while reducing end-to-end latency by up to 3x compared to edge-only inference, all with a negligible accuracy drop of less than 2%. In doing so, NeuCODEX enables practical, high-performance SNN deployment in resource-constrained environments.
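A confidence-based early exit over SNN timesteps can be sketched in a few lines. This is not NeuCODEX's actual mechanism, which the abstract does not detail. The sketch assumes per-timestep output scores are summed and inference stops once the softmax confidence of the accumulated scores reaches a threshold; the scores and threshold are made up.

```python
import math
from typing import List, Sequence, Tuple


def softmax(scores: Sequence[float]) -> List[float]:
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]  # shift by max for stability
    total = sum(exps)
    return [e / total for e in exps]


def early_exit_inference(step_outputs: Sequence[Sequence[float]],
                         threshold: float) -> Tuple[int, int]:
    """Return (predicted class, timesteps used).

    Scores are accumulated over timesteps; inference stops as soon as the
    top class probability reaches `threshold`.
    """
    if not step_outputs:
        raise ValueError("need at least one timestep of output")
    accumulated = [0.0] * len(step_outputs[0])
    steps_used = 0
    for output in step_outputs:
        steps_used += 1
        accumulated = [a + o for a, o in zip(accumulated, output)]
        probs = softmax(accumulated)
        if max(probs) >= threshold:
            break  # confident enough: skip the remaining timesteps
    probs = softmax(accumulated)
    return probs.index(max(probs)), steps_used


# Illustrative per-timestep output scores for a 3-class problem.
outputs = [[1, 0, 0], [2, 0, 1], [3, 0, 0], [2, 1, 0]]
label, steps = early_exit_inference(outputs, threshold=0.9)
print(f"class {label} after {steps} of {len(outputs)} timesteps")
```

Here the run stops one timestep early. Easy inputs exit sooner, which is how an adaptive scheme can cut the fixed timestep overhead mentioned above.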
CV · Aug 27, 2025
WaveHiT-SR: Hierarchical Wavelet Network for Efficient Image Super-Resolution
Fayaz Ali, Muhammad Zawish, Steven Davy et al.
Transformers have demonstrated promising performance in computer vision tasks, including image super-resolution (SR). The quadratic computational complexity of window self-attention mechanisms in many transformer-based SR methods forces the use of small, fixed windows, limiting the receptive field. In this paper, we propose a new approach, called WaveHiT-SR, that embeds the wavelet transform within a hierarchical transformer framework. First, using adaptive hierarchical windows instead of static small windows allows the model to capture features across different levels and greatly improves its ability to model long-range dependencies. Second, the proposed model utilizes wavelet transforms to decompose images into multiple frequency subbands, allowing the network to focus on both global and local features while preserving structural details. By progressively reconstructing high-resolution images through hierarchical processing, the network reduces computational complexity without sacrificing performance. The multi-level decomposition strategy enables the network to capture fine-grained information in low-frequency components while enhancing high-frequency textures. Through extensive experimentation, we confirm the effectiveness and efficiency of our WaveHiT-SR. Our refined versions of SwinIR-Light, SwinIR-NG, and SRFormer-Light deliver cutting-edge SR results, achieving higher efficiency with fewer parameters, lower FLOPs, and faster speeds.
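To make the subband decomposition concrete, here is a one-level 2D Haar transform in plain Python. The abstract does not say which wavelet WaveHiT-SR uses, so Haar is an assumption chosen for simplicity, and the function and subband names are ours.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def haar_dwt2(image: Matrix) -> Tuple[Matrix, Matrix, Matrix, Matrix]:
    """One-level orthonormal 2D Haar transform on an even-sized image.

    Returns (approx, col_detail, row_detail, diag_detail), each half the
    input size. `approx` is the low-frequency band; the other three hold
    differences across columns, across rows, and along the diagonal.
    """
    rows, cols = len(image), len(image[0])
    if rows % 2 or cols % 2:
        raise ValueError("image height and width must be even")
    approx, col_detail, row_detail, diag_detail = [], [], [], []
    for r in range(0, rows, 2):
        bands = ([], [], [], [])
        for c in range(0, cols, 2):
            tl, tr = image[r][c], image[r][c + 1]
            bl, br = image[r + 1][c], image[r + 1][c + 1]
            bands[0].append((tl + tr + bl + br) / 2)
            bands[1].append((tl - tr + bl - br) / 2)
            bands[2].append((tl + tr - bl - br) / 2)
            bands[3].append((tl - tr - bl + br) / 2)
        approx.append(bands[0])
        col_detail.append(bands[1])
        row_detail.append(bands[2])
        diag_detail.append(bands[3])
    return approx, col_detail, row_detail, diag_detail


image = [
    [4, 4, 0, 0],
    [4, 4, 0, 0],
    [1, 3, 2, 2],
    [1, 3, 2, 2],
]
approx, col_detail, row_detail, diag_detail = haar_dwt2(image)
print("approx:    ", approx)
print("col_detail:", col_detail)
```

Flat regions land entirely in the approximation band, while the one block with a left-to-right intensity change produces a nonzero column-detail coefficient. Each subband has a quarter of the pixels, so attention over subbands works on smaller inputs.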