Muhammad Hamza

CV
h-index1
6papers
30citations
Novelty37%
AI Score42

6 Papers

CVJan 22, 2023
Summarize the Past to Predict the Future: Natural Language Descriptions of Context Boost Multimodal Object Interaction Anticipation

Razvan-George Pasca, Alexey Gavryushin, Muhammad Hamza et al. · eth-zurich

We study object interaction anticipation in egocentric videos. This task requires an understanding of the spatio-temporal context formed by past actions on objects, coined action context. We propose TransFusion, a multimodal transformer-based architecture. It exploits the representational power of language by summarizing the action context. TransFusion leverages pre-trained image captioning and vision-language models to extract the action context from past video frames. This action context together with the next video frame is processed by the multimodal fusion module to forecast the next object interaction. Our model enables more efficient end-to-end learning. The large pre-trained language models add common sense and a generalisation capability. Experiments on Ego4D and EPIC-KITCHENS-100 show the effectiveness of our multimodal fusion model. They also highlight the benefits of using language-based context summaries in a task where vision seems to suffice. Our method outperforms state-of-the-art approaches by 40.4% in relative terms in overall mAP on the Ego4D test set. We validate the effectiveness of TransFusion via experiments on EPIC-KITCHENS-100. Video and code are available at https://eth-ait.github.io/transfusion-proj/.

59.0ROJun 2
Toward Gripper-Integrated Active Electrosense for Pre-Contact Sensing in Underwater Soft Grippers

Ahsan Tanveer, Muhammad Hamza, Waqar Hussain Afridi et al.

Underwater manipulation often occurs under degraded visibility due to turbidity, glare, and gripper occlusion, limiting the reliability of vision-based perception during approach and grasping. In such settings, soft grippers are well suited for compliant interaction, but they typically lack an onboard pre-contact cue that can guide approach and closure when vision is unreliable. This extended abstract explores active electrosense as a lightweight sensing modality that can provide a proximity-like signal prior to contact by measuring perturbations of an applied electric field in conductive media. We instrument an octopus-inspired gripper with a discrete electrode layout and record multi-channel sensing voltages using off-the-shelf hardware. Simulation and tank experiments with a suspended conductive sphere show structured, object-dependent changes in the multi-electrode voltage readout relative to empty-water baselines, with detectability varying across excitation of 5 to 20 V and frequencies from 1 mHz to 1 kHz. These findings motivate systematic investigation of gripper-integrated electrosense as a complementary pre-contact cue for underwater soft manipulation.

CVAug 30, 2023
Feature Attention Network (FA-Net): A Deep-Learning Based Approach for Underwater Single Image Enhancement

Muhammad Hamza, Ammar Hawbani, Sami Ul Rehman et al.

Underwater image processing and analysis have been a hotspot of study in recent years, as more emphasis has been focused to underwater monitoring and usage of marine resources. Compared with the open environment, underwater image encountered with more complicated conditions such as light abortion, scattering, turbulence, nonuniform illumination and color diffusion. Although considerable advances and enhancement techniques achieved in resolving these issues, they treat low-frequency information equally across the entire channel, which results in limiting the network's representativeness. We propose a deep learning and feature-attention-based end-to-end network (FA-Net) to solve this problem. In particular, we propose a Residual Feature Attention Block (RFAB), containing the channel attention, pixel attention, and residual learning mechanism with long and short skip connections. RFAB allows the network to focus on learning high-frequency information while skipping low-frequency information on multi-hop connections. The channel and pixel attention mechanism considers each channel's different features and the uneven distribution of haze over different pixels in the image. The experimental results shows that the FA-Net propose by us provides higher accuracy, quantitatively and qualitatively and superiority to previous state-of-the-art methods.

SIOct 10, 2025
Celebrity Profiling on Short Urdu Text using Twitter Followers' Feed

Muhammad Hamza, Rizwan Jafar

Social media has become an essential part of the digital age, serving as a platform for communication, interaction, and information sharing. Celebrities are among the most active users and often reveal aspects of their personal and professional lives through online posts. Platforms such as Twitter provide an opportunity to analyze language and behavior for understanding demographic and social patterns. Since followers frequently share linguistic traits and interests with the celebrities they follow, textual data from followers can be used to predict celebrity demographics. However, most existing research in this field has focused on English and other high-resource languages, leaving Urdu largely unexplored. This study applies modern machine learning and deep learning techniques to the problem of celebrity profiling in Urdu. A dataset of short Urdu tweets from followers of subcontinent celebrities was collected and preprocessed. Multiple algorithms were trained and compared, including Logistic Regression, Support Vector Machines, Random Forests, Convolutional Neural Networks, and Long Short-Term Memory networks. The models were evaluated using accuracy, precision, recall, F1-score, and cumulative rank (cRank). The best performance was achieved for gender prediction with a cRank of 0.65 and an accuracy of 0.65, followed by moderate results for age, profession, and fame prediction. These results demonstrate that follower-based linguistic features can be effectively leveraged using machine learning and neural approaches for demographic prediction in Urdu, a low-resource language.

CVSep 20, 2025
Person Identification from Egocentric Human-Object Interactions using 3D Hand Pose

Muhammad Hamza, Danish Hamid, Muhammad Tahir Akram

Human-Object Interaction Recognition (HOIR) and user identification play a crucial role in advancing augmented reality (AR)-based personalized assistive technologies. These systems are increasingly being deployed in high-stakes, human-centric environments such as aircraft cockpits, aerospace maintenance, and surgical procedures. This research introduces I2S (Interact2Sign), a multi stage framework designed for unobtrusive user identification through human object interaction recognition, leveraging 3D hand pose analysis in egocentric videos. I2S utilizes handcrafted features extracted from 3D hand poses and per forms sequential feature augmentation: first identifying the object class, followed by HOI recognition, and ultimately, user identification. A comprehensive feature extraction and description process was carried out for 3D hand poses, organizing the extracted features into semantically meaningful categories: Spatial, Frequency, Kinematic, Orientation, and a novel descriptor introduced in this work, the Inter-Hand Spatial Envelope (IHSE). Extensive ablation studies were conducted to determine the most effective combination of features. The optimal configuration achieved an impressive average F1-score of 97.52% for user identification, evaluated on a bimanual object manipulation dataset derived from the ARCTIC and H2O datasets. I2S demonstrates state-of-the-art performance while maintaining a lightweight model size of under 4 MB and a fast inference time of 0.1 seconds. These characteristics make the proposed framework highly suitable for real-time, on-device authentication in security-critical, AR-based systems.

LGDec 16, 2021
Forecasting sales with Bayesian networks: a case study of a supermarket product in the presence of promotions

Muhammad Hamza, Mahdi Abolghasemi, Abraham Oshni Alvandi

Sales forecasting is the prerequisite for a lot of managerial decisions such as production planning, material resource planning and budgeting in the supply chain. Promotions are one of the most important business strategies that are often used to boost sales. While promotions are attractive for generating demand, it is often difficult to forecast demand in their presence. In the past few decades, several quantitative models have been developed to forecast sales including statistical and machine learning models. However, these methods may not be adequate to account for all the internal and external factors that may impact sales. As a result, qualitative models have been adopted along with quantitative methods as consulting experts has been proven to improve forecast accuracy by providing contextual information. Such models are being used extensively to account for factors that can lead to a rapid change in sales, such as during promotions. In this paper, we aim to use Bayesian Networks to forecast promotional sales where a combination of factors such as price, type of promotions, and product location impacts sales. We choose to develop a BN model because BN models essentially have the capability to combine various qualitative and quantitative factors with causal forms, making it an attractive tool for sales forecasting during promotions. This can be used to adjust a company's promotional strategy in the context of this case study. We gather sales data for a particular product from a retailer that sells products in Australia. We develop a Bayesian Network for this product and validate our results by empirical analysis. This paper confirms that BNs can be effectively used to forecast sales, especially during promotions. In the end, we provide some research avenues for using BNs in forecasting sales.