Mohammad Moradi

CV
h-index: 27
Papers: 5
Citations: 24
Novelty: 41%
AI Score: 39

5 Papers

CV | Sep 23, 2024
AIM 2024 Challenge on Video Saliency Prediction: Methods and Results

Andrey Moskalenko, Alexey Bryncev, Dmitry Vatolin et al.

This paper reviews the Challenge on Video Saliency Prediction at AIM 2024. The goal of the participants was to develop a method for predicting accurate saliency maps for the provided set of video sequences. Saliency maps are widely used in applications such as video compression, quality assessment, visual perception studies, and advertising. For this competition, a previously unused large-scale audio-visual mouse saliency (AViMoS) dataset of 1500 videos with more than 70 observers per video was collected using crowdsourced mouse tracking. The dataset collection methodology has been validated against conventional eye-tracking data and has shown high consistency. Over 30 teams registered for the challenge, and 7 teams submitted results in the final phase. The final-phase solutions were tested and ranked by commonly used quality metrics on a private test subset. The results of this evaluation and the descriptions of the solutions are presented in this report. All data, including the private test subset, is publicly available on the challenge homepage: https://challenges.videoprocessing.ai/challenges/video-saliency-prediction.html.
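
The abstract above does not name the quality metrics used for ranking. As an illustration only, the sketch below computes the linear correlation coefficient (CC), one metric commonly used to compare a predicted saliency map with a ground-truth map. Whether the challenge used CC is an assumption here, and the function name `pearson_cc` and the nested-list map layout are hypothetical choices made for this example.

```python
import math


def pearson_cc(pred, gt):
    """Pearson correlation between two saliency maps given as 2D lists.

    Hypothetical helper for illustration; not taken from the challenge code.
    """
    # Flatten both maps into 1D lists of pixel values.
    p = [v for row in pred for v in row]
    g = [v for row in gt for v in row]
    if not p or len(p) != len(g):
        raise ValueError("maps must be non-empty and have the same number of pixels")

    mean_p = sum(p) / len(p)
    mean_g = sum(g) / len(g)

    # Unnormalized covariance and per-map deviation norms.
    cov = sum((a - mean_p) * (b - mean_g) for a, b in zip(p, g))
    norm_p = math.sqrt(sum((a - mean_p) ** 2 for a in p))
    norm_g = math.sqrt(sum((b - mean_g) ** 2 for b in g))

    # A constant map has zero variance; report 0.0 instead of dividing by zero.
    if norm_p == 0 or norm_g == 0:
        return 0.0
    return cov / (norm_p * norm_g)
```

A score near 1 means the predicted map rises and falls with the ground truth, and a score near -1 means it is inverted. For video, a per-frame score like this would typically be averaged over frames and sequences, though the exact protocol is not stated in the abstract.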

55.7 | IV | Apr 12
Brain-Grasp: Graph-based Saliency Priors for Improved fMRI-based Visual Brain Decoding

Mohammad Moradi, Morteza Moradi, Marco Grassia et al.

Recent progress in brain-guided image generation has improved the quality of fMRI-based reconstructions; however, fundamental challenges remain in preserving object-level structure and semantic fidelity. Many existing approaches overlook the spatial arrangement of salient objects, leading to conceptually inconsistent outputs. We propose a saliency-driven decoding framework that employs graph-informed saliency priors to translate structural cues from brain signals into spatial masks. These masks, together with semantic information extracted from embeddings, condition a diffusion model to guide image regeneration, helping preserve object conformity while maintaining natural scene composition. In contrast to pipelines that invoke multiple diffusion stages, our approach relies on a single frozen model, offering a more lightweight yet effective design. Experiments show that this strategy improves both conceptual alignment and structural similarity to the original stimuli, while also introducing a new direction for efficient, interpretable, and structurally grounded brain decoding.

49.2 | CV | May 4
Global-Local Feature Decoding with Adapter-Guided SAMv2 for Salient Object Detection

Morteza Moradi, Mohammad Moradi, Simone Palazzo et al.

Salient Object Detection (SOD) remains an essential yet underexplored task in the era of large-scale vision models. Although foundation models like SAM exhibit strong generalization, their potential for SOD is not fully realized, and training or fully fine-tuning them is computationally expensive and prone to overfitting under limited data. To overcome these challenges, we introduce GLASSNet, a Global-Local feature decoding framework that uses SAMv2 as a frozen encoder paired with a lightweight, spatially aware convolutional adapter, reducing learnable encoder parameters by over 97%. To enhance saliency quality, GLASSNet employs a dual-decoder architecture: one decoder captures global, long-range semantics with an expanded receptive field, while the other captures fine local details such as edges and textures. Fusing these complementary cues yields saliency maps that combine global coherence with local precision, producing accurate final masks. Extensive experiments on standard SOD and camouflaged object detection benchmarks show that GLASSNet surpasses state-of-the-art methods, demonstrating the power of frozen foundation models combined with targeted adaptation and global-local decoding.

CV | Apr 3, 2024
SalFoM: Dynamic Saliency Prediction with Video Foundation Models

Morteza Moradi, Mohammad Moradi, Francesco Rundo et al.

Recent advancements in video saliency prediction (VSP) have shown promising performance compared to the human visual system, whose emulation is the primary goal of VSP. However, current state-of-the-art models employ spatio-temporal transformers trained on limited amounts of data, hindering generalizability and adaptation to downstream tasks. Vision foundation models offer a potential way to improve the VSP process. However, adapting image foundation models to the video domain presents significant challenges in modeling scene dynamics and capturing temporal information. To address these challenges, and as the first initiative to design a VSP model based on video foundation models, we introduce SalFoM, a novel encoder-decoder video transformer architecture. Our model employs UnMasked Teacher (UMT) as a feature extractor and presents a heterogeneous decoder that features a locality-aware spatio-temporal transformer and integrates local and global spatio-temporal information from various perspectives to produce the final saliency map. Our qualitative and quantitative experiments on the challenging VSP benchmark datasets DHF1K, Hollywood-2, and UCF-Sports demonstrate the superiority of our proposed model over state-of-the-art methods.

IR | Apr 16, 2020
An approach based on Combination of Features for automatic news retrieval

Mohammad Moradi, Elham Ghanbari, Mehrdad Maeen et al.

Nowadays, as the volume of available information keeps growing, the way it is presented matters more and more. The internet has become one of the main sources of information for users and their favorite topics, and it gives them access to ever more information. Understanding this information is essential for providing users with the best set of information resources. Content providers now need a precise and efficient way to retrieve news with minimal human intervention. Data mining has led to the emergence of new methods for detecting related and unrelated documents. Although the conceptual relationship between documents may be negligible, it is still important to provide users with useful information and relevant content. In this paper, a new approach based on the Combination of Features (CoF) for information retrieval is introduced. Alongside this approach, we propose a dataset built by identifying the most commonly used keywords in documents and selecting the most appropriate documents to cope with the abundance of vocabulary. Then, using the proposed approach together with text categorization techniques, evaluation criteria, and ranking algorithms, the data were analyzed and examined. The evaluation results show that the combination-of-features approach improves retrieval quality and leads to more efficient ranking.