Hamed Rahimi

AI
h-index49
14papers
160citations
Novelty44%
AI Score46

14 Papers

IRFeb 3, 2023
ANTM: An Aligned Neural Topic Model for Exploring Evolving Topics

Hamed Rahimi, Hubert Naacke, Camelia Constantin et al.

This paper presents an algorithmic family of dynamic topic models called Aligned Neural Topic Models (ANTM), which combine novel data mining algorithms to provide a modular framework for discovering evolving topics. ANTM maintains the temporal continuity of evolving topics by extracting time-aware features from documents using advanced pre-trained Large Language Models (LLMs) and employing an overlapping sliding window algorithm for sequential document clustering. This overlapping sliding window algorithm identifies a different number of topics within each time frame and aligns semantically similar document clusters across time periods. This process captures emerging and fading trends across different periods and allows for a more interpretable representation of evolving topics. Experiments on four distinct datasets show that ANTM outperforms probabilistic dynamic topic models in terms of topic coherence and diversity metrics. Moreover, it improves the scalability and flexibility of dynamic topic models by being accessible and adaptable to different types of algorithms. Additionally, a Python package is developed for researchers and scientists who wish to study the trends and evolving patterns of topics in large-scale textual data.

IRJun 4, 2023
ATEM: A Topic Evolution Model for the Detection of Emerging Topics in Scientific Archives

Hamed Rahimi, Hubert Naacke, Camelia Constantin et al.

This paper presents ATEM, a novel framework for studying topic evolution in scientific archives. ATEM is based on dynamic topic modeling and dynamic graph embedding techniques that explore the dynamics of content and citations of documents within a scientific corpus. ATEM explores a new notion of contextual emergence for the discovery of emerging interdisciplinary research topics based on the dynamics of citation links in topic clusters. Our experiments show that ATEM can efficiently detect emerging cross-disciplinary topics within the DBLP archive of over five million computer science articles.

AIFeb 15, 2025Code
USER-VLM 360: Personalized Vision Language Models with User-aware Tuning for Social Human-Robot Interactions

Hamed Rahimi, Adil Bahaj, Mouad Abrini et al.

The integration of vision-language models into robotic systems constitutes a significant advancement in enabling machines to interact with their surroundings in a more intuitive manner. While VLMs offer rich multimodal reasoning, existing approaches lack user-specific adaptability, often relying on generic interaction paradigms that fail to account for individual behavioral, contextual, or socio-emotional nuances. When customization is attempted, ethical concerns arise from unmitigated biases in user data, risking exclusion or unfair treatment. To address these dual challenges, we propose User-VLM 360°, a holistic framework integrating multimodal user modeling with bias-aware optimization. Our approach features: (1) user-aware tuning that adapts interactions in real time using visual-linguistic signals; (2) bias mitigation via preference optimization; and (3) curated 360° socio-emotive interaction datasets annotated with demographic, emotion, and relational metadata. Evaluations across eight benchmarks demonstrate state-of-the-art results: +35.3% F1 in personalized VQA, +47.5% F1 in facial features understanding, 15% bias reduction, and 30X speedup over baselines. Ablation studies confirm component efficacy, and deployment on the Pepper robot validates real-time adaptability across diverse users. We open-source parameter-efficient 3B/10B models and an ethical verification framework for responsible adaptation.

ROMar 17
Encoding Predictability and Legibility for Style-Conditioned Diffusion Policy

Adrien Jacquet Crétides, Mouad Abrini, Hamed Rahimi et al.

Striking a balance between efficiency and transparent motion is a core challenge in human-robot collaboration, as highly expressive movements often incur unnecessary time and energy costs. In collaborative environments, legibility allows a human observer a better understanding of the robot's actions, increasing safety and trust. However, these behaviors result in sub-optimal and exaggerated trajectories that are redundant in low-ambiguity scenarios where the robot's goal is already obvious. To address this trade-off, we propose Style-Conditioned Diffusion Policy (SCDP), a modular framework that constrains the trajectory generation of a pre-trained diffusion model toward either legibility or efficiency based on the environment's configuration. Our method utilizes a post-training pipeline that freezes the base policy and trains a lightweight scene encoder and conditioning predictor to modulate the diffusion process. At inference time, an ambiguity detection module activates the appropriate conditioning, prioritizing expressive motion only for ambiguous goals and reverting to efficient paths otherwise. We evaluate SCDP on manipulation and navigation tasks, and results show that it enhances legibility in ambiguous settings while preserving optimal efficiency when legibility is unnecessary, all without retraining the base policy.

ROJan 27
HARMONI: Multimodal Personalization of Multi-User Human-Robot Interactions with LLMs

Jeanne Malécot, Hamed Rahimi, Jeanne Cattoni et al.

Existing human-robot interaction systems often lack mechanisms for sustained personalization and dynamic adaptation in multi-user environments, limiting their effectiveness in real-world deployments. We present HARMONI, a multimodal personalization framework that leverages large language models to enable socially assistive robots to manage long-term multi-user interactions. The framework integrates four key modules: (i) a perception module that identifies active speakers and extracts multimodal input; (ii) a world modeling module that maintains representations of the environment and short-term conversational context; (iii) a user modeling module that updates long-term speaker-specific profiles; and (iv) a generation module that produces contextually grounded and ethically informed responses. Through extensive evaluation and ablation studies on four datasets, as well as a real-world scenario-driven user-study in a nursing home environment, we demonstrate that HARMONI supports robust speaker identification, online memory updating, and ethically aligned personalization, outperforming baseline LLM-driven approaches in user modeling accuracy, personalization quality, and user satisfaction.

HCApr 27
IntentVLM: Open-Vocabulary Intention Recognition through Forward-Inverse Modeling with Video-Language Models

Hamed Rahimi, Clemence Grislain, Adrien Jacquet Cretides et al.

Improving the effectiveness of human-robot interaction requires social robots to accurately infer human goals through robust intention understanding. This challenge is particularly critical in multimodal settings, where agents must integrate heterogeneous signals including text, visual cues to form a coherent interpretation of user intent. This paper presents IntentVLM, a novel two-stage video-language framework designed for open-vocabulary human intention recognition. The approach is inspired by forward-inverse modeling in cognitive science by decomposing intention understanding into goal candidate generation followed by structured inference through selection, effectively reducing hallucinations in latent reasoning. Evaluated on the IntentQA and Inst-IT Bench datasets, IntentVLM achieves state-of-the-art results with up to 80% accuracy, notably surpassing the baseline performance by 30% and matches human performance. Our findings demonstrate that this structured reasoning approach enhances open-vocabulary intention understanding without catastrophic forgetting, offering a robust foundation for human-centered robotics.

HCApr 2, 2025
Reasoning LLMs for User-Aware Multimodal Conversational Agents

Hamed Rahimi, Jeanne Cattoni, Meriem Beghili et al.

Personalization in social robotics is critical for fostering effective human-robot interactions, yet systems often face the cold start problem, where initial user preferences or characteristics are unavailable. This paper proposes a novel framework called USER-LLM R1 for a user-aware conversational agent that addresses this challenge through dynamic user profiling and model initiation. Our approach integrates chain-of-thought (CoT) reasoning models to iteratively infer user preferences and vision-language models (VLMs) to initialize user profiles from multimodal inputs, enabling personalized interactions from the first encounter. Leveraging a Retrieval-Augmented Generation (RAG) architecture, the system dynamically refines user representations within an inherent CoT process, ensuring contextually relevant and adaptive responses. Evaluations on the ElderlyTech-VQA Bench demonstrate significant improvements in ROUGE-1 (+23.2%), ROUGE-2 (+0.6%), and ROUGE-L (+8%) F1 scores over state-of-the-art baselines, with ablation studies underscoring the impact of reasoning model size on performance. Human evaluations further validate the framework's efficacy, particularly for elderly users, where tailored responses enhance engagement and trust. Ethical considerations, including privacy preservation and bias mitigation, are rigorously discussed and addressed to ensure responsible deployment.

AIApr 28, 2025
Proceedings of 1st Workshop on Advancing Artificial Intelligence through Theory of Mind

Mouad Abrini, Omri Abend, Dina Acklin et al. · cambridge

This volume includes a selection of papers presented at the Workshop on Advancing Artificial Intelligence through Theory of Mind held at AAAI 2025 in Philadelphia US on 3rd March 2025. The purpose of this volume is to provide an open access and curated anthology for the ToM and AI research community.

CLApr 16, 2025
Gauging Overprecision in LLMs: An Empirical Study

Adil Bahaj, Hamed Rahimi, Mohamed Chetouani et al.

Recently, overconfidence in large language models (LLMs) has garnered considerable attention due to its fundamental importance in quantifying the trustworthiness of LLM generation. However, existing approaches prompt the \textit{black box LLMs} to produce their confidence (\textit{verbalized confidence}), which can be subject to many biases and hallucinations. Inspired by a different aspect of overconfidence in cognitive science called \textit{overprecision}, we designed a framework for its study in black box LLMs. This framework contains three main phases: 1) generation, 2) refinement and 3) evaluation. In the generation phase we prompt the LLM to generate answers to numerical questions in the form of intervals with a certain level of confidence. This confidence level is imposed in the prompt and not required for the LLM to generate as in previous approaches. We use various prompting techniques and use the same prompt multiple times to gauge the effects of randomness in the generation process. In the refinement phase, answers from the previous phase are refined to generate better answers. The LLM answers are evaluated and studied in the evaluation phase to understand its internal workings. This study allowed us to gain various insights into LLM overprecision: 1) LLMs are highly uncalibrated for numerical tasks 2) there is no correlation between the length of the interval and the imposed confidence level, which can be symptomatic of a a) lack of understanding of the concept of confidence or b) inability to adjust self-confidence by following instructions, {3) LLM numerical precision differs depending on the task, scale of answer and prompting technique 4) Refinement of answers doesn't improve precision in most cases. We believe this study offers new perspectives on LLM overconfidence and serves as a strong baseline for overprecision in LLMs.

AIFeb 15, 2025
Demographic User Modeling for Social Robotics with Multimodal Pre-trained Models

Hamed Rahimi, Mouad Abrini, Mahdi Khoramshahi et al.

This paper investigates the performance of multimodal pre-trained models in user profiling tasks based on visual-linguistic demographic data. These models are critical for adapting to the needs and preferences of human users in social robotics, thereby providing personalized responses and enhancing interaction quality. First, we introduce two datasets specifically curated to represent demographic characteristics derived from user facial images. Next, we evaluate the performance of a prominent contrastive multimodal pre-trained model, CLIP, on these datasets, both in its out-of-the-box state and after fine-tuning. Initial results indicate that CLIP performs suboptimal in matching images to demographic descriptions without fine-tuning. Although fine-tuning significantly enhances its predictive capacity, the model continues to exhibit limitations in effectively generalizing subtle demographic nuances. To address this, we propose adopting a masked image modeling strategy to improve generalization and better capture subtle demographic attributes. This approach offers a pathway for enhancing demographic sensitivity in multimodal user modeling tasks.

CLMay 23, 2023
Contextualized Topic Coherence Metrics

Hamed Rahimi, Jacob Louis Hoover, David Mimno et al.

The recent explosion in work on neural topic modeling has been criticized for optimizing automated topic evaluation metrics at the expense of actual meaningful topic identification. But human annotation remains expensive and time-consuming. We propose LLM-based methods inspired by standard human topic evaluations, in a family of metrics called Contextualized Topic Coherence (CTC). We evaluate both a fully automated version as well as a semi-automated CTC that allows human-centered evaluation of coherence while maintaining the efficiency of automated methods. We evaluate CTC relative to five other metrics on six topic models and find that it outperforms automated topic coherence methods, works well on short documents, and is not susceptible to meaningless but high-scoring topics.

AIJul 13, 2021
Q-SMASH: Q-Learning-based Self-Adaptation of Human-Centered Internet of Things

Hamed Rahimi, Iago Felipe Trentin, Fano Ramparany et al.

As the number of Human-Centered Internet of Things (HCIoT) applications increases, the self-adaptation of its services and devices is becoming a fundamental requirement for addressing the uncertainties of the environment in decision-making processes. Self-adaptation of HCIoT aims to manage run-time changes in a dynamic environment and to adjust the functionality of IoT objects in order to achieve desired goals during execution. SMASH is a semantic-enabled multi-agent system for self-adaptation of HCIoT that autonomously adapts IoT objects to uncertainties of their environment. SMASH addresses the self-adaptation of IoT applications only according to the human values of users, while the behavior of users is not addressed. This article presents Q-SMASH: a multi-agent reinforcement learning-based approach for self-adaptation of IoT objects in human-centered environments. Q-SMASH aims to learn the behaviors of users along with respecting human values. The learning ability of Q-SMASH allows it to adapt itself to the behavioral change of users and make more accurate decisions in different states and situations.

AIMay 31, 2021
SMASH: a Semantic-enabled Multi-agent Approach for Self-adaptation of Human-centered IoT

Hamed Rahimi, Iago Felipe Trentin, Fano Ramparany et al.

Nowadays, IoT devices have an enlarging scope of activities spanning from sensing, computing to acting and even more, learning, reasoning and planning. As the number of IoT applications increases, these objects are becoming more and more ubiquitous. Therefore, they need to adapt their functionality in response to the uncertainties of their environment to achieve their goals. In Human-centered IoT, objects and devices have direct interactions with human beings and have access to online contextual information. Self-adaptation of such applications is a crucial subject that needs to be addressed in a way that respects human goals and human values. Hence, IoT applications must be equipped with self-adaptation techniques to manage their run-time uncertainties locally or in cooperation with each other. This paper presents SMASH: a multi-agent approach for self-adaptation of IoT applications in human-centered environments. In this paper, we have considered the Smart Home as the case study of smart environments. SMASH agents are provided with a 4-layer architecture based on the BDI agent model that integrates human values with goal-reasoning, planning, and acting. It also takes advantage of a semantic-enabled platform called Home'In to address interoperability issues among non-identical agents and devices with heterogeneous protocols and data formats. This approach is compared with the literature and is validated by developing a scenario as the proof of concept. The timely responses of SMASH agents show the feasibility of the proposed approach in human-centered environments.

CYApr 16, 2020
Road Quality Analysis Based on Cognitive Internet of Vehicles (CIoV)

Hamed Rahimi, Dhayananth Dharmalingam

This research proposal aims to use cognitive methods to analyze the quality of roads based on the new proposed technology called Cognitive Internet of Vehicles (CIoV). By using Big Data corresponding to the collected data of autonomous vehicles, we can apply cognitive analytics to a huge amount of transportation data. This process can help us to create valuable information such as road quality from an immense volume of meaningless data. In this proposal, we are going to focus on the quality of roads for various business and commercial purposes. The proposed system can be used as an additional service of autonomous car companies or as a mobile application for ordinary usages. As a result, this system can reduce the usage of resources such as energy consumption of autonomous vehicles. Moreover, this technology benefits the next-generation of self-driving applications to improve their QoS.