Fahim Shahriar

RO
h-index10
4papers
17citations
Novelty45%
AI Score33

4 Papers

CVOct 6, 2025
General and Efficient Visual Goal-Conditioned Reinforcement Learning using Object-Agnostic Masks

Fahim Shahriar, Cheryl Wang, Alireza Azimi et al.

Goal-conditioned reinforcement learning (GCRL) allows agents to learn diverse objectives using a unified policy. The success of GCRL, however, is contingent on the choice of goal representation. In this work, we propose a mask-based goal representation system that provides object-agnostic visual cues to the agent, enabling efficient learning and superior generalization. In contrast, existing goal representation methods, such as target state images, 3D coordinates, and one-hot vectors, face issues of poor generalization to unseen objects, slow convergence, and the need for special cameras. Masks can be processed to generate dense rewards without requiring error-prone distance calculations. Learning with ground truth masks in simulation, we achieved 99.9% reaching accuracy on training and unseen test objects. Our proposed method can be utilized to perform pick-up tasks with high accuracy, without using any positional information of the target. Moreover, we demonstrate learning from scratch and sim-to-real transfer applications using two different physical robots, utilizing pretrained open vocabulary object detection models for mask generation.

ROMar 17, 2025
Synchronous vs Asynchronous Reinforcement Learning in a Real World Robot

Ali Parsaee, Fahim Shahriar, Chuxin He et al.

In recent times, reinforcement learning (RL) with physical robots has attracted the attention of a wide range of researchers. However, state-of-the-art RL algorithms do not consider that physical environments do not wait for the RL agent to make decisions or updates. RL agents learn by periodically conducting computationally expensive gradient updates. When decision-making and gradient update tasks are carried out sequentially by the RL agent in a physical robot, it significantly increases the agent's response time. In a rapidly changing environment, this increased response time may be detrimental to the performance of the learning agent. Asynchronous RL methods, which separate the computation of decision-making and gradient updates, are a potential solution to this problem. However, only a few comparisons between asynchronous and synchronous RL have been made with physical robots. For this reason, the exact performance benefits of using asynchronous RL methods over synchronous RL methods are still unclear. In this study, we provide a performance comparison between asynchronous and synchronous RL using a physical robotic arm called Franka Emika Panda. Our experiments show that the agents learn faster and attain significantly more returns using asynchronous RL. Our experiments also demonstrate that the learning agent with a faster response time performs better than the agent with a slower response time, even if the agent with a slower response time performs a higher number of gradient updates.

ROJun 29, 2024
Revisiting Sparse Rewards for Goal-Reaching Reinforcement Learning

Gautham Vasan, Yan Wang, Fahim Shahriar et al.

Many real-world robot learning problems, such as pick-and-place or arriving at a destination, can be seen as a problem of reaching a goal state as soon as possible. These problems, when formulated as episodic reinforcement learning tasks, can easily be specified to align well with our intended goal: -1 reward every time step with termination upon reaching the goal state, called minimum-time tasks. Despite this simplicity, such formulations are often overlooked in favor of dense rewards due to their perceived difficulty and lack of informativeness. Our studies contrast the two reward paradigms, revealing that the minimum-time task specification not only facilitates learning higher-quality policies but can also surpass dense-reward-based policies on their own performance metrics. Crucially, we also identify the goal-hit rate of the initial policy as a robust early indicator for learning success in such sparse feedback settings. Finally, using four distinct real-robotic platforms, we show that it is possible to learn pixel-based policies from scratch within two to three hours using constant negative rewards.

CYJan 24, 2021
Automatic Monitoring Social Dynamics During Big Incidences: A Case Study of COVID-19 in Bangladesh

Fahim Shahriar, Md Abul Bashar

Newspapers are trustworthy media where people get the most reliable and credible information compared with other sources. On the other hand, social media often spread rumors and misleading news to get more traffic and attention. Careful characterization, evaluation, and interpretation of newspaper data can provide insight into intrigue and passionate social issues to monitor any big social incidence. This study analyzed a large set of spatio-temporal Bangladeshi newspaper data related to the COVID-19 pandemic. The methodology included volume analysis, topic analysis, automated classification, and sentiment analysis of news articles to get insight into the COVID-19 pandemic in different sectors and regions in Bangladesh over a period of time. This analysis will help the government and other organizations to figure out the challenges that have arisen in society due to this pandemic, what steps should be taken immediately and in the post-pandemic period, how the government and its allies can come together to address the crisis in the future, keeping these problems in mind.