Homayoun Najjaran

CV
h-index: 34
45 papers
507 citations
Novelty: 39%
AI Score: 54

45 Papers

AI Jun 2
SMAC-Talk: A Natural Language Extension of the StarCraft Multi-Agent Challenge for Large Language Models

Joel Sol, Homayoun Najjaran

As LLMs become more widely deployed, they are increasingly expected to work alongside other AI agents rather than operating in isolation. Effective coordination in these settings requires agents to communicate, share information and make decisions under uncertainty. We introduce SMAC-Talk, a natural language extension of the StarCraft Multi-Agent Challenge for evaluating LLM-based agents in cooperative multi-agent environments. The environment has several key features, such as decentralized control, partial observability and long-horizon decision making. SMAC-Talk includes a natural language communication channel, which is used to probe agent coordination and trust. We use this communication channel to construct different evaluation scenarios, including settings with an embedded deceptive communicator that tries to disrupt and deceive allies through communication alone. We provide three agents for benchmarking using four models from the Qwen3.5 family and study how reasoning structure, memory and model scale affect coordination between agents. We release SMAC-Talk as an open benchmark to support the research community in developing and evaluating LLM agents in cooperative multi-agent settings.

LG Jul 7, 2023
Reinforcement and Deep Reinforcement Learning-based Solutions for Machine Maintenance Planning, Scheduling Policies, and Optimization

Oluwaseyi Ogunfowora, Homayoun Najjaran

Systems and machines undergo various failure modes that result in machine health degradation, so maintenance actions are required to restore them to a state where they can perform their expected functions. Since maintenance tasks are inevitable, maintenance planning is essential to ensure the smooth operations of the production system and other industries at large. Maintenance planning is a decision-making problem that aims at developing optimum maintenance policies and plans that help reduce maintenance costs, extend asset life, maximize their availability, and ultimately ensure workplace safety. Reinforcement learning is a data-driven decision-making algorithm that has been increasingly applied to develop dynamic maintenance plans while leveraging the continuous information from condition monitoring of the system and machine states. By leveraging the condition monitoring data of systems and machines with reinforcement learning, smart maintenance planners can be developed, which is a precursor to achieving a smart factory. This paper presents a literature review on the applications of reinforcement and deep reinforcement learning for maintenance planning and optimization problems. To capture the common ideas without losing touch with the uniqueness of each publication, taxonomies used to categorize the systems were developed, and reviewed publications were highlighted, classified, and summarized based on these taxonomies. Adopted methodologies, findings, and well-defined interpretations of the reviewed studies were summarized in graphical and tabular representations to maximize the utility of the work for both researchers and practitioners. This work also highlights the research gaps, key insights from the literature, and areas for future work.

AI Aug 11, 2023
Learning Team-Based Navigation: A Review of Deep Reinforcement Learning Techniques for Multi-Agent Pathfinding

Jaehoon Chung, Jamil Fayyad, Younes Al Younes et al.

Multi-agent pathfinding (MAPF) is a critical field in many large-scale robotic applications, often being the fundamental step in multi-agent systems. The increasing complexity of MAPF in complex and crowded environments, however, critically diminishes the effectiveness of existing solutions. In contrast to other studies that have either presented a general overview of the recent advancements in MAPF or extensively reviewed Deep Reinforcement Learning (DRL) within multi-agent system settings independently, our work presented in this review paper focuses on highlighting the integration of DRL-based approaches in MAPF. Moreover, we aim to bridge the current gap in evaluating MAPF solutions by addressing the lack of unified evaluation metrics and providing comprehensive clarification on these metrics. Finally, our paper discusses the potential of model-based DRL as a promising future direction and provides its required foundational understanding to address current challenges in MAPF. Our objective is to assist readers in gaining insight into the current research direction, providing unified metrics for comparing different MAPF algorithms and expanding their knowledge of model-based DRL to address the existing challenges in MAPF.

LG Oct 4, 2023
Deep reinforcement learning for machine scheduling: Methodology, the state-of-the-art, and future directions

Maziyar Khadivi, Todd Charter, Marjan Yaghoubi et al.

Machine scheduling aims to optimize job assignments to machines while adhering to manufacturing rules and job specifications. This optimization leads to reduced operational costs, improved customer demand fulfillment, and enhanced production efficiency. However, machine scheduling remains a challenging combinatorial problem due to its NP-hard nature. Deep Reinforcement Learning (DRL), a key component of artificial general intelligence, has shown promise in various domains like gaming and robotics. Researchers have explored applying DRL to machine scheduling problems since 1995. This paper offers a comprehensive review and comparison of DRL-based approaches, highlighting their methodology, applications, advantages, and limitations. It categorizes these approaches based on computational components: conventional neural networks, encoder-decoder architectures, graph neural networks, and metaheuristic algorithms. Our review concludes that DRL-based methods outperform exact solvers, heuristics, and tabular reinforcement learning algorithms in terms of computation speed and generating near-global optimal solutions. These DRL-based approaches have been successfully applied to static and dynamic scheduling across diverse machine environments and job characteristics. However, DRL-based schedulers face limitations in handling complex operational constraints, configurable multi-objective optimization, generalization, scalability, interpretability, and robustness. Addressing these challenges will be a crucial focus for future research in this field. This paper serves as a valuable resource for researchers to assess the current state of DRL-based machine scheduling and identify research gaps. It also aids experts and practitioners in selecting the appropriate DRL approach for production scheduling.

CV Jul 21, 2023
Model Compression Methods for YOLOv5: A Review

Mohammad Jani, Jamil Fayyad, Younes Al-Younes et al.

Over the past few years, extensive research has been devoted to enhancing YOLO object detectors. Since its introduction, eight major versions of YOLO have been released with the purpose of improving its accuracy and efficiency. While the evident merits of YOLO have led to its extensive use in many areas, deploying it on resource-limited devices poses challenges. To address this issue, various neural network compression methods have been developed, which fall under three main categories, namely network pruning, quantization, and knowledge distillation. The fruitful outcomes of utilizing model compression methods, such as lowering memory usage and inference time, make them favorable, if not necessary, for deploying large neural networks on hardware-constrained edge devices. In this review paper, our focus is on pruning and quantization due to their comparative modularity. We categorize them and analyze the practical results of applying those methods to YOLOv5. By doing so, we identify gaps in adapting pruning and quantization for compressing YOLOv5, and provide future directions in this area for further exploration. Among several versions of YOLO, we specifically choose YOLOv5 for its excellent trade-off between recency and popularity in literature. This is the first specific review paper that surveys pruning and quantization methods from an implementation point of view on YOLOv5. Our study is also extendable to newer versions of YOLO as implementing them on resource-limited devices poses the same challenges that persist even today. This paper targets those interested in the practical deployment of model compression methods on YOLOv5, and in exploring different compression techniques that can be used for subsequent versions of YOLO.

LG Jun 24, 2022
Synthesizing Rolling Bearing Fault Samples in New Conditions: A framework based on a modified CGAN

Maryam Ahang, Masoud Jalayer, Ardeshir Shojaeinasab et al.

Bearings are one of the vital components of rotating machines that are prone to unexpected faults. Therefore, bearing fault diagnosis and condition monitoring are essential for reducing operational costs and downtime in numerous industries. In various production conditions, bearings can be operated under a range of loads and speeds, which causes different vibration patterns associated with each fault type. Normal data is ample as systems usually work in desired conditions. On the other hand, fault data is rare, and in many conditions, there is no data recorded for the fault classes. Accessing fault data is crucial for developing data-driven fault diagnosis tools that can improve both the performance and safety of operations. To this end, a novel algorithm based on Conditional Generative Adversarial Networks (CGANs) is introduced. Trained on the normal and fault data from any conditions in which faults were actually recorded, this algorithm generates fault data from the normal data of target conditions. The proposed method is validated on a real-world bearing dataset, and fault data are generated for different conditions. Several state-of-the-art classifiers and visualization models are implemented to evaluate the quality of the synthesized data. The results demonstrate the efficacy of the proposed algorithm.

RO Jul 29, 2023
Using Implicit Behavior Cloning and Dynamic Movement Primitive to Facilitate Reinforcement Learning for Robot Motion Planning

Zengjie Zhang, Jayden Hong, Amir Soufi Enayati et al.

Reinforcement learning (RL) for motion planning of multi-degree-of-freedom robots still suffers from low efficiency in terms of slow training speed and poor generalizability. In this paper, we propose a novel RL-based robot motion planning framework that uses implicit behavior cloning (IBC) and dynamic movement primitive (DMP) to improve the training speed and generalizability of an off-policy RL agent. IBC utilizes human demonstration data to speed up RL training, and DMP serves as a heuristic model that transfers motion planning into a simpler planning space. To support this, we also create a human demonstration dataset using a pick-and-place experiment that can be used for similar studies. Comparison studies in simulation reveal the advantage of the proposed method over the conventional RL agents with faster training speed and higher scores. A real-robot experiment indicates the applicability of the proposed method to a simple assembly task. Our work provides a novel perspective on using motion primitives and human demonstration to improve the performance of RL for robot applications.

CV Jul 15, 2023
Anomaly Detection in Automated Fibre Placement: Learning with Data Limitations

Assef Ghamisi, Todd Charter, Li Ji et al.

Conventional defect detection systems in Automated Fibre Placement (AFP) typically rely on end-to-end supervised learning, necessitating a substantial number of labelled defective samples for effective training. However, the scarcity of such labelled data poses a challenge. To overcome this limitation, we present a comprehensive framework for defect detection and localization in Automated Fibre Placement. Our approach combines unsupervised deep learning and classical computer vision algorithms, eliminating the need for labelled data or manufacturing defect samples. It efficiently detects various surface issues while requiring fewer images of composite parts for training. Our framework employs an innovative sample extraction method leveraging AFP's inherent symmetry to expand the dataset. By inputting a depth map of the fibre layup surface, we extract local samples aligned with each composite strip (tow). These samples are processed through an autoencoder, trained on normal samples for precise reconstructions, highlighting anomalies through reconstruction errors. Aggregated values form an anomaly map for insightful visualization. The framework employs blob detection on this map to locate manufacturing defects. The experimental findings reveal that despite training the autoencoder with a limited number of images, our proposed method exhibits satisfactory detection accuracy and accurately identifies defect locations. Our framework demonstrates comparable performance to existing methods, while also offering the advantage of detecting all types of anomalies without relying on an extensive labelled dataset of defects.

CV Jul 11, 2023
Bag of Views: An Appearance-based Approach to Next-Best-View Planning for 3D Reconstruction

Sara Hatami Gazani, Matthew Tucsok, Iraj Mantegh et al.

UAV-based intelligent data acquisition for 3D reconstruction and monitoring of infrastructure has experienced an increasing surge of interest due to recent advancements in image processing and deep learning-based techniques. View planning is an essential part of this task that dictates the information capture strategy and heavily impacts the quality of the 3D model generated from the captured data. Recent methods have used prior knowledge or partial reconstruction of the target to accomplish view planning for active reconstruction; the former approach poses a challenge for complex or newly identified targets while the latter is computationally expensive. In this work, we present Bag-of-Views (BoV), a fully appearance-based model used to assign utility to the captured views for both offline dataset refinement and online next-best-view (NBV) planning applications targeting the task of 3D reconstruction. With this contribution, we also developed the View Planning Toolbox (VPT), a lightweight package for training and testing machine learning-based view planning frameworks, custom view dataset generation of arbitrary 3D scenes, and 3D reconstruction. Through experiments which pair a BoV-based reinforcement learning model with VPT, we demonstrate the efficacy of our model in reducing the number of required views for high-quality reconstructions in dataset refinement and NBV planning.

LG Aug 19, 2023
A Transformer-based Framework For Multi-variate Time Series: A Remaining Useful Life Prediction Use Case

Oluwaseyi Ogunfowora, Homayoun Najjaran

In recent times, Large Language Models (LLMs) have captured a global spotlight and revolutionized the field of Natural Language Processing. One of the factors attributed to the effectiveness of LLMs is the model architecture used for training, transformers. Transformer models excel at capturing contextual features in sequential data. Since time series data are sequential, transformer models can be leveraged for more efficient time series data prediction. The field of prognostics is vital to system health management and proper maintenance planning. A reliable estimation of the remaining useful life (RUL) of machines holds the potential for substantial cost savings. This includes avoiding abrupt machine failures, maximizing equipment usage, and serving as a decision support system (DSS). This work proposed an encoder-transformer architecture-based framework for multivariate time series prediction for a prognostics use case. We validated the effectiveness of the proposed framework on all four sets of the C-MAPSS benchmark dataset for the remaining useful life prediction task. To effectively transfer the knowledge and application of transformers from the natural language domain to time series, three model-specific experiments were conducted. Also, to make the model aware of the initial stages of the machine life and its degradation path, a novel expanding window method was proposed for the first time in this work. It was compared with the sliding window method and led to a large improvement in the performance of the encoder-transformer model. Finally, the performance of the proposed encoder-transformer model was evaluated on the test dataset and compared with the results from 13 other state-of-the-art (SOTA) models in the literature, and it outperformed them all with an average performance increase of 137.65% over the next best model across all the datasets.
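
The difference between the two windowing schemes named in this abstract can be sketched as follows. This is a minimal illustration of the usual meaning of "sliding" and "expanding" windows, not the authors' implementation; the function names, the `min_width` parameter, and the toy sensor readings are all assumptions made for the example.

```python
from typing import List, Sequence


def sliding_windows(series: Sequence[float], width: int) -> List[List[float]]:
    """Fixed-width windows: each sample sees only the last `width` readings."""
    return [list(series[end - width:end]) for end in range(width, len(series) + 1)]


def expanding_windows(series: Sequence[float], min_width: int) -> List[List[float]]:
    """Growing windows: every sample starts at the first reading of the run."""
    return [list(series[:end]) for end in range(min_width, len(series) + 1)]


# Hypothetical degradation signal from one machine run (illustrative values).
readings = [0.1, 0.2, 0.4, 0.7, 1.1]
sliding = sliding_windows(readings, width=3)
expanding = expanding_windows(readings, min_width=3)
```

In this sketch every expanding window keeps the earliest readings, so a model sees the start of the machine's life at each step, whereas the sliding windows drop them. Expanding windows have variable length, so a sequence model would presumably need padding or masking to batch them.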

RO Apr 12, 2023
Facilitating Sim-to-real by Intrinsic Stochasticity of Real-Time Simulation in Reinforcement Learning for Robot Manipulation

Ram Dershan, Amir M. Soufi Enayati, Zengjie Zhang et al.

Simulation is essential to reinforcement learning (RL) before implementation in the real world, especially for safety-critical applications like robot manipulation. Conventionally, RL agents are sensitive to the discrepancies between the simulation and the real world, known as the sim-to-real gap. The application of domain randomization, a technique used to fill this gap, is limited to the imposition of heuristic-randomized models. We investigate the properties of intrinsic stochasticity of real-time simulation (RT-IS) of off-the-shelf simulation software and its potential to improve RL performance. This improvement includes a higher tolerance to noise and model imprecision and superiority to conventional domain randomization in terms of ease of use and automation. Firstly, we conduct analytical studies to measure the correlation of RT-IS with the utilization of computer hardware and validate its comparability with the natural stochasticity of a physical robot. Then, we exploit the RT-IS feature in the training of an RL agent. The simulation and physical experiment results verify the feasibility and applicability of RT-IS to robust agent training for robot manipulation tasks. The RT-IS-powered RL agent outperforms conventional agents on robots with modeling uncertainties. RT-IS requires less heuristic randomization, is not task-dependent, and achieves better generalizability than the conventional domain-randomization-powered agents. Our findings provide a new perspective on the sim-to-real problem in practical applications like robot manipulation tasks.

AI Jul 7, 2024
A Review of AI and Machine Learning Contribution in Predictive Business Process Management (Process Enhancement and Process Improvement Approaches)

Mostafa Abbasi, Rahnuma Islam Nishat, Corey Bond et al.

Purpose: The significance of business processes has fostered a close collaboration between academia and industry. Moreover, the business landscape has witnessed continuous transformation, closely intertwined with technological advancements. Our main goal is to offer researchers and process analysts insights into the latest developments concerning Artificial Intelligence (AI) and Machine Learning (ML) to optimize their processes in an organization and identify research gaps and future directions in the field. Design/methodology/approach: In this study, we perform a systematic review of academic literature to investigate the integration of AI/ML in business process management (BPM). We categorize the literature according to the BPM life-cycle and employ bibliometric and objective-oriented methodology to analyze related papers. Findings: In business process management and process mapping, AI/ML has made significant improvements using operational data on process metrics. These developments involve two distinct stages: (1) process enhancement, which emphasizes analyzing process information and adding descriptions to process models, and (2) process improvement, which focuses on redesigning processes based on insights derived from analysis. Research limitations/implications: While this review paper provides an overview of different approaches for addressing process-related challenges, it does not delve into the fine-grained technical details of each method. This work focuses on recent papers published between 2010 and 2024. Originality/value: This paper adopts a pioneering approach by conducting an extensive examination of the integration of AI/ML techniques across the entire process management lifecycle. Additionally, it presents groundbreaking research and introduces AI/ML-enabled integrated tools, further enhancing the insights for future research.

HC Jul 21, 2023
Systematic Adaptation of Communication-focused Machine Learning Models from Real to Virtual Environments for Human-Robot Collaboration

Debasmita Mukherjee, Ritwik Singhai, Homayoun Najjaran

Virtual reality has proved to be useful in applications in several fields ranging from gaming, medicine, and training to development of interfaces that enable human-robot collaboration. It empowers designers to explore applications outside of the constraints posed by the real world environment and develop innovative solutions and experiences. Hand gesture recognition, which has been a topic of much research and subsequent commercialization in the real world, has been possible because of the creation of large, labelled datasets. In order to utilize the power of natural and intuitive hand gestures in the virtual domain for enabling embodied teleoperation of collaborative robots, similarly large datasets must be created so as to keep the working interface easy to learn and flexible enough to add more gestures. Depending on the application, this may be computationally or economically prohibitive. Thus, the adaptation of trained deep learning models that perform well in the real environment to the virtual one may be a solution to this challenge. This paper presents a systematic framework for real-to-virtual adaptation using a virtual dataset of limited size, along with guidelines for creating a curated dataset. Finally, while hand gestures have been considered as the communication mode, the guidelines and recommendations presented are generic. These are applicable to other modes such as body poses and facial expressions, which have large datasets available in the real domain that must be adapted to the virtual one.

RO Apr 12, 2023
Human-Robot Skill Transfer with Enhanced Compliance via Dynamic Movement Primitives

Jayden Hong, Zengjie Zhang, Amir M. Soufi Enayati et al.

Finding an efficient way to adapt robot trajectory is a priority to improve overall performance of robots. One approach for trajectory planning is through transferring human-like skills to robots by Learning from Demonstrations (LfD). The human demonstration is considered the target motion to mimic. However, human motion is typically optimal for human embodiment but not for robots because of the differences between human biomechanics and robot dynamics. The Dynamic Movement Primitives (DMP) framework is a viable solution for this limitation of LfD, but it requires tuning the second-order dynamics in the formulation. Our contribution is introducing a systematic method to extract the dynamic features from human demonstration to auto-tune the parameters in the DMP framework. In addition to its use with LfD, another utility of the proposed method is that it can readily be used in conjunction with Reinforcement Learning (RL) for robot training. In this way, the extracted features facilitate the transfer of human skills by allowing the robot to explore the possible trajectories more efficiently and increasing robot compliance significantly. We introduced a methodology to extract the dynamic features from multiple trajectories based on the optimization of human-likeness and similarity in the parametric space. Our method was implemented into an actual human-robot setup to extract human dynamic features and used to regenerate the robot trajectories following both LfD and RL with DMP. It resulted in a stable performance of the robot, maintaining a high degree of human-likeness based on accumulated distance error as good as the best heuristic tuning.
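
The "second-order dynamics" that this abstract says must be tuned can be illustrated with the commonly used DMP transformation system, tau * dz/dt = alpha * (beta * (goal - y) - z) and tau * dy/dt = z. The sketch below is an assumption-laden toy: it omits the learned forcing term, and the gains `alpha`, `beta`, the time constant `tau`, and the step size are illustrative values, not the auto-tuned parameters from the paper.

```python
from typing import List, Optional


def rollout_dmp(y0: float, goal: float, alpha: float = 25.0,
                beta: Optional[float] = None, tau: float = 1.0,
                dt: float = 0.001, steps: int = 2000) -> List[float]:
    """Integrate a one-dimensional DMP transformation system with explicit Euler."""
    if beta is None:
        beta = alpha / 4.0  # common choice that makes the system critically damped
    y, z = y0, 0.0
    trajectory = [y]
    for _ in range(steps):
        z_dot = alpha * (beta * (goal - y) - z) / tau  # forcing term omitted
        y_dot = z / tau
        z += z_dot * dt
        y += y_dot * dt
        trajectory.append(y)
    return trajectory


# Move from 0 to 1 with the illustrative default gains.
trajectory = rollout_dmp(y0=0.0, goal=1.0)
```

With `beta = alpha / 4` the position approaches the goal without overshoot; other gain choices make the motion oscillate or converge sluggishly, which is the kind of hand tuning the abstract proposes to replace with features extracted from human demonstrations.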

CV Apr 25, 2023
Object Semantics Give Us the Depth We Need: Multi-task Approach to Aerial Depth Completion

Sara Hatami Gazani, Fardad Dadboud, Miodrag Bolic et al.

Depth completion and object detection are two crucial tasks often used for aerial 3D mapping, path planning, and collision avoidance of Uncrewed Aerial Vehicles (UAVs). Common solutions include using measurements from a LiDAR sensor; however, the generated point cloud is often sparse and irregular and limits the system's capabilities in 3D rendering and safety-critical decision-making. To mitigate this challenge, information from other sensors on the UAV (viz., a camera used for object detection) is utilized to help the depth completion process generate denser 3D models. Performing both aerial depth completion and object detection tasks while fusing the data from the two sensors poses a challenge to resource efficiency. We address this challenge by proposing a novel approach to jointly execute the two tasks in a single pass. The proposed method is based on an encoder-focused multi-task learning model that exposes the two tasks to jointly learned features. We demonstrate how semantic expectations of the objects in the scene learned by the object detection pathway can boost the performance of the depth completion pathway while placing the missing depth values. Experimental results show that the proposed multi-task network outperforms its single-task counterpart, particularly when exposed to defective inputs.

RO Apr 12, 2023
Sample-Efficient Reinforcement Learning with Symmetry-Guided Demonstrations for Robotic Manipulation

Amir M. Soufi Enayati, Zengjie Zhang, Kashish Gupta et al.

Reinforcement learning (RL) suffers from low sample efficiency, particularly in high-dimensional continuous state-action spaces of complex robotic manipulation tasks. RL performance can improve by leveraging prior knowledge, even when demonstrations are limited and collected from simplified environments. To address this, we define General Abstract Symmetry (GAS) for aggregating demonstrations from symmetrical abstract partitions of the robot environment. We introduce Demo-EASE, a novel training framework using a dual-buffer architecture that stores both demonstrations and RL-generated experiences. Demo-EASE improves sample efficiency through symmetry-guided demonstrations and behavior cloning, enabling strong initialization and balanced exploration-exploitation. Demo-EASE is compatible with both on-policy and off-policy RL algorithms, supporting various training regimes. We evaluate our framework in three simulation experiments using a Kinova Gen3 robot with joint-space control within PyBullet. Our results show that Demo-EASE significantly accelerates convergence and improves final performance compared to standard RL baselines, demonstrating its potential for efficient real-world robotic manipulation learning.
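
A dual-buffer replay scheme of the kind this abstract describes can be sketched as below. This is a generic illustration rather than the Demo-EASE implementation: the class name `DualBuffer`, the fixed `demo_fraction` mixing ratio, and the tuple-shaped transitions are all hypothetical choices made for the example.

```python
import random
from collections import deque
from typing import Any, List


class DualBuffer:
    """Keeps demonstrations and agent experience apart and mixes them per batch."""

    def __init__(self, capacity: int, demo_fraction: float = 0.25, seed: int = 0):
        self.demos: List[Any] = []              # demonstrations are never evicted
        self.agent = deque(maxlen=capacity)     # oldest agent experience is dropped
        self.demo_fraction = demo_fraction      # assumed fixed mixing ratio
        self.rng = random.Random(seed)

    def add_demo(self, transition: Any) -> None:
        self.demos.append(transition)

    def add_experience(self, transition: Any) -> None:
        self.agent.append(transition)

    def sample(self, batch_size: int) -> List[Any]:
        n_demo = min(len(self.demos), round(batch_size * self.demo_fraction))
        n_agent = min(len(self.agent), batch_size - n_demo)
        batch = self.rng.sample(self.demos, n_demo)
        batch += self.rng.sample(list(self.agent), n_agent)
        self.rng.shuffle(batch)
        return batch


buffer = DualBuffer(capacity=20)
for i in range(5):
    buffer.add_demo(("demo", i))
for i in range(25):
    buffer.add_experience(("agent", i))
batch = buffer.sample(batch_size=8)
```

Keeping the two sources separate means scarce demonstrations are not overwritten as agent experience accumulates; how the mixing ratio should change over training is left open in this sketch.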

CV Jul 30, 2025 Code
Object Recognition Datasets and Challenges: A Review

Aria Salari, Abtin Djavadifar, Xiangrui Liu et al.

Object recognition is among the fundamental tasks in computer vision applications, paving the path for all other image understanding operations. In every stage of progress in object recognition research, efforts have been made to collect and annotate new datasets to match the capacity of the state-of-the-art algorithms. In recent years, the importance of the size and quality of datasets has intensified as the utility of the emerging deep network techniques heavily relies on training data. Furthermore, datasets provide a fair means of benchmarking for competitions and have proved instrumental to the advancement of object recognition research by providing quantifiable benchmarks for the developed models. Taking a closer look at the characteristics of commonly used public datasets is an important first step for data-driven and machine learning researchers. In this survey, we provide a detailed analysis of datasets in the highly investigated object recognition areas. More than 160 datasets have been scrutinized through statistics and descriptions. Additionally, we present an overview of the prominent object recognition benchmarks and competitions, along with a description of the metrics widely adopted for evaluation purposes in the computer vision community. All introduced datasets and challenges can be found online at github.com/AbtinDjavadifar/ORDC.

IV Dec 12, 2023 Code
Empirical Validation of Conformal Prediction for Trustworthy Skin Lesions Classification

Jamil Fayyad, Shadi Alijani, Homayoun Najjaran

Background and objective: Uncertainty quantification is a pivotal field that contributes to realizing reliable and robust systems. It becomes instrumental in fortifying safe decisions by providing complementary information, particularly within high-risk applications. Existing studies have explored various methods that often operate under specific assumptions or necessitate substantial modifications to the network architecture to effectively account for uncertainties. The objective of this paper is to study Conformal Prediction, an emerging distribution-free uncertainty quantification technique, and provide a comprehensive understanding of the advantages and limitations inherent in various methods within the medical imaging field. Methods: In this study, we developed Conformal Prediction, Monte Carlo Dropout, and Evidential Deep Learning approaches to assess uncertainty quantification in deep neural networks. The effectiveness of these methods is evaluated using three public medical imaging datasets focused on detecting pigmented skin lesions and blood cell types. Results: The experimental results demonstrate a significant enhancement in uncertainty quantification with the utilization of the Conformal Prediction method, surpassing the performance of the other two methods. Furthermore, the results present insights into the effectiveness of each uncertainty method in handling Out-of-Distribution samples from domain-shifted datasets. Our code is available at: Conclusions: Our conclusion highlights a robust and consistent performance of conformal prediction across diverse testing conditions. This positions it as the preferred choice for decision-making in safety-critical applications.
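
To make the idea of distribution-free prediction sets concrete, here is a minimal sketch of split conformal prediction for classification, using the common "one minus predicted probability of the true class" score. It is not the code referenced in the abstract and may differ from the variant the authors evaluated; the class labels and probabilities are invented for illustration.

```python
import math
from typing import Dict, List


def conformal_threshold(cal_probs: List[Dict[str, float]],
                        cal_labels: List[str], alpha: float) -> float:
    """Score threshold that targets roughly (1 - alpha) coverage."""
    scores = sorted(1.0 - probs[label] for probs, label in zip(cal_probs, cal_labels))
    n = len(scores)
    rank = math.ceil((n + 1) * (1.0 - alpha))  # finite-sample corrected rank
    if rank > n:
        return float("inf")  # too few calibration points: keep every class
    return scores[rank - 1]


def prediction_set(probs: Dict[str, float], threshold: float) -> List[str]:
    """All classes whose nonconformity score does not exceed the threshold."""
    return sorted(label for label, p in probs.items() if 1.0 - p <= threshold)


# Hypothetical calibration data: model probability assigned to the true class.
true_probs = [0.9, 0.8, 0.95, 0.7, 0.6, 0.85, 0.75, 0.55, 0.4]
cal_labels = ["benign"] * len(true_probs)
cal_probs = [{"benign": p, "malignant": 1.0 - p} for p in true_probs]

threshold = conformal_threshold(cal_probs, cal_labels, alpha=0.1)
confident = prediction_set({"benign": 0.9, "malignant": 0.1}, threshold)
uncertain = prediction_set({"benign": 0.55, "malignant": 0.45}, threshold)
```

In this toy run the confident input yields a single-class set while the ambiguous one yields both classes, so the size of the set acts as the uncertainty signal. The method wraps an already trained classifier and needs only held-out calibration data, which is why it requires no change to the network architecture.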

LG Apr 10
A Hybrid Intelligent Framework for Uncertainty-Aware Condition Monitoring of Industrial Systems

Maryam Ahang, Todd Charter, Masoud Jalayer et al.

Hybrid approaches that combine data-driven learning with physics-based insight have shown promise for improving the reliability of industrial condition monitoring. This work develops a hybrid condition monitoring framework that integrates primary sensor measurements, lagged temporal features, and physics-informed residuals derived from nominal surrogate models. Two hybrid integration strategies are examined. The first is a feature-level fusion approach that augments the input space with residual and temporal information. The second is a model-level ensemble approach in which machine learning classifiers trained on different feature types are combined at the decision level. Both hybrid approaches of the condition monitoring framework are evaluated on a continuous stirred-tank reactor (CSTR) benchmark using several machine learning models and ensemble configurations. Both feature-level and model-level hybridization improve diagnostic accuracy relative to single-source baselines, with the best model-level ensemble achieving a 2.9% improvement over the best baseline ensemble. To assess predictive reliability, conformal prediction is applied to quantify coverage, prediction-set size, and abstention behavior. The results show that hybrid integration enhances uncertainty management, producing smaller and well-calibrated prediction sets at matched coverage levels. These findings demonstrate that lightweight physics-informed residuals, temporal augmentation, and ensemble learning can be combined effectively to improve both accuracy and decision reliability in nonlinear industrial systems.

LGAug 15, 2024
Meta SAC-Lag: Towards Deployable Safe Reinforcement Learning via MetaGradient-based Hyperparameter Tuning

Homayoun Honari, Amir Mehdi Soufi Enayati, Mehran Ghafarian Tamizi et al.

Safe Reinforcement Learning (Safe RL) is one of the prevalently studied subcategories of trial-and-error-based methods with the intention to be deployed on real-world systems. In safe RL, the goal is to maximize reward performance while minimizing constraints, often achieved by setting bounds on constraint functions and utilizing the Lagrangian method. However, deploying Lagrangian-based safe RL in real-world scenarios is challenging due to the necessity of threshold fine-tuning, as imprecise adjustments may lead to suboptimal policy convergence. To mitigate this challenge, we propose a unified Lagrangian-based model-free architecture called Meta Soft Actor-Critic Lagrangian (Meta SAC-Lag). Meta SAC-Lag uses meta-gradient optimization to automatically update the safety-related hyperparameters. The proposed method is designed to address safe exploration and threshold adjustment with minimal hyperparameter tuning requirement. In our pipeline, the inner parameters are updated through the conventional formulation and the hyperparameters are adjusted using the meta-objectives which are defined based on the updated parameters. Our results show that the agent can reliably adjust the safety performance due to the relatively fast convergence rate of the safety threshold. We evaluate the performance of Meta SAC-Lag in five simulated environments against Lagrangian baselines, and the results demonstrate its capability to create synergy between parameters, yielding better or competitive results. Furthermore, we conduct a real-world experiment involving a robotic arm tasked with pouring coffee into a cup without spillage. Meta SAC-Lag is successfully trained to execute the task, while minimizing effort constraints.
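For context, the conventional Lagrangian formulation that Meta SAC-Lag builds on can be sketched as follows. This is the plain baseline with a hand-set threshold and learning rate, not the meta-gradient procedure from the paper; all names and numbers are illustrative assumptions.

```python
def lagrangian_objective(reward_return, cost_return, lam, threshold):
    """Policy objective: reward penalised by the weighted constraint violation."""
    return reward_return - lam * (cost_return - threshold)

def update_multiplier(lam, cost_return, threshold, lr=0.1):
    """Dual ascent step on the Lagrange multiplier, projected to stay non-negative."""
    return max(0.0, lam + lr * (cost_return - threshold))

# Hypothetical episode statistics.
lam = 0.5
lam = update_multiplier(lam, cost_return=2.0, threshold=1.0)  # cost too high: lam grows
print(lam, lagrangian_objective(10.0, 2.0, lam, 1.0))
```

The threshold and the multiplier learning rate are exactly the kind of safety-related hyperparameters whose manual tuning the abstract identifies as the obstacle to deployment.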

CVJul 13, 2024
Sim-to-Real Domain Adaptation for Deformation Classification

Joel Sol, Jamil Fayyad, Shadi Alijani et al.

Deformation detection is vital for enabling accurate assessment and prediction of structural changes in materials, ensuring timely and effective interventions to maintain safety and integrity. Automating deformation detection through computer vision is crucial for efficient monitoring, but it faces significant challenges in creating a comprehensive dataset of both deformed and non-deformed objects, which can be difficult to obtain in many scenarios. In this paper, we introduce a novel framework for generating controlled synthetic data that simulates deformed objects. This approach allows for the realistic modeling of object deformations under various conditions. Our framework integrates an intelligent adapter network that facilitates sim-to-real domain adaptation, enhancing classification results without requiring real data from deformed objects. We conduct experiments on domain adaptation and classification tasks and demonstrate that our framework improves sim-to-real classification results compared to a simulation-only baseline.

CVNov 1, 2024Code
Debiasify: Self-Distillation for Unsupervised Bias Mitigation

Nourhan Bayasi, Jamil Fayyad, Ghassan Hamarneh et al.

Simplicity bias poses a significant challenge in neural networks, often leading models to favor simpler solutions and inadvertently learn decision rules influenced by spurious correlations. This results in biased models with diminished generalizability. While many current approaches depend on human supervision, obtaining annotations for various bias attributes is often impractical. To address this, we introduce Debiasify, a novel self-distillation approach that requires no prior knowledge about the nature of biases. Our method leverages a new distillation loss to transfer knowledge within the network, from deeper layers containing complex, highly-predictive features to shallower layers with simpler, attribute-conditioned features in an unsupervised manner. This enables Debiasify to learn robust, debiased representations that generalize effectively across diverse biases and datasets, improving both worst-group performance and overall accuracy. Extensive experiments on computer vision and medical imaging benchmarks demonstrate the effectiveness of our approach, significantly outperforming previous unsupervised debiasing methods (e.g., a 10.13% improvement in worst-group accuracy for Wavy Hair classification in CelebA) and achieving comparable or superior performance to supervised approaches. Our code is publicly available at the following link: Debiasify.

CVApr 3, 2024Code
Ego-Motion Aware Target Prediction Module for Robust Multi-Object Tracking

Navid Mahdian, Mohammad Jani, Amir M. Soufi Enayati et al.

Multi-object tracking (MOT) is a prominent task in computer vision with applications in autonomous driving, responsible for the simultaneous tracking of multiple object trajectories. Detection-based multi-object tracking (DBT) algorithms detect objects using an independent object detector and predict the imminent location of each target. Conventional prediction methods in DBT use a Kalman Filter (KF) to extrapolate the target location in the upcoming frames by assuming a constant-velocity motion model. These methods are especially hindered in autonomous driving applications due to dramatic camera motion or unavailable detections. Such limitations lead to tracking failures manifested by numerous identity switches and disrupted trajectories. In this paper, we introduce a novel KF-based prediction module called the Ego-motion Aware Target Prediction (EMAP) module by focusing on the integration of camera motion and depth information with object motion models. Our proposed method decouples the impact of camera rotational and translational velocity from the object trajectories by reformulating the Kalman Filter. This reformulation enables us to reject the disturbances caused by camera motion and maximizes the reliability of the object motion model. We integrate our module with four state-of-the-art base MOT algorithms, namely OC-SORT, Deep OC-SORT, ByteTrack, and BoT-SORT. In particular, our evaluation on the KITTI MOT dataset demonstrates that EMAP reduces the number of identity switches (IDSW) of OC-SORT and Deep OC-SORT by 73% and 21%, respectively. At the same time, it improves other performance metrics such as HOTA by more than 5%. Our source code is available at https://github.com/noyzzz/EMAP.
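The conventional constant-velocity Kalman prediction step that the abstract describes as the baseline can be sketched in one dimension. This is not the EMAP reformulation; the state layout, the noise value `q`, and the function name are assumptions made for the example.

```python
def predict_constant_velocity(state, cov, dt, q=0.01):
    """One Kalman predict step for a 1-D constant-velocity model.

    state -- (position, velocity)
    cov   -- 2x2 covariance as nested lists [[a, b], [c, d]]
    dt    -- time between frames
    q     -- process noise added to the diagonal (assumed value)
    """
    pos, vel = state
    (a, b), (c, d) = cov
    # x' = F x with F = [[1, dt], [0, 1]]
    new_state = (pos + dt * vel, vel)
    # P' = F P F^T + Q, expanded by hand for the 2x2 case.
    new_cov = [
        [a + dt * (b + c) + dt * dt * d + q, b + dt * d],
        [c + dt * d, d + q],
    ]
    return new_state, new_cov

state, cov = predict_constant_velocity((0.0, 2.0), [[1.0, 0.0], [0.0, 1.0]], dt=0.5, q=0.0)
print(state, cov)
```

Because this model attributes all apparent image motion to the object, a sudden camera rotation or translation corrupts the prediction, which is the failure mode the abstract attributes to such methods in driving scenes.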

CVJan 22
Sub-Region-Aware Modality Fusion and Adaptive Prompting for Multi-Modal Brain Tumor Segmentation

Shadi Alijani, Fereshteh Aghaee Meibodi, Homayoun Najjaran

The successful adaptation of foundation models to multi-modal medical imaging is a critical yet unresolved challenge. Existing models often struggle to effectively fuse information from multiple sources and adapt to the heterogeneous nature of pathological tissues. To address this, we introduce a novel framework for adapting foundation models to multi-modal medical imaging, featuring two key technical innovations: sub-region-aware modality attention and adaptive prompt engineering. The attention mechanism enables the model to learn the optimal combination of modalities for each tumor sub-region, while the adaptive prompting strategy leverages the inherent capabilities of foundation models to refine segmentation accuracy. We validate our framework on the BraTS 2020 brain tumor segmentation dataset, demonstrating that our approach significantly outperforms baseline methods, particularly in the challenging necrotic core sub-region. Our work provides a principled and effective approach to multi-modal fusion and prompting, paving the way for more accurate and robust foundation model-based solutions in medical imaging.

CVApr 3, 2025Code
Sliced Wasserstein Discrepancy in Disentangling Representation and Adaptation Networks for Unsupervised Domain Adaptation

Joel Sol, Shadi Alijani, Homayoun Najjaran

This paper introduces DRANet-SWD as a novel complete pipeline for disentangling content and style representations of images for unsupervised domain adaptation (UDA). The approach builds upon DRANet by incorporating the sliced Wasserstein discrepancy (SWD) as a style loss instead of the traditional Gram matrix loss. The potential advantages of SWD over the Gram matrix loss for capturing style variations in domain adaptation are investigated. Experiments using digit classification datasets and driving scenario segmentation validate the method, demonstrating that DRANet-SWD enhances performance. Results indicate that SWD provides a more robust statistical comparison of feature distributions, leading to better style adaptation. These findings highlight the effectiveness of SWD in refining feature alignment and improving domain adaptation tasks across these benchmarks. Our code can be found here.
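To make the sliced Wasserstein idea concrete, the sketch below gives a Monte Carlo estimate of the sliced 2-Wasserstein distance between two equally sized point sets. It is a generic illustration, not the DRANet-SWD style loss: the number of projections, the seed, and the function name are assumptions, and the paper's loss operates on network feature maps rather than raw points.

```python
import math
import random

def sliced_wasserstein(xs, ys, n_projections=64, seed=0):
    """Estimate the sliced 2-Wasserstein distance between two equal-size point sets."""
    if len(xs) != len(ys):
        raise ValueError("this sketch assumes equal sample sizes")
    rng = random.Random(seed)
    dim = len(xs[0])
    total = 0.0
    for _ in range(n_projections):
        # Draw a random direction on the unit sphere.
        direction = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        norm = math.sqrt(sum(v * v for v in direction))
        direction = [v / norm for v in direction]
        # Project both sets to 1-D; sorting gives the optimal 1-D matching.
        px = sorted(sum(a * b for a, b in zip(x, direction)) for x in xs)
        py = sorted(sum(a * b for a, b in zip(y, direction)) for y in ys)
        total += sum((a - b) ** 2 for a, b in zip(px, py)) / len(px)
    return math.sqrt(total / n_projections)

source = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
shifted = [(x + 3.0, y) for x, y in source]
print(sliced_wasserstein(source, source), sliced_wasserstein(source, shifted))
```

Unlike a Gram matrix comparison, which matches second-order feature statistics, this distance compares the full projected distributions, which is one plausible reading of the "more robust statistical comparison" mentioned in the abstract.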

LGNov 4, 2024Code
Conformal-in-the-Loop for Learning with Imbalanced Noisy Data

John Brandon Graham-Knight, Jamil Fayyad, Nourhan Bayasi et al.

Class imbalance and label noise are pervasive in large-scale datasets, yet much of machine learning research assumes well-labeled, balanced data, which rarely reflects real world conditions. Existing approaches typically address either label noise or class imbalance in isolation, leading to suboptimal results when both issues coexist. In this work, we propose Conformal-in-the-Loop (CitL), a novel training framework that addresses both challenges with a conformal prediction-based approach. CitL evaluates sample uncertainty to adjust weights and prune unreliable examples, enhancing model resilience and accuracy with minimal computational cost. Our extensive experiments include a detailed analysis showing how CitL effectively emphasizes impactful data in noisy, imbalanced datasets. Our results show that CitL consistently boosts model performance, achieving up to a 6.1% increase in classification accuracy and a 5.0 mIoU improvement in segmentation. Our code is publicly available: CitL.

CVApr 2, 2024Code
Visual Deformation Detection Using Soft Material Simulation for Pre-training of Condition Assessment Models

Joel Sol, Amir M. Soufi Enayati, Homayoun Najjaran

This paper addresses the challenge of geometric quality assurance in manufacturing, particularly when human assessment is required. It proposes using Blender, an open-source simulation tool, to create synthetic datasets for machine learning (ML) models. The process involves translating expert information into shape key parameters to simulate deformations, generating images for both deformed and non-deformed objects. The study explores the impact of discrepancies between real and simulated environments on ML model performance and investigates the effect of different simulation backgrounds on model sensitivity. Additionally, the study aims to enhance the model's robustness to camera positioning by generating datasets with a variety of randomized viewpoints. The entire process, from data synthesis to model training and testing, is implemented using a Python API interfacing with Blender. An experiment with a soda can object validates the accuracy of the proposed pipeline.

CVApr 5, 2024
Vision transformers in domain adaptation and domain generalization: a study of robustness

Shadi Alijani, Jamil Fayyad, Homayoun Najjaran

Deep learning models are often evaluated in scenarios where the data distribution is different from those used in the training and validation phases. The discrepancy presents a challenge for accurately predicting the performance of models once deployed on the target distribution. Domain adaptation and generalization are widely recognized as effective strategies for addressing such shifts, thereby ensuring reliable performance. The recent promising results in applying vision transformers in computer vision tasks, coupled with advancements in self-attention mechanisms, have demonstrated their significant potential for robustness and generalization in handling distribution shifts. Motivated by the increased interest from the research community, our paper investigates the deployment of vision transformers in domain adaptation and domain generalization scenarios. For domain adaptation methods, we categorize research into feature-level, instance-level, model-level adaptations, and hybrid approaches, along with other categorizations with respect to diverse strategies for enhancing domain adaptation. Similarly, for domain generalization, we categorize research into multi-domain learning, meta-learning, regularization techniques, and data augmentation strategies. We further classify diverse strategies in research, underscoring the various approaches researchers have taken to address distribution shifts by integrating vision transformers. The inclusion of comprehensive tables summarizing these categories is a distinct feature of our work, offering valuable insights for researchers. These findings highlight the versatility of vision transformers in managing distribution shifts, crucial for real-world applications, especially in critical safety and decision-making scenarios.

ROMar 21, 2024
Extended Reality for Enhanced Human-Robot Collaboration: a Human-in-the-Loop Approach

Yehor Karpichev, Todd Charter, Jayden Hong et al.

The rise of automation has provided an opportunity to achieve higher efficiency in manufacturing processes, yet it often compromises the flexibility required to promptly respond to evolving market needs and meet the demand for customization. Human-robot collaboration attempts to tackle these challenges by combining the strength and precision of machines with human ingenuity and perceptual understanding. In this paper, we conceptualize and propose an implementation framework for an autonomous, machine learning-based manipulator that incorporates human-in-the-loop principles and leverages Extended Reality (XR) to facilitate intuitive communication and programming between humans and robots. Furthermore, the conceptual framework foresees human involvement directly in the robot learning process, resulting in higher adaptability and task generalization. The paper highlights key technologies enabling the proposed framework, emphasizing the importance of developing the digital ecosystem as a whole. Additionally, we review the existent implementation approaches of XR in human-robot collaboration, showcasing diverse perspectives and methodologies. The challenges and future outlooks are discussed, delving into the major obstacles and potential research avenues of XR for more natural human-robot interaction and integration in the industrial landscape.

SYFeb 23, 2024
Safety Optimized Reinforcement Learning via Multi-Objective Policy Optimization

Homayoun Honari, Mehran Ghafarian Tamizi, Homayoun Najjaran

Safe reinforcement learning (Safe RL) refers to a class of techniques that aim to prevent RL algorithms from violating constraints in the process of decision-making and exploration during trial and error. In this paper, a novel model-free Safe RL algorithm, formulated on the multi-objective policy optimization framework, is introduced in which the policy is optimized for optimality and safety simultaneously. The optimality is achieved by the environment reward function that is subsequently shaped using a safety critic. The advantage of the Safety Optimized RL (SORL) algorithm compared to the traditional Safe RL algorithms is that it omits the need to constrain the policy search space. This allows SORL to find a natural tradeoff between safety and optimality without compromising the performance in terms of either safety or optimality due to strict search space constraints. Through our theoretical analysis of SORL, we propose a condition for SORL's converged policy to guarantee safety and then use it to introduce an aggressiveness parameter that allows for fine-tuning the mentioned tradeoff. The experimental results obtained in seven different robotic environments indicate a considerable reduction in the number of safety violations along with higher, or competitive, policy returns, in comparison to six different state-of-the-art Safe RL methods. The results demonstrate the significant superiority of the proposed SORL algorithm in safety-critical applications.

SPApr 8, 2024
Condition Monitoring with Incomplete Data: An Integrated Variational Autoencoder and Distance Metric Framework

Maryam Ahang, Mostafa Abbasi, Todd Charter et al.

Condition monitoring of industrial systems is crucial for ensuring safety and maintenance planning, yet notable challenges arise in real-world settings due to the limited or non-existent availability of fault samples. This paper introduces an innovative solution to this problem by proposing a new method for fault detection and condition monitoring for unseen data. Adopting an approach inspired by zero-shot learning, our method can identify faults and assign a relative health index to various operational conditions. Typically, we have plenty of data on normal operations, some data on compromised conditions, and very few (if any) samples of severe faults. We use a variational autoencoder to capture the probabilistic distribution of previously seen and new unseen conditions. The health status is determined by comparing each sample's deviation from a normal operation reference distribution in the latent space. Faults are detected by establishing a threshold for the health indexes, allowing the model to identify severe, unseen faults with high accuracy, even amidst noise. We validate our approach using the run-to-failure IMS-bearing dataset and compare it with other methods. The health indexes generated by our model closely match the established descriptive model of bearing wear, attesting to the robustness and reliability of our method. These findings highlight the potential of our methodology in augmenting fault detection capabilities within industrial domains, thereby contributing to heightened safety protocols and optimized maintenance practices.
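The latent-space comparison can be illustrated with a simplified health index. The sketch assumes latent vectors have already been produced by a trained encoder, and it uses a per-dimension standardized distance from the normal-operation reference; the paper's actual distance metric and threshold rule are not given in the abstract, so both are stand-ins here.

```python
import math
import statistics

def fit_reference(normal_latents):
    """Per-dimension mean and standard deviation of normal-operation latent vectors."""
    dims = list(zip(*normal_latents))
    means = [statistics.fmean(d) for d in dims]
    # Fall back to 1.0 when a dimension has zero spread, to avoid division by zero.
    stds = [statistics.pstdev(d) or 1.0 for d in dims]
    return means, stds

def health_index(z, means, stds):
    """Standardized distance of one latent vector from the normal reference."""
    return math.sqrt(sum(((v - m) / s) ** 2 for v, m, s in zip(z, means, stds)))

# Hypothetical 2-D latent vectors from healthy operation.
normal = [(0.0, 0.0), (2.0, 0.0), (0.0, 2.0), (2.0, 2.0)]
means, stds = fit_reference(normal)
threshold = 3.0  # assumed alarm level
for z in [(1.0, 1.0), (4.0, 5.0)]:
    hi = health_index(z, means, stds)
    print(z, hi, "fault" if hi > threshold else "ok")
```

Only normal-operation data is needed to fit the reference, which mirrors the zero-shot flavour of the method: a severe fault never seen in training still lands far from the reference and exceeds the threshold.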

LGJan 3, 2024
Intelligent Condition Monitoring of Industrial Plants: An Overview of Methodologies and Uncertainty Management Strategies

Maryam Ahang, Todd Charter, Mostafa Abbasi et al.

Condition monitoring is essential for ensuring the safety, reliability, and efficiency of modern industrial systems. With the increasing complexity of industrial processes, artificial intelligence (AI) has emerged as a powerful tool for fault detection and diagnosis, attracting growing interest from both academia and industry. This paper provides a comprehensive overview of intelligent condition monitoring methods, with a particular emphasis on chemical plants and the widely used Tennessee Eastman Process (TEP) benchmark. State-of-the-art machine learning (ML) and deep learning (DL) algorithms are reviewed, highlighting their strengths, limitations, and applicability to industrial fault detection and diagnosis. Special attention is given to key challenges, including imbalanced and unlabeled data, and to strategies by which models can address these issues. Furthermore, comparative analyses of algorithm performance are presented to guide method selection in practical scenarios. This survey is intended to benefit both newcomers and experienced researchers by consolidating fundamental concepts, summarizing recent advances, and outlining open challenges and promising directions for intelligent condition monitoring in industrial plants.

CVMay 21, 2025
Domain Adaptive Skin Lesion Classification via Conformal Ensemble of Vision Transformers

Mehran Zoravar, Shadi Alijani, Homayoun Najjaran

Exploring the trustworthiness of deep learning models is crucial, especially in critical domains such as medical imaging decision support systems. Conformal prediction has emerged as a rigorous means of providing deep learning models with reliable uncertainty estimates and safety guarantees. However, conformal prediction results face challenges due to the backbone model's struggles in domain-shifted scenarios, such as variations in different sources. To address this challenge, this paper proposes a novel framework termed Conformal Ensemble of Vision Transformers (CE-ViTs) designed to enhance image classification performance by prioritizing domain adaptation and model robustness, while accounting for uncertainty. The proposed method leverages an ensemble of vision transformer models in the backbone, trained on diverse datasets including HAM10000, Dermofit, and Skin Cancer ISIC datasets. This ensemble learning approach, calibrated on the combination of these datasets, aims to enhance domain adaptation through conformal learning. Experimental results underscore that the framework achieves a high coverage rate of 90.38%, representing an improvement of 9.95% compared to the HAM10000 model. This indicates a strong likelihood that the prediction set includes the true label compared to singular models. Ensemble learning in CE-ViTs significantly improves conformal prediction performance, increasing the average prediction set size for challenging misclassified samples from 1.86 to 3.075.
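The two quantities reported above, coverage rate and average prediction set size, are straightforward to compute once prediction sets exist. The sketch below shows one way to do so; the function name and the toy data are illustrative only.

```python
def coverage_and_mean_size(prediction_sets, true_labels):
    """Return (fraction of sets containing the true label, average set size)."""
    covered = sum(1 for s, y in zip(prediction_sets, true_labels) if y in s)
    coverage = covered / len(true_labels)
    mean_size = sum(len(s) for s in prediction_sets) / len(prediction_sets)
    return coverage, mean_size

# Hypothetical prediction sets for three images and their true classes.
sets = [[0], [1, 2], [0, 1, 2]]
labels = [0, 0, 2]
coverage, mean_size = coverage_and_mean_size(sets, labels)
print(coverage, mean_size)
```

A larger set on a hard sample is a signal of uncertainty rather than a defect, which is why the abstract reports a growing set size on misclassified samples as an improvement.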

AIFeb 24, 2024
A mathematical model for simultaneous personnel shift planning and unrelated parallel machine scheduling

Maziyar Khadivi, Mostafa Abbasi, Todd Charter et al.

This paper addresses a production scheduling problem derived from an industrial use case, focusing on unrelated parallel machine scheduling with the personnel availability constraint. The proposed model optimizes the production plan over a multi-period scheduling horizon, accommodating variations in personnel shift hours within each time period. It assumes shared personnel among machines, with one person required per machine for setup and supervision during job processing. Available personnel are fewer than the machines, thus limiting the number of machines that can operate in parallel. The model aims to minimize the total production time considering machine-dependent processing times and sequence-dependent setup times. The model handles practical scenarios like machine eligibility constraints and production time windows. A Mixed Integer Linear Programming (MILP) model is introduced to formulate the problem, taking into account both continuous and discrete variables. A two-step solution approach enhances computational speed, first maximizing accepted jobs and then minimizing production time. Validation with synthetic problem instances and a real industrial case study of a food processing plant demonstrates the performance of the model and its usefulness in personnel shift planning. The findings offer valuable insights for practical managerial decision-making in the context of production scheduling.

LGFeb 14, 2025
An Innovative Next Activity Prediction Approach Using Process Entropy and DAW-Transformer

Hadi Zare, Mostafa Abbasi, Maryam Ahang et al.

Purpose - In Business Process Management (BPM), accurate prediction of the next activities is vital for operational efficiency and decision-making. Current Artificial Intelligence (AI)/Machine Learning (ML) models struggle with the complexity and evolving nature of business process event logs, balancing accuracy and interpretability. This paper proposes an entropy-driven model selection approach and DAW-Transformer, which stands for Dynamic Attribute-Aware Transformer, to integrate all attributes with a dynamic window for better accuracy. Design/methodology/approach - This paper introduces a novel next-activity prediction approach that uses process entropy to assess the complexity of event logs and dynamically select the most suitable ML model. A new transformer-based architecture with multi-head attention and a dynamic windowing mechanism, DAW-Transformer, is proposed to capture long-range dependencies and utilize all relevant event log attributes. Experiments were conducted on six public datasets, and the performance was evaluated with process entropy. Findings - The results demonstrate the effectiveness of the approach across these publicly available datasets. DAW-Transformer achieved superior performance, especially on high-entropy datasets such as Sepsis, exceeding Limited window Multi-Transformers by 4.69% and a benchmark CNN-LSTM-SAtt model by 3.07%. For low-entropy datasets like Road Traffic Fine, simpler, more interpretable algorithms like Random Forest performed nearly as well as the more complex DAW-Transformer and offered better handling of imbalanced data and improved explainability. Originality/value - This work's novelty lies in the proposed DAW-Transformer, with a dynamic window and considering all relevant attributes. Also, entropy-driven selection methods offer a robust, accurate, and interpretable solution for next-activity prediction.
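One simple way to picture entropy-driven model selection is to measure the Shannon entropy of the trace-variant distribution in an event log and pick a model family based on a cutoff. The abstract does not define its process entropy measure, so the entropy definition, the cutoff value, and the function names below are all assumptions.

```python
import math
from collections import Counter

def variant_entropy(traces):
    """Shannon entropy (in bits) of the distribution of distinct activity sequences."""
    counts = Counter(tuple(t) for t in traces)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def choose_model(traces, cutoff=2.0):
    """Pick a model family from log complexity; the cutoff is an assumed value."""
    return "transformer" if variant_entropy(traces) > cutoff else "random_forest"

# Hypothetical event log with two equally frequent variants.
log = [["a", "b"], ["a", "b"], ["a", "c"], ["a", "c"]]
print(variant_entropy(log), choose_model(log))
```

This matches the pattern reported in the abstract, where simpler and more interpretable models were nearly as accurate on low-entropy logs while the transformer paid off on high-entropy ones.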

LGJan 17, 2025
An Innovative Data-Driven and Adaptive Reinforcement Learning Approach for Context-Aware Prescriptive Process Monitoring

Mostafa Abbasi, Maziyar Khadivi, Maryam Ahang et al.

The application of artificial intelligence and machine learning in business process management has advanced significantly; however, the full potential of these technologies remains largely unexplored, primarily due to challenges related to data quality and availability. We present a novel framework called Fine-Tuned Offline Reinforcement Learning Augmented Process Sequence Optimization (FORLAPS), which aims to identify optimal execution paths in business processes by leveraging reinforcement learning enhanced with a state-dependent reward shaping mechanism, thereby enabling context-sensitive prescriptions. Additionally, to compare FORLAPS with the existing models (Permutation Feature Importance and multi-task Long Short Term Memory model), we conducted experiments to evaluate its effectiveness in terms of resource savings and process time reduction. The experimental results on real-life event logs validate that FORLAPS achieves 31% savings in resource time spent and a 23% reduction in process time span. To further enhance learning, we introduce an innovative process-aware data augmentation technique that selectively increases the average estimated Q-values in sampled batches, enabling automatic fine-tuning of the reinforcement learning model. Robustness was assessed through both prefix-level and trace-level evaluations, using the Damerau-Levenshtein distance as the primary metric. Finally, the model's adaptability across industries was further validated through diverse case studies, including healthcare treatment pathways, financial services workflows, permit applications from regulatory bodies, and operations management. In each domain, the proposed model demonstrated exceptional performance, outperforming existing state-of-the-art approaches in prescriptive decision-making, demonstrating its capability to prescribe optimal next steps and predict the best next activities within a process trace.
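The Damerau-Levenshtein metric mentioned above counts insertions, deletions, substitutions, and swaps of adjacent items. The sketch below implements the common restricted variant (optimal string alignment), which may differ from the exact variant used in the paper; it works on any sequences, so activity traces can be passed as lists.

```python
def osa_distance(a, b):
    """Restricted Damerau-Levenshtein (optimal string alignment) distance."""
    n, m = len(a), len(b)
    # d[i][j] = distance between the first i items of a and the first j items of b.
    d = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        d[i][0] = i
    for j in range(m + 1):
        d[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution or match
            )
            # Transposition of two adjacent items.
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[n][m]

# Hypothetical prescribed vs. observed activity traces.
print(osa_distance(["register", "check", "approve"], ["register", "approve", "check"]))
```

Treating a swap of two adjacent activities as a single edit is what makes this metric a natural fit for comparing process traces, where reordered steps are a common deviation.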

ROOct 21, 2025
A Cross-Environment and Cross-Embodiment Path Planning Framework via a Conditional Diffusion Model

Mehran Ghafarian Tamizi, Homayoun Honari, Amir Mehdi Soufi Enayati et al.

Path planning for a robotic system in high-dimensional cluttered environments needs to be efficient, safe, and adaptable for different environments and hardware. Conventional methods face high computation time and require extensive parameter tuning, while prior learning-based methods still fail to generalize effectively. The primary goal of this research is to develop a path planning framework capable of generalizing to unseen environments and new robotic manipulators without the need for retraining. We present GADGET (Generalizable and Adaptive Diffusion-Guided Environment-aware Trajectory generation), a diffusion-based planning model that generates joint-space trajectories conditioned on voxelized scene representations as well as start and goal configurations. A key innovation is GADGET's hybrid dual-conditioning mechanism that combines classifier-free guidance via learned scene encoding with classifier-guided Control Barrier Function (CBF) safety shaping, integrating environment awareness with real-time collision avoidance directly in the denoising process. This design supports zero-shot transfer to new environments and robotic embodiments without retraining. Experimental results show that GADGET achieves high success rates with low collision intensity in spherical-obstacle, bin-picking, and shelf environments, with CBF guidance further improving safety. Moreover, comparative evaluations indicate strong performance relative to both sampling-based and learning-based baselines. Furthermore, GADGET provides transferability across Franka Panda, Kinova Gen3 (6/7-DoF), and UR5 robots, and physical execution on a Kinova Gen3 demonstrates its ability to generate safe, collision-free trajectories in real-world settings.

GRAug 7, 2025
A Study of the Framework and Real-World Applications of Language Embedding for 3D Scene Understanding

Mahmoud Chick Zaouali, Todd Charter, Yehor Karpichev et al.

Gaussian Splatting has rapidly emerged as a transformative technique for real-time 3D scene representation, offering a highly efficient and expressive alternative to Neural Radiance Fields (NeRF). Its ability to render complex scenes with high fidelity has enabled progress across domains such as scene reconstruction, robotics, and interactive content creation. More recently, the integration of Large Language Models (LLMs) and language embeddings into Gaussian Splatting pipelines has opened new possibilities for text-conditioned generation, editing, and semantic scene understanding. Despite these advances, a comprehensive overview of this emerging intersection has been lacking. This survey presents a structured review of current research efforts that combine language guidance with 3D Gaussian Splatting, detailing theoretical foundations, integration strategies, and real-world use cases. We highlight key limitations such as computational bottlenecks, generalizability, and the scarcity of semantically annotated 3D Gaussian data and outline open challenges and future directions for advancing language-guided 3D scene understanding using Gaussian Splatting.

CVJul 31, 2025
A Deep Dive into Generic Object Tracking: A Survey

Fereshteh Aghaee Meibodi, Shadi Alijani, Homayoun Najjaran

Generic object tracking remains an important yet challenging task in computer vision due to complex spatio-temporal dynamics, especially in the presence of occlusions, similar distractors, and appearance variations. Over the past two decades, a wide range of tracking paradigms, including Siamese-based trackers, discriminative trackers, and, more recently, prominent transformer-based approaches, have been introduced to address these challenges. While a few existing survey papers in this field have either concentrated on a single category or widely covered multiple ones to capture progress, our paper presents a comprehensive review of all three categories, with particular emphasis on the rapidly evolving transformer-based methods. We analyze the core design principles, innovations, and limitations of each approach through both qualitative and quantitative comparisons. Our study introduces a novel categorization and offers a unified visual and tabular comparison of representative methods. Additionally, we organize existing trackers from multiple perspectives and summarize the major evaluation benchmarks, highlighting the fast-paced advancements in transformer-based tracking driven by their robust spatio-temporal modeling capabilities.

IVJul 30, 2025
LesionGen: A Concept-Guided Diffusion Model for Dermatology Image Synthesis

Jamil Fayyad, Nourhan Bayasi, Ziyang Yu et al.

Deep learning models for skin disease classification require large, diverse, and well-annotated datasets. However, such resources are often limited due to privacy concerns, high annotation costs, and insufficient demographic representation. While text-to-image diffusion probabilistic models (T2I-DPMs) offer promise for medical data synthesis, their use in dermatology remains underexplored, largely due to the scarcity of rich textual descriptions in existing skin image datasets. In this work, we introduce LesionGen, a clinically informed T2I-DPM framework for dermatology image synthesis. Unlike prior methods that rely on simplistic disease labels, LesionGen is trained on structured, concept-rich dermatological captions derived from expert annotations and pseudo-generated, concept-guided reports. By fine-tuning a pretrained diffusion model on these high-quality image-caption pairs, we enable the generation of realistic and diverse skin lesion images conditioned on meaningful dermatological descriptions. Our results demonstrate that models trained solely on our synthetic dataset achieve classification accuracy comparable to those trained on real images, with notable gains in worst-case subgroup performance. Code and data are available here.

LGMay 26, 2025
WQLCP: Weighted Adaptive Conformal Prediction for Robust Uncertainty Quantification Under Distribution Shifts

Shadi Alijani, Homayoun Najjaran

Conformal prediction (CP) provides a framework for constructing prediction sets with guaranteed coverage, assuming exchangeable data. However, real-world scenarios often involve distribution shifts that violate exchangeability, leading to unreliable coverage and inflated prediction sets. To address this challenge, we first introduce Reconstruction Loss-Scaled Conformal Prediction (RLSCP), which utilizes reconstruction losses derived from a Variational Autoencoder (VAE) as an uncertainty metric to scale score functions. While RLSCP improves performance, mainly through better coverage, it computes quantiles from a fixed calibration dataset without accounting for discrepancies between test and training data in a non-exchangeable setting. We therefore propose Weighted Quantile Loss-scaled Conformal Prediction (WQLCP), which refines RLSCP by incorporating a weighted notion of exchangeability, adjusting the calibration quantile threshold based on weights with respect to the ratio of calibration and test loss values. This approach improves the CP-generated prediction set outputs in the presence of distribution shifts. Experiments on large-scale datasets, including ImageNet variants, demonstrate that WQLCP outperforms existing baselines by consistently maintaining coverage while reducing prediction set sizes, providing a robust solution for CP under distribution shifts.
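The weighted calibration quantile mentioned in the abstract above can be illustrated with a generic weighted conformal quantile. This is only a sketch: the function names, the `test_weight` parameter, and the toy numbers are assumptions for illustration, and the way WQLCP derives its weights from VAE reconstruction losses is not reproduced here.

```python
import numpy as np

def weighted_conformal_quantile(scores, weights, alpha, test_weight=1.0):
    """Generic weighted conformal quantile (not the paper's exact rule).

    Returns the smallest calibration score whose cumulative normalized
    weight reaches 1 - alpha. The test point's weight is treated as mass
    at +infinity, so the result is inf when the calibration mass alone
    cannot reach the target level.
    """
    scores = np.asarray(scores, dtype=float)
    weights = np.asarray(weights, dtype=float)
    order = np.argsort(scores)
    sorted_scores = scores[order]
    cum_mass = np.cumsum(weights[order]) / (weights.sum() + test_weight)
    idx = np.searchsorted(cum_mass, 1.0 - alpha, side="left")
    if idx >= len(sorted_scores):
        return float("inf")
    return float(sorted_scores[idx])

def prediction_set(class_probs, threshold):
    """Labels whose nonconformity score (1 - probability) is within the threshold."""
    class_probs = np.asarray(class_probs, dtype=float)
    return [int(k) for k in np.flatnonzero(1.0 - class_probs <= threshold)]

# Toy calibration scores; larger weights mark calibration points assumed
# to resemble the test input more closely.
cal_scores = [0.1, 0.2, 0.3, 0.4]
q_uniform = weighted_conformal_quantile(cal_scores, [1, 1, 1, 1], alpha=0.25)
q_shifted = weighted_conformal_quantile(cal_scores, [5, 1, 1, 1], alpha=0.5)
labels = prediction_set([0.7, 0.2, 0.1], threshold=q_uniform)
```

With uniform weights this reduces to the usual split-conformal quantile; shifting weight toward low-score calibration points lowers the threshold and shrinks the prediction set.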

GRMay 23, 2025
From Flight to Insight: Semantic 3D Reconstruction for Aerial Inspection via Gaussian Splatting and Language-Guided Segmentation

Mahmoud Chick Zaouali, Todd Charter, Homayoun Najjaran

High-fidelity 3D reconstruction is critical for aerial inspection tasks such as infrastructure monitoring, structural assessment, and environmental surveying. While traditional photogrammetry techniques enable geometric modeling, they lack semantic interpretability, limiting their effectiveness for automated inspection workflows. Recent advances in neural rendering and 3D Gaussian Splatting (3DGS) offer efficient, photorealistic reconstructions but similarly lack scene-level understanding. In this work, we present a UAV-based pipeline that extends Feature-3DGS for language-guided 3D segmentation. We leverage LSeg-based feature fields with CLIP embeddings to generate heatmaps in response to language prompts. These are thresholded to produce rough segmentations, and the highest-scoring point is then used as a prompt to SAM or SAM2 for refined 2D segmentation on novel view renderings. Our results highlight the strengths and limitations of various feature field backbones (CLIP-LSeg, SAM, SAM2) in capturing meaningful structure in large-scale outdoor environments. We demonstrate that this hybrid approach enables flexible, language-driven interaction with photorealistic 3D reconstructions, opening new possibilities for semantic aerial inspection and scene understanding.
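The heatmap, threshold, and point-prompt steps described above might look roughly like the following sketch. It assumes the rendered feature field is an `(H, W, D)` array and that relevance is measured by cosine similarity to a text embedding; the function names and the threshold value are hypothetical, and the call into SAM or SAM2 is deliberately left out.

```python
import numpy as np

def language_heatmap(feature_map, text_embedding):
    """Cosine similarity between every pixel feature (H, W, D) and a text embedding (D,)."""
    eps = 1e-8  # guards against division by zero for all-zero features
    feats = feature_map / (np.linalg.norm(feature_map, axis=-1, keepdims=True) + eps)
    text = text_embedding / (np.linalg.norm(text_embedding) + eps)
    return feats @ text  # shape (H, W)

def rough_mask_and_point_prompt(heatmap, threshold=0.5):
    """Threshold the heatmap into a rough mask and pick the best-scoring pixel.

    The returned point is in (x, y) pixel order, a convention commonly used
    for point prompts; the threshold is an assumed value.
    """
    mask = heatmap >= threshold
    row, col = np.unravel_index(np.argmax(heatmap), heatmap.shape)
    return mask, (int(col), int(row))

# Toy 2x2 feature map with 2-D features: only pixel (row 0, col 1) matches the prompt.
features = np.array([[[0.0, 1.0], [1.0, 0.0]],
                     [[0.0, 1.0], [0.0, 1.0]]])
prompt_embedding = np.array([1.0, 0.0])
heat = language_heatmap(features, prompt_embedding)
mask, point = rough_mask_and_point_prompt(heat)
```

The point returned here would then be passed to a promptable segmenter to refine the rough mask on the rendered view.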

LGMay 20, 2025
Multi-Channel Swin Transformer Framework for Bearing Remaining Useful Life Prediction

Ali Mohajerzarrinkelk, Maryam Ahang, Mehran Zoravar et al.

Precise estimation of the Remaining Useful Life (RUL) of rolling bearings is an important consideration to avoid unexpected failures, reduce downtime, and promote safety and efficiency in industrial systems. Complex degradation trends, noise, and the need to detect faults early make RUL estimation a challenging task. This paper introduces a novel framework that combines a wavelet-based denoising method, Wavelet Packet Decomposition (WPD), and a customized multi-channel Swin Transformer model (MCSFormer) to address these problems. With attention mechanisms incorporated for feature fusion, the model is designed to learn global and local degradation patterns, using hierarchical representations to enhance predictive performance. Additionally, a customized loss function is developed as a key distinction of this work to differentiate between early and late predictions, prioritizing accurate early detection and minimizing the high operational risks of late predictions. The proposed model was evaluated on the PRONOSTIA dataset using three experiments. Intra-condition experiments demonstrated that MCSFormer outperformed state-of-the-art models, including the Adaptive Transformer, MDAN, and CNN-SRU, achieving 41%, 64%, and 69% lower MAE on average across different operating conditions, respectively. In cross-condition testing, it achieved superior generalization under varying operating conditions compared to the adapted ViT and Swin Transformer. Lastly, the custom loss function effectively reduced late predictions, as evidenced by a 6.3% improvement in the scoring metric while maintaining competitive overall performance. The model's robust noise resistance, generalization capability, and focus on safety make MCSFormer a trustworthy and effective predictive maintenance tool in industrial applications.
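The abstract does not give the form of its custom loss, so the following is only an illustrative sketch of one way to penalize late predictions more than early ones. It assumes a "late" prediction is one that overestimates RUL (the failure is predicted after it actually occurs); `asymmetric_rul_loss` and `late_weight` are hypothetical names and values.

```python
import numpy as np

def asymmetric_rul_loss(rul_true, rul_pred, late_weight=2.0):
    """Squared error with a heavier penalty on overestimated RUL.

    An overestimate (rul_pred > rul_true) means the failure would arrive
    before the model expects it, which is the riskier error in practice.
    This weighting scheme is an assumption, not the paper's loss.
    """
    rul_true = np.asarray(rul_true, dtype=float)
    rul_pred = np.asarray(rul_pred, dtype=float)
    error = rul_pred - rul_true
    weights = np.where(error > 0, late_weight, 1.0)
    return float(np.mean(weights * error ** 2))

# Same absolute error, opposite directions: the late prediction costs twice as much.
late_only = asymmetric_rul_loss([10.0], [12.0])
early_only = asymmetric_rul_loss([10.0], [8.0])
```

Setting `late_weight=1.0` recovers the ordinary mean squared error, which makes the effect of the asymmetry easy to isolate.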

LGMay 20, 2025
Feature-Weighted MMD-CORAL for Domain Adaptation in Power Transformer Fault Diagnosis

Hootan Mahmoodiyan, Maryam Ahang, Mostafa Abbasi et al.

Ensuring the reliable operation of power transformers is critical to grid stability. Dissolved Gas Analysis (DGA) is widely used for fault diagnosis, but traditional methods rely on heuristic rules, which may lead to inconsistent results. Machine learning (ML)-based approaches have improved diagnostic accuracy; however, power transformers operate under varying conditions, and differences in transformer type, environmental factors, and operational settings create distribution shifts in diagnostic data. Consequently, direct model transfer between transformers often fails, making techniques for domain adaptation a necessity. To tackle this issue, this work proposes a feature-weighted domain adaptation technique that combines Maximum Mean Discrepancy (MMD) and Correlation Alignment (CORAL) with feature-specific weighting (MCW). Kolmogorov-Smirnov (K-S) statistics are used to assign adaptable weights, prioritizing features with larger distributional discrepancies and thereby improving source and target domain alignment. Experimental evaluations on datasets for power transformers demonstrate the effectiveness of the proposed method, which achieves a 7.9% improvement over Fine-Tuning and a 2.2% improvement over MMD-CORAL (MC). Furthermore, it outperforms both techniques across various training sample sizes, confirming its robustness for domain adaptation.
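As a rough illustration of the feature-weighting idea above, the sketch below computes a per-feature two-sample K-S statistic, turns it into weights, and applies those weights to a linear-kernel MMD term and a CORAL term. The normalization of the weights, the choice of a linear kernel, and the way the weights enter each term are assumptions made for this example, not details taken from the paper.

```python
import numpy as np

def ks_statistic(a, b):
    """Two-sample Kolmogorov-Smirnov statistic: largest gap between empirical CDFs."""
    a, b = np.sort(a), np.sort(b)
    grid = np.concatenate([a, b])
    cdf_a = np.searchsorted(a, grid, side="right") / len(a)
    cdf_b = np.searchsorted(b, grid, side="right") / len(b)
    return float(np.max(np.abs(cdf_a - cdf_b)))

def ks_feature_weights(source, target):
    """Weights proportional to each feature's K-S statistic (assumed to sum to 1)."""
    stats = np.array([ks_statistic(source[:, j], target[:, j])
                      for j in range(source.shape[1])])
    return stats / (stats.sum() + 1e-12)

def weighted_linear_mmd(source, target, weights):
    """Linear-kernel MMD^2 (squared gap between feature means), weighted per feature."""
    gap = source.mean(axis=0) - target.mean(axis=0)
    return float(np.sum(weights * gap ** 2))

def weighted_coral(source, target, weights):
    """CORAL loss on features rescaled by sqrt(weight): ||C_s - C_t||_F^2 / (4 d^2)."""
    scale = np.sqrt(weights)
    cov_s = np.atleast_2d(np.cov(source * scale, rowvar=False))
    cov_t = np.atleast_2d(np.cov(target * scale, rowvar=False))
    d = source.shape[1]
    return float(np.sum((cov_s - cov_t) ** 2) / (4 * d * d))

# Toy data: feature 0 is identically distributed, feature 1 is shifted by 10.
src = np.array([[0.0, 0.0], [1.0, 1.0], [2.0, 2.0], [3.0, 3.0]])
tgt = np.array([[0.0, 10.0], [1.0, 11.0], [2.0, 12.0], [3.0, 13.0]])
w = ks_feature_weights(src, tgt)
alignment_loss = weighted_linear_mmd(src, tgt, w) + weighted_coral(src, tgt, w)
```

In this toy case all of the weight lands on the shifted feature, so the alignment loss is driven entirely by the mean gap in that feature.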

CVSep 1, 2023
Gap and Overlap Detection in Automated Fiber Placement

Assef Ghamisi, Homayoun Najjaran

The identification and correction of manufacturing defects, particularly gaps and overlaps, are crucial for ensuring high-quality composite parts produced through Automated Fiber Placement (AFP). These imperfections are the most commonly observed issues that can significantly impact the overall quality of the composite parts. Manual inspection is both time-consuming and labor-intensive, making it an inefficient approach. To overcome this challenge, the implementation of an automated defect detection system serves as the optimal solution. In this paper, we introduce a novel method that uses an Optical Coherence Tomography (OCT) sensor and computer vision techniques to detect and locate gaps and overlaps in composite parts. Our approach involves generating a depth map image of the composite surface that highlights the elevation of composite tapes (or tows) on the surface. By detecting the boundaries of each tow, our algorithm can compare consecutive tows and identify gaps or overlaps that may exist between them. Any gaps or overlaps exceeding a predefined tolerance threshold are considered manufacturing defects. To evaluate the performance of our approach, we compare the detected defects with the ground truth annotated by experts. The results demonstrate a high level of accuracy and efficiency in gap and overlap segmentation.
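The comparison of consecutive tows against a tolerance can be sketched as below. The sketch assumes tow boundaries have already been extracted from the depth map as `(left_edge, right_edge)` positions in a common unit; the function name, the tuple layout, and the example numbers are hypothetical.

```python
from typing import List, Tuple

def classify_tow_spacing(
    tows: List[Tuple[float, float]], tolerance: float
) -> List[Tuple[int, str, float]]:
    """Flag gaps and overlaps between consecutive tows.

    tows: (left_edge, right_edge) per tow, ordered across the surface.
    Returns (index of the left tow, defect type, defect width) for each
    pair whose spacing exceeds the tolerance in either direction.
    """
    defects = []
    for i in range(len(tows) - 1):
        # Positive spacing: empty space between tows. Negative: tows overlap.
        spacing = tows[i + 1][0] - tows[i][1]
        if spacing > tolerance:
            defects.append((i, "gap", spacing))
        elif spacing < -tolerance:
            defects.append((i, "overlap", -spacing))
    return defects

# Example boundaries (units assumed to be millimetres).
tow_edges = [(0.0, 6.0), (7.0, 13.0), (12.0, 18.0), (18.25, 24.0)]
found = classify_tow_spacing(tow_edges, tolerance=0.5)
```

Spacings within the tolerance, such as the 0.25 between the last two tows in the example, are not reported as defects.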