Aijun An

CV
h-index20
20papers
700citations
Novelty41%
AI Score54

20 Papers

LGApr 4, 2022
A Survey on Graph Representation Learning Methods

Shima Khoshraftar, Aijun An

Graphs representation learning has been a very active research area in recent years. The goal of graph representation learning is to generate graph representation vectors that capture the structure and features of large graphs accurately. This is especially important because the quality of the graph representation vectors will affect the performance of these vectors in downstream tasks such as node classification, link prediction and anomaly detection. Many techniques are proposed for generating effective graph representation vectors. Two of the most prevalent categories of graph representation learning are graph embedding methods without using graph neural nets (GNN), which we denote as non-GNN based graph embedding methods, and graph neural nets (GNN) based methods. Non-GNN graph embedding methods are based on techniques such as random walks, temporal point processes and neural network learning methods. GNN-based methods, on the other hand, are the application of deep learning on graph data. In this survey, we provide an overview of these two categories and cover the current state-of-the-art methods for both static and dynamic graphs. Finally, we explore some open and ongoing research directions for future work.

CVNov 21, 2024Code
Test-Time Adaptation of 3D Point Clouds via Denoising Diffusion Models

Hamidreza Dastmalchi, Aijun An, Ali Cheraghian et al.

Test-time adaptation (TTA) of 3D point clouds is crucial for mitigating discrepancies between training and testing samples in real-world scenarios, particularly when handling corrupted point clouds. LiDAR data, for instance, can be affected by sensor failures or environmental factors, causing domain gaps. Adapting models to these distribution shifts online is crucial, as training for every possible variation is impractical. Existing methods often focus on fine-tuning pre-trained models based on self-supervised learning or pseudo-labeling, which can lead to forgetting valuable source domain knowledge over time and reduce generalization on future tests. In this paper, we introduce a novel 3D test-time adaptation method, termed 3DD-TTA, which stands for 3D Denoising Diffusion Test-Time Adaptation. This method uses a diffusion strategy that adapts input point cloud samples to the source domain while keeping the source model parameters intact. The approach uses a Variational Autoencoder (VAE) to encode the corrupted point cloud into a shape latent and latent points. These latent points are corrupted with Gaussian noise and subjected to a denoising diffusion process. During this process, both the shape latent and latent points are updated to preserve fidelity, guiding the denoising toward generating consistent samples that align more closely with the source domain. We conduct extensive experiments on the ShapeNet dataset and investigate its generalizability on ModelNet40 and ScanObjectNN, achieving state-of-the-art results. The code has been released at \url{https://github.com/hamidreza-dastmalchi/3DD-TTA}.

CVMay 14
SteerSeg: Attention Steering for Reasoning Video Segmentation

Ali Cheraghian, Hamidreza Dastmalchi, Abdelwahed Khamis et al.

Video reasoning segmentation requires localizing objects across video frames from natural language expressions, often involving spatial reasoning and implicit references. Recent approaches leverage frozen large vision-language models (LVLMs) by extracting attention maps and using them as spatial priors for segmentation, enabling training-free grounding. However, these attention maps are optimized for text generation rather than spatial localization, often resulting in diffuse and ambiguous grounding signals. In this work, we introduce SteerSeg, a lightweight framework that identifies attention misalignment as the key bottleneck in attention-based grounding and proposes to steer attention at its source through input-level conditioning. SteerSeg combines learnable soft prompts with reasoning-guided Chain-of-Thought (CoT) prompting. The soft prompts reshape the attention distribution to produce more spatially concentrated maps, while CoT-derived attributes resolve ambiguity among similar objects by guiding attention toward the correct instance. The resulting attention maps are converted into point prompts across keyframes to guide a segmentation model, while candidate tracklets are ranked and selected using correlation-based scoring. Our approach freezes the LVLM and segmentation model parameters and learns only a small set of soft prompts, preserving the model's pretrained reasoning capabilities while significantly improving grounding. Despite being trained only on Ref-YouTube-VOS, SteerSeg generalizes well across diverse benchmarks, significantly improving the spatial grounding capability of LVLMs. Project page: https://steerseg.github.io

CVMar 11
Fighting Hallucinations with Counterfactuals: Diffusion-Guided Perturbations for LVLM Hallucination Suppression

Hamidreza Dastmalchi, Aijun An, Ali Cheraghian et al.

While large vision-language models (LVLMs) achieve strong performance on multimodal tasks, they frequently generate hallucinations -- unfaithful outputs misaligned with the visual input. To address this issue, we introduce CIPHER (Counterfactual Image Perturbations for Hallucination Extraction and Removal), a training-free method that suppresses vision-induced hallucinations via lightweight feature-level correction. Unlike prior training-free approaches that primarily focus on text-induced hallucinations, CIPHER explicitly targets hallucinations arising from the visual modality. CIPHER operates in two phases. In the offline phase, we construct OHC-25K (Object-Hallucinated Counterfactuals, 25,000 samples), a counterfactual dataset consisting of diffusion-edited images that intentionally contradict the original ground-truth captions. We pair these edited images with the unchanged ground-truth captions and process them through an LVLM to extract hallucination-related representations. Contrasting these representations with those from authentic (image, caption) pairs reveals structured, systematic shifts spanning a low-rank subspace characterizing vision-induced hallucination. In the inference phase, CIPHER suppresses hallucinations by projecting intermediate hidden states away from this subspace. Experiments across multiple benchmarks show that CIPHER significantly reduces hallucination rates while preserving task performance, demonstrating the effectiveness of counterfactual visual perturbations for improving LVLM faithfulness. Code and additional materials are available at https://hamidreza-dastmalchi.github.io/cipher-cvpr2026/.

CVAug 7, 2025Code
ETTA: Efficient Test-Time Adaptation for Vision-Language Models through Dynamic Embedding Updates

Hamidreza Dastmalchi, Aijun An, Ali cheraghian

Pretrained vision-language models (VLMs) like CLIP show strong zero-shot performance but struggle with generalization under distribution shifts. Test-Time Adaptation (TTA) addresses this by adapting VLMs to unlabeled test data in new domains. While some TTA methods rely on prompt-tuning, training-free cache-based approaches are preferred for efficiency. However, current cache-based TTA models store only a limited set of high-confidence samples, restricting the decision boundary to these samples and ignoring the influence of other incoming test data. To address this, we propose Efficient Test-Time Adaptation (ETTA), introducing a Recursive Updating module that integrates all incoming test samples, progressively refining the decision boundary. This strategy mimics an unbounded cache, dynamically updating contextual embeddings for improved accuracy with minimal memory and computational overhead. ETTA also includes an Adaptive Ensemble module to reduce prompt dependency in image-to-text scores by dynamically selecting optimal prompts for each class. Furthermore, ETTA adaptively combines scores from both modules based on confidence levels, leveraging their complementary strengths. Extensive experiments on two benchmarks confirm that ETTA surpasses the state-of-the-art TTA models in computational complexity and accuracy, setting a new standard for effective, efficient test-time adaptation. The code has been released at https://github.com/hamidreza-dastmalchi/ETTA.

PEOct 21, 2024
Disease Outbreak Detection and Forecasting: A Review of Methods and Data Sources

Ghazaleh Babanejaddehaki, Aijun An, Manos Papagelis

Infectious diseases occur when pathogens from other individuals or animals infect a person, resulting in harm to both individuals and society as a whole. The outbreak of such diseases can pose a significant threat to human health. However, early detection and tracking of these outbreaks have the potential to reduce the mortality impact. To address these threats, public health authorities have endeavored to establish comprehensive mechanisms for collecting disease data. Many countries have implemented infectious disease surveillance systems, with the detection of epidemics being a primary objective. The clinical healthcare system, local/state health agencies, federal agencies, academic/professional groups, and collaborating governmental entities all play pivotal roles within this system. Moreover, nowadays, search engines and social media platforms can serve as valuable tools for monitoring disease trends. The Internet and social media have become significant platforms where users share information about their preferences and relationships. This real-time information can be harnessed to gauge the influence of ideas and societal opinions, making it highly useful across various domains and research areas, such as marketing campaigns, financial predictions, and public health, among others. This article provides a review of the existing standard methods developed by researchers for detecting outbreaks using time series data. These methods leverage various data sources, including conventional data sources and social media data or Internet data sources. The review particularly concentrates on works published within the timeframe of 2015 to 2022.

CVNov 19, 2025
Adapt-As-You-Walk Through the Clouds: Training-Free Online Test-Time Adaptation of 3D Vision-Language Foundation Models

Mehran Tamjidi, Hamidreza Dastmalchi, Mohammadreza Alimoradijazi et al.

3D Vision-Language Foundation Models (VLFMs) have shown strong generalization and zero-shot recognition capabilities in open-world point cloud processing tasks. However, these models often underperform in practical scenarios where data are noisy, incomplete, or drawn from a different distribution than the training data. To address this, we propose Uni-Adapter, a novel training-free online test-time adaptation (TTA) strategy for 3D VLFMs based on dynamic prototype learning. We define a 3D cache to store class-specific cluster centers as prototypes, which are continuously updated to capture intra-class variability in heterogeneous data distributions. These dynamic prototypes serve as anchors for cache-based logit computation via similarity scoring. Simultaneously, a graph-based label smoothing module captures inter-prototype similarities to enforce label consistency among similar prototypes. Finally, we unify predictions from the original 3D VLFM and the refined 3D cache using entropy-weighted aggregation for reliable adaptation. Without retraining, Uni-Adapter effectively mitigates distribution shifts, achieving state-of-the-art performance on diverse 3D benchmarks over different 3D VLFMs, improving ModelNet-40C by 10.55%, ScanObjectNN-C by 8.26%, and ShapeNet-C by 4.49% over the source 3D VLFMs.

CVJun 16, 2025
Evolution of ReID: From Early Methods to LLM Integration

Amran Bhuiyan, Mizanur Rahman, Md Tahmid Rahman Laskar et al.

Person re-identification (ReID) has evolved from handcrafted feature-based methods to deep learning approaches and, more recently, to models incorporating large language models (LLMs). Early methods struggled with variations in lighting, pose, and viewpoint, but deep learning addressed these issues by learning robust visual features. Building on this, LLMs now enable ReID systems to integrate semantic and contextual information through natural language. This survey traces that full evolution and offers one of the first comprehensive reviews of ReID approaches that leverage LLMs, where textual descriptions are used as privileged information to improve visual matching. A key contribution is the use of dynamic, identity-specific prompts generated by GPT-4o, which enhance the alignment between images and text in vision-language ReID systems. Experimental results show that these descriptions improve accuracy, especially in complex or ambiguous cases. To support further research, we release a large set of GPT-4o-generated descriptions for standard ReID datasets. By bridging computer vision and natural language processing, this survey offers a unified perspective on the field's development and outlines key future directions such as better prompt design, cross-modal transfer learning, and real-world adaptability.

CRFeb 16, 2022
Data Capsule: A Self-Contained Data Model as an Access Policy Enforcement Strategy

Reza Soltani, Uyen Trang Nguyen, Aijun An

In this paper, we introduce a data capsule model, a self-contained and self-enforcing data container based on emerging self-sovereign identity standards, blockchain, and attribute-based encryption. A data capsule allows for a transparent, privacy-respecting, and secure exchange of personal data, enabling a progressive trust scheme in a semi-trusted environment. Each data capsule is bundled with its own access policy structure and verifiable data, drastically reducing the number of interactions needed among the user, the service providers, and data custodians. Moreover, by relying on the decentralized nature of blockchain and attribute-based encryption our proposed model ensures the access policies published by service providers are public, transparent, and strictly followed.

CRNov 3, 2021
A Survey of Self-Sovereign Identity Ecosystem

Reza Soltani, Uyen Trang Nguyen, Aijun An

Self-sovereign identity is the next evolution of identity management models. This survey takes a journey through the origin of identity, defining digital identity and progressive iterations of digital identity models leading up to self-sovereign identity. It then states the relevant research initiatives, platforms, projects, and regulatory frameworks, as well as the building blocks including decentralized identifiers, verifiable credentials, distributed ledger, and various privacy engineering protocols. Finally, the survey provides an overview of the key challenges and research opportunities around self-sovereign identity.

CRApr 27, 2021
Extending Isolation Forest for Anomaly Detection in Big Data via K-Means

Md Tahmid Rahman Laskar, Jimmy Huang, Vladan Smetana et al.

Industrial Information Technology (IT) infrastructures are often vulnerable to cyberattacks. To ensure security to the computer systems in an industrial environment, it is required to build effective intrusion detection systems to monitor the cyber-physical systems (e.g., computer networks) in the industry for malicious activities. This paper aims to build such intrusion detection systems to protect the computer networks from cyberattacks. More specifically, we propose a novel unsupervised machine learning approach that combines the K-Means algorithm with the Isolation Forest for anomaly detection in industrial big data scenarios. Since our objective is to build the intrusion detection system for the big data scenario in the industrial domain, we utilize the Apache Spark framework to implement our proposed model which was trained in large network traffic data (about 123 million instances of network traffic) stored in Elasticsearch. Moreover, we evaluate our proposed model on the live streaming data and find that our proposed system can be used for real-time anomaly detection in the industrial setup. In addition, we address different challenges that we face while training our model on large datasets and explicitly describe how these issues were resolved. Based on our empirical evaluation in different use-cases for anomaly detection in real-world network traffic data, we observe that our proposed system is effective to detect anomalies in big data scenarios. Finally, we evaluate our proposed model on several academic datasets to compare with other models and find that it provides comparable performance with other state-of-the-art approaches.

LGJan 6, 2020
Elastic Bulk Synchronous Parallel Model for Distributed Deep Learning

Xing Zhao, Manos Papagelis, Aijun An et al.

The bulk synchronous parallel (BSP) is a celebrated synchronization model for general-purpose parallel computing that has successfully been employed for distributed training of machine learning models. A prevalent shortcoming of the BSP is that it requires workers to wait for the straggler at every iteration. To ameliorate this shortcoming of classic BSP, we propose ELASTICBSP a model that aims to relax its strict synchronization requirement. The proposed model offers more flexibility and adaptability during the training phase, without sacrificing on the accuracy of the trained model. We also propose an efficient method that materializes the model, named ZIPLINE. The algorithm is tunable and can effectively balance the trade-off between quality of convergence and iteration throughput, in order to accommodate different environments or applications. A thorough experimental evaluation demonstrates that our proposed ELASTICBSP model converges faster and to a higher accuracy than the classic BSP. It also achieves comparable (if not higher) accuracy than the other sensible synchronization models.

IRNov 26, 2019
Learning to Determine the Quality of News Headlines

Amin Omidvar, Hossein Poormodheji, Aijun An et al.

Today, most newsreaders read the online version of news articles rather than traditional paper-based newspapers. Also, news media publishers rely heavily on the income generated from subscriptions and website visits made by newsreaders. Thus, online user engagement is a very important issue for online newspapers. Much effort has been spent on writing interesting headlines to catch the attention of online users. On the other hand, headlines should not be misleading (e.g., clickbaits); otherwise, readers would be disappointed when reading the content. In this paper, we propose four indicators to determine the quality of published news headlines based on their click count and dwell time, which are obtained by website log analysis. Then, we use soft target distribution of the calculated quality indicators to train our proposed deep learning model which can predict the quality of unpublished news headlines. The proposed model not only processes the latent features of both headline and body of the article to predict its headline quality but also considers the semantic relation between headline and body as well. To evaluate our model, we use a real dataset from a major Canadian newspaper. Results show our proposed model outperforms other state-of-the-art NLP models.

LGNov 5, 2019
Dynamic Graph Embedding via LSTM History Tracking

Shima Khoshraftar, Sedigheh Mahdavi, Aijun An et al.

Many real world networks are very large and constantly change over time. These dynamic networks exist in various domains such as social networks, traffic networks and biological interactions. To handle large dynamic networks in downstream applications such as link prediction and anomaly detection, it is essential for such networks to be transferred into a low dimensional space. Recently, network embedding, a technique that converts a large graph into a low-dimensional representation, has become increasingly popular due to its strength in preserving the structure of a network. Efficient dynamic network embedding, however, has not yet been fully explored. In this paper, we present a dynamic network embedding method that integrates the history of nodes over time into the current state of nodes. The key contribution of our work is 1) generating dynamic network embedding by combining both dynamic and static node information 2) tracking history of neighbors of nodes using LSTM 3) significantly decreasing the time and memory by training an autoencoder LSTM model using temporal walks rather than adjacency matrices of graphs which are the common practice. We evaluate our method in multiple applications such as anomaly detection, link prediction and node classification in datasets from various domains.

LGOct 4, 2019
Dynamic Joint Variational Graph Autoencoders

Sedigheh Mahdavi, Shima Khoshraftar, Aijun An

Learning network representations is a fundamental task for many graph applications such as link prediction, node classification, graph clustering, and graph visualization. Many real-world networks are interpreted as dynamic networks and evolve over time. Most existing graph embedding algorithms were developed for static graphs mainly and cannot capture the evolution of a large dynamic network. In this paper, we propose Dynamic joint Variational Graph Autoencoders (Dyn-VGAE) that can learn both local structures and temporal evolutionary patterns in a dynamic network. Dyn-VGAE provides a joint learning framework for computing temporal representations of all graph snapshots simultaneously. Each auto-encoder embeds a graph snapshot based on its local structure and can also learn temporal dependencies by collaborating with other autoencoders. We conduct experimental studies on dynamic real-world graph datasets and the results demonstrate the effectiveness of the proposed method.

DCAug 16, 2019
Dynamic Stale Synchronous Parallel Distributed Training for Deep Learning

Xing Zhao, Aijun An, Junfeng Liu et al.

Deep learning is a popular machine learning technique and has been applied to many real-world problems. However, training a deep neural network is very time-consuming, especially on big data. It has become difficult for a single machine to train a large model over large datasets. A popular solution is to distribute and parallelize the training process across multiple machines using the parameter server framework. In this paper, we present a distributed paradigm on the parameter server framework called Dynamic Stale Synchronous Parallel (DSSP) which improves the state-of-the-art Stale Synchronous Parallel (SSP) paradigm by dynamically determining the staleness threshold at the run time. Conventionally to run distributed training in SSP, the user needs to specify a particular staleness threshold as a hyper-parameter. However, a user does not usually know how to set the threshold and thus often finds a threshold value through trial and error, which is time-consuming. Based on workers' recent processing time, our approach DSSP adaptively adjusts the threshold per iteration at running time to reduce the waiting time of faster workers for synchronization of the globally shared parameters, and consequently increases the frequency of parameters updates (increases iteration throughput), which speedups the convergence rate. We compare DSSP with other paradigms such as Bulk Synchronous Parallel (BSP), Asynchronous Parallel (ASP), and SSP by running deep neural networks (DNN) models over GPU clusters in both homogeneous and heterogeneous environments. The results show that in a heterogeneous environment where the cluster consists of mixed models of GPUs, DSSP converges to a higher accuracy much earlier than SSP and BSP and performs similarly to ASP. In a homogeneous distributed cluster, DSSP has more stable and slightly better performance than SSP and ASP, and converges much faster than BSP.

LGDec 6, 2018
dynnode2vec: Scalable Dynamic Network Embedding

Sedigheh Mahdavi, Shima Khoshraftar, Aijun An

Network representation learning in low dimensional vector space has attracted considerable attention in both academic and industrial domains. Most real-world networks are dynamic with addition/deletion of nodes and edges. The existing graph embedding methods are designed for static networks and they cannot capture evolving patterns in a large dynamic network. In this paper, we propose a dynamic embedding method, dynnode2vec, based on the well-known graph embedding method node2vec. Node2vec is a random walk based embedding method for static networks. Applying static network embedding in dynamic settings has two crucial problems: 1) Generating random walks for every time step is time consuming 2) Embedding vector spaces in each timestamp are different. In order to tackle these challenges, dynnode2vec uses evolving random walks and initializes the current graph embedding with previous embedding vectors. We demonstrate the advantages of the proposed dynamic network embedding by conducting empirical evaluations on several large dynamic network datasets.

CLJun 20, 2018
Using Neural Network for Identifying Clickbaits in Online News Media

Amin Omidvar, Hui Jiang, Aijun An

Online news media sometimes use misleading headlines to lure users to open the news article. These catchy headlines that attract users but disappointed them at the end, are called Clickbaits. Because of the importance of automatic clickbait detection in online medias, lots of machine learning methods were proposed and employed to find the clickbait headlines. In this research, a model using deep learning methods is proposed to find the clickbaits in Clickbait Challenge 2017's dataset. The proposed model gained the first rank in the Clickbait Challenge 2017 in terms of Mean Squared Error. Also, data analytics and visualization techniques are employed to explore and discover the provided dataset to get more insight from the data.

COOct 17, 2017
Spontaneous Symmetry Breaking in Neural Networks

Ricky Fok, Aijun An, Xiaogang Wang

We propose a framework to understand the unprecedented performance and robustness of deep neural networks using field theory. Correlations between the weights within the same layer can be described by symmetries in that layer, and networks generalize better if such symmetries are broken to reduce the redundancies of the weights. Using a two parameter field theory, we find that the network can break such symmetries itself towards the end of training in a process commonly known in physics as spontaneous symmetry breaking. This corresponds to a network generalizing itself without any user input layers to break the symmetry, but by communication with adjacent layers. In the layer decoupling limit applicable to residual networks (He et al., 2015), we show that the remnant symmetries that survive the non-linear layers are spontaneously broken. The Lagrangian for the non-linear and weight layers together has striking similarities with the one in quantum field theory of a scalar. Using results from quantum field theory we show that our framework is able to explain many experimentally observed phenomena,such as training on random labels with zero error (Zhang et al., 2017), the information bottleneck, the phase transition out of it and gradient variance explosion (Shwartz-Ziv & Tishby, 2017), shattered gradients (Balduzzi et al., 2017), and many more.

COSep 9, 2017
Optimization assisted MCMC

Ricky Fok, Aijun An, Xiaogang Wang

Markov Chain Monte Carlo (MCMC) sampling methods are widely used but often encounter either slow convergence or biased sampling when applied to multimodal high dimensional distributions. In this paper, we present a general framework of improving classical MCMC samplers by employing a global optimization method. The global optimization method first reduces a high dimensional search to an one dimensional geodesic to find a starting point close to a local mode. The search is accelerated and completed by using a local search method such as BFGS. We modify the target distribution by extracting a local Gaussian distribution aound the found mode. The process is repeated to find all the modes during sampling on the fly. We integrate the optimization algorithm into the Wormhole Hamiltonian Monte Carlo (WHMC) method. Experimental results show that, when applied to high dimensional, multimodal Gaussian mixture models and the network sensor localization problem, the proposed method achieves much faster convergence, with relative error from the mean improved by about an order of magnitude than WHMC in some cases.