Hoang-Quan Nguyen

CV
h-index21
12papers
115citations
Novelty57%
AI Score54

12 Papers

CVNov 26, 2023
Insect-Foundation: A Foundation Model and Large-scale 1M Dataset for Visual Insect Understanding

Hoang-Quan Nguyen, Thanh-Dat Truong, Xuan Bac Nguyen et al.

In precision agriculture, the detection and recognition of insects play an essential role in the ability of crops to grow healthy and produce a high-quality yield. The current machine vision model requires a large volume of data to achieve high performance. However, there are approximately 5.5 million different insect species in the world. None of the existing insect datasets can cover even a fraction of them due to varying geographic locations and acquisition costs. In this paper, we introduce a novel "Insect-1M" dataset, a game-changing resource poised to revolutionize insect-related foundation model training. Covering a vast spectrum of insect species, our dataset, including 1 million images with dense identification labels of taxonomy hierarchy and insect descriptions, offers a panoramic view of entomology, enabling foundation models to comprehend visual and semantic information about insects like never before. Then, to efficiently establish an Insect Foundation Model, we develop a micro-feature self-supervised learning method with a Patch-wise Relevant Attention mechanism capable of discerning the subtle differences among insect images. In addition, we introduce Description Consistency loss to improve micro-feature modeling via insect descriptions. Through our experiments, we illustrate the effectiveness of our proposed approach in insect modeling and achieve State-of-the-Art performance on standard benchmarks of insect-related tasks. Our Insect Foundation Model and Dataset promise to empower the next generation of insect-related vision models, bringing them closer to the ultimate goal of precision agriculture.

QUANT-PHAug 7, 2024
Hierarchical Quantum Control Gates for Functional MRI Understanding

Xuan-Bac Nguyen, Hoang-Quan Nguyen, Hugh Churchill et al.

Quantum computing has emerged as a powerful tool for solving complex problems intractable for classical computers, particularly in popular fields such as cryptography, optimization, and neurocomputing. In this paper, we present a new quantum-based approach named the Hierarchical Quantum Control Gates (HQCG) method for efficient understanding of Functional Magnetic Resonance Imaging (fMRI) data. This approach includes two novel modules: the Local Quantum Control Gate (LQCG) and the Global Quantum Control Gate (GQCG), which are designed to extract local and global features of fMRI signals, respectively. Our method operates end-to-end on a quantum machine, leveraging quantum mechanics to learn patterns within extremely high-dimensional fMRI signals, such as 30,000 samples which is a challenge for classical computers. Empirical results demonstrate that our approach significantly outperforms classical methods. Additionally, we found that the proposed quantum model is more stable and less prone to overfitting than the classical methods.

30.0CVMar 17
OpenQlaw: An Agentic AI Assistant for Analysis of 2D Quantum Materials

Sankalp Pandey, Xuan-Bac Nguyen, Hoang-Quan Nguyen et al.

The transition from optical identification of 2D quantum materials to practical device fabrication requires dynamic reasoning beyond the detection accuracy. While recent domain-specific Multimodal Large Language Models (MLLMs) successfully ground visual features using physics-informed reasoning, their outputs are optimized for step-by-step cognitive transparency. This yields verbose candidate enumerations followed by dense reasoning that, while accurate, may induce cognitive overload and lack immediate utility for real-world interaction with researchers. To address this challenge, we introduce OpenQlaw, an agentic orchestration system for analyzing 2D materials. The architecture is built upon NanoBot, a lightweight agentic framework inspired by OpenClaw, and QuPAINT, one of the first Physics-Aware Instruction Multi-modal platforms for Quantum Material Discovery. This allows accessibility to the lab floor via a variety of messaging channels. OpenQlaw allows the core Large Language Model (LLM) agent to orchestrate a domain-expert MLLM,with QuPAINT, as a specialized node, successfully decoupling visual identification from reasoning and deterministic image rendering. By parsing spatial data from the expert, the agent can dynamically process user queries, such as performing scale-aware physical computation or generating isolated visual annotations, and answer in a naturalistic manner. Crucially, the system features a persistent memory that enables the agent to save physical scale ratios (e.g., 1 pixel = 0.25 μm) for area computations and store sample preparation methods for efficacy comparison. The application of an agentic architecture, together with the extension that uses the core agent as an orchestrator for domain-specific experts, transforms isolated inferences into a context-aware assistant capable of accelerating high-throughput device fabrication.

CVFeb 19
QuPAINT: Physics-Aware Instruction Tuning Approach to Quantum Material Discovery

Xuan-Bac Nguyen, Hoang-Quan Nguyen, Sankalp Pandey et al.

Characterizing two-dimensional quantum materials from optical microscopy images is challenging due to the subtle layer-dependent contrast, limited labeled data, and significant variation across laboratories and imaging setups. Existing vision models struggle in this domain since they lack physical priors and cannot generalize to new materials or hardware conditions. This work presents a new physics-aware multimodal framework that addresses these limitations from both the data and model perspectives. We first present Synthia, a physics-based synthetic data generator that simulates realistic optical responses of quantum material flakes under thin-film interference. Synthia produces diverse and high-quality samples, helping reduce the dependence on expert manual annotation. We introduce QMat-Instruct, the first large-scale instruction dataset for quantum materials, comprising multimodal, physics-informed question-answer pairs designed to teach Multimodal Large Language Models (MLLMs) to understand the appearance and thickness of flakes. Then, we propose Physics-Aware Instruction Tuning (QuPAINT), a multimodal architecture that incorporates a Physics-Informed Attention module to fuse visual embeddings with optical priors, enabling more robust and discriminative flake representations. Finally, we establish QF-Bench, a comprehensive benchmark spanning multiple materials, substrates, and imaging settings, offering standardized protocols for fair and reproducible evaluation.

45.3LGMay 13
QLAM: A Quantum Long-Attention Memory Approach to Long-Sequence Token Modeling

Hoang-Quan Nguyen, Sankalp Pandey, Khoa Luu

Modeling long-range dependencies in sequential data remains a central challenge in machine learning. Transformers address this challenge through attention mechanisms, but their quadratic complexity with respect to sequence length limits scalability to long contexts. State-space models (SSMs) provide an efficient alternative with linear-time computation by evolving a latent state through recurrent updates, but their memory is typically formed via additive or linear transitions, which can limit their ability to capture complex global interactions across tokens. In this work, we introduce one of the first studies to leverage the superposition property of quantum systems to enhance state-based sequence modeling. In particular, we propose Quantum Long-Attention Memory (QLAM), a hybrid quantum-classical memory mechanism that can be viewed as a quantum extension of state-space models. Instead of maintaining a classical latent state updated through additive dynamics, QLAM represents the hidden state as a quantum state whose amplitudes encode a superposition of historical information. The state evolves through parameterized quantum circuits conditioned on the input, enabling a non-classical, globally update mechanism. In this way, QLAM preserves the recurrent and linear-time structure of SSMs while fundamentally enriching the memory representation through quantum superposition. Unlike attention mechanisms that explicitly compute pairwise interactions, QLAM implicitly captures global dependencies through the evolution of the quantum state, and retrieves task-relevant information via query-dependent measurements. We evaluate QLAM on sequential variants of standard image classification benchmarks, including sMNIST, sFashion-MNIST, and sCIFAR-10, where images are flattened into token sequences. Across all tasks, QLAM consistently improves over recurrent baselines and transformer-based models.

CVNov 20, 2024
Quantum-Brain: Quantum-Inspired Neural Network Approach to Vision-Brain Understanding

Hoang-Quan Nguyen, Xuan-Bac Nguyen, Hugh Churchill et al.

Vision-brain understanding aims to extract semantic information about brain signals from human perceptions. Existing deep learning methods for vision-brain understanding are usually introduced in a traditional learning paradigm missing the ability to learn the connectivities between brain regions. Meanwhile, the quantum computing theory offers a new paradigm for designing deep learning models. Motivated by the connectivities in the brain signals and the entanglement properties in quantum computing, we propose a novel Quantum-Brain approach, a quantum-inspired neural network, to tackle the vision-brain understanding problem. To compute the connectivity between areas in brain signals, we introduce a new Quantum-Inspired Voxel-Controlling module to learn the impact of a brain voxel on others represented in the Hilbert space. To effectively learn connectivity, a novel Phase-Shifting module is presented to calibrate the value of the brain signals. Finally, we introduce a new Measurement-like Projection module to present the connectivity information from the Hilbert space into the feature space. The proposed approach can learn to find the connectivities between fMRI voxels and enhance the semantic information obtained from human perceptions. Our experimental results on the Natural Scene Dataset benchmarks illustrate the effectiveness of the proposed method with Top-1 accuracies of 95.1% and 95.6% on image and brain retrieval tasks and an Inception score of 95.3% on fMRI-to-image reconstruction task. Our proposed quantum-inspired network brings a potential paradigm to solving the vision-brain problems via the quantum computing theory.

CVFeb 14, 2025
Insect-Foundation: A Foundation Model and Large Multimodal Dataset for Vision-Language Insect Understanding

Thanh-Dat Truong, Hoang-Quan Nguyen, Xuan-Bac Nguyen et al.

Multimodal conversational generative AI has shown impressive capabilities in various vision and language understanding through learning massive text-image data. However, current conversational models still lack knowledge about visual insects since they are often trained on the general knowledge of vision-language data. Meanwhile, understanding insects is a fundamental problem in precision agriculture, helping to promote sustainable development in agriculture. Therefore, this paper proposes a novel multimodal conversational model, Insect-LLaVA, to promote visual understanding in insect-domain knowledge. In particular, we first introduce a new large-scale Multimodal Insect Dataset with Visual Insect Instruction Data that enables the capability of learning the multimodal foundation models. Our proposed dataset enables conversational models to comprehend the visual and semantic features of the insects. Second, we propose a new Insect-LLaVA model, a new general Large Language and Vision Assistant in Visual Insect Understanding. Then, to enhance the capability of learning insect features, we develop an Insect Foundation Model by introducing a new micro-feature self-supervised learning with a Patch-wise Relevant Attention mechanism to capture the subtle differences among insect images. We also present Description Consistency loss to improve micro-feature learning via text descriptions. The experimental results evaluated on our new Visual Insect Question Answering benchmarks illustrate the effective performance of our proposed approach in visual insect understanding and achieve State-of-the-Art performance on standard benchmarks of insect-related tasks.

QUANT-PHJul 7, 2025
QMoE: A Quantum Mixture of Experts Framework for Scalable Quantum Neural Networks

Hoang-Quan Nguyen, Xuan-Bac Nguyen, Sankalp Pandey et al.

Quantum machine learning (QML) has emerged as a promising direction in the noisy intermediate-scale quantum (NISQ) era, offering computational and memory advantages by harnessing superposition and entanglement. However, QML models often face challenges in scalability and expressiveness due to hardware constraints. In this paper, we propose quantum mixture of experts (QMoE), a novel quantum architecture that integrates the mixture of experts (MoE) paradigm into the QML setting. QMoE comprises multiple parameterized quantum circuits serving as expert models, along with a learnable quantum routing mechanism that selects and aggregates specialized quantum experts per input. The empirical results from the proposed QMoE on quantum classification tasks demonstrate that it consistently outperforms standard quantum neural networks, highlighting its effectiveness in learning complex data patterns. Our work paves the way for scalable and interpretable quantum learning frameworks.

CVMay 2, 2024
Multi-view Action Recognition via Directed Gromov-Wasserstein Discrepancy

Hoang-Quan Nguyen, Thanh-Dat Truong, Khoa Luu

Action recognition has become one of the popular research topics in computer vision. There are various methods based on Convolutional Networks and self-attention mechanisms as Transformers to solve both spatial and temporal dimensions problems of action recognition tasks that achieve competitive performances. However, these methods lack a guarantee of the correctness of the action subject that the models give attention to, i.e., how to ensure an action recognition model focuses on the proper action subject to make a reasonable action prediction. In this paper, we propose a multi-view attention consistency method that computes the similarity between two attentions from two different views of the action videos using Directed Gromov-Wasserstein Discrepancy. Furthermore, our approach applies the idea of Neural Radiance Field to implicitly render the features from novel views when training on single-view datasets. Therefore, the contributions in this work are three-fold. Firstly, we introduce the multi-view attention consistency to solve the problem of reasonable prediction in action recognition. Secondly, we define a new metric for multi-view consistent attention using Directed Gromov-Wasserstein Discrepancy. Thirdly, we built an action recognition model based on Video Transformers and Neural Radiance Fields. Compared to the recent action recognition methods, the proposed approach achieves state-of-the-art results on three large-scale datasets, i.e., Jester, Something-Something V2, and Kinetics-400.

CVJul 7, 2025
$\varphi$-Adapt: A Physics-Informed Adaptation Learning Approach to 2D Quantum Material Discovery

Hoang-Quan Nguyen, Xuan Bac Nguyen, Sankalp Pandey et al.

Characterizing quantum flakes is a critical step in quantum hardware engineering because the quality of these flakes directly influences qubit performance. Although computer vision methods for identifying two-dimensional quantum flakes have emerged, they still face significant challenges in estimating flake thickness. These challenges include limited data, poor generalization, sensitivity to domain shifts, and a lack of physical interpretability. In this paper, we introduce one of the first Physics-informed Adaptation Learning approaches to overcome these obstacles. We focus on two main issues, i.e., data scarcity and generalization. First, we propose a new synthetic data generation framework that produces diverse quantum flake samples across various materials and configurations, reducing the need for time-consuming manual collection. Second, we present $\varphi$-Adapt, a physics-informed adaptation method that bridges the performance gap between models trained on synthetic data and those deployed in real-world settings. Experimental results show that our approach achieves state-of-the-art performance on multiple benchmarks, outperforming existing methods. Our proposed approach advances the integration of physics-based modeling and domain adaptation. It also addresses a critical gap in leveraging synthesized data for real-world 2D material analysis, offering impactful tools for deep learning and materials science communities.

QUANT-PHJun 2, 2024
Diffusion-Inspired Quantum Noise Mitigation in Parameterized Quantum Circuits

Hoang-Quan Nguyen, Xuan Bac Nguyen, Samuel Yen-Chi Chen et al.

Parameterized Quantum Circuits (PQCs) have been acknowledged as a leading strategy to utilize near-term quantum advantages in multiple problems, including machine learning and combinatorial optimization. When applied to specific tasks, the parameters in the quantum circuits are trained to minimize the target function. Although there have been comprehensive studies to improve the performance of the PQCs on practical tasks, the errors caused by the quantum noise downgrade the performance when running on real quantum computers. In particular, when the quantum state is transformed through multiple quantum circuit layers, the effect of the quantum noise happens cumulatively and becomes closer to the maximally mixed state or complete noise. This paper studies the relationship between the quantum noise and the diffusion model. Then, we propose a novel diffusion-inspired learning approach to mitigate the quantum noise in the PQCs and reduce the error for specific tasks. Through our experiments, we illustrate the efficiency of the learning strategy and achieve state-of-the-art performance on classification tasks in the quantum noise scenarios.

CVMay 25, 2023
Fairness Continual Learning Approach to Semantic Scene Understanding in Open-World Environments

Thanh-Dat Truong, Hoang-Quan Nguyen, Bhiksha Raj et al.

Continual semantic segmentation aims to learn new classes while maintaining the information from the previous classes. Although prior studies have shown impressive progress in recent years, the fairness concern in the continual semantic segmentation needs to be better addressed. Meanwhile, fairness is one of the most vital factors in deploying the deep learning model, especially in human-related or safety applications. In this paper, we present a novel Fairness Continual Learning approach to the semantic segmentation problem. In particular, under the fairness objective, a new fairness continual learning framework is proposed based on class distributions. Then, a novel Prototypical Contrastive Clustering loss is proposed to address the significant challenges in continual learning, i.e., catastrophic forgetting and background shift. Our proposed loss has also been proven as a novel, generalized learning paradigm of knowledge distillation commonly used in continual learning. Moreover, the proposed Conditional Structural Consistency loss further regularized the structural constraint of the predicted segmentation. Our proposed approach has achieved State-of-the-Art performance on three standard scene understanding benchmarks, i.e., ADE20K, Cityscapes, and Pascal VOC, and promoted the fairness of the segmentation model.