Phan Xuan Tan

MM
h-index 11
10 papers
57 citations
Novelty 43%
AI Score 53

10 Papers

CV · Sep 27, 2024 · Code
Underwater Image Enhancement with Physical-based Denoising Diffusion Implicit Models

Nguyen Gia Bach, Chanh Minh Tran, Eiji Kamioka et al.

Underwater vision is crucial for autonomous underwater vehicles (AUVs), and enhancing degraded underwater images in real time on a resource-constrained AUV is a key challenge, both because of factors such as light absorption and scattering and because of the computational complexity a model needs to resolve them. Traditional image enhancement techniques lack adaptability to varying underwater conditions, while learning-based methods, particularly those using convolutional neural networks (CNNs) and generative adversarial networks (GANs), offer more robust solutions but face limitations such as inadequate enhancement, unstable training, or mode collapse. Denoising diffusion probabilistic models (DDPMs) have emerged as a state-of-the-art approach for image-to-image tasks, but the recent UW-DDPM solution requires intensive computation to achieve the desired underwater image enhancement (UIE). To address these challenges, this paper introduces UW-DiffPhys, a novel physical-based and diffusion-based UIE approach. UW-DiffPhys combines lightweight physical-based UIE network components with a denoising U-Net, replacing the computationally intensive distribution transformation U-Net of the existing UW-DDPM framework to reduce complexity while maintaining performance. Additionally, the Denoising Diffusion Implicit Model (DDIM) is employed to accelerate inference through non-Markovian sampling. Experimental results demonstrate that UW-DiffPhys substantially reduces computational complexity and inference time compared to UW-DDPM, with competitive performance on key metrics such as PSNR, SSIM, and UCIQE, and an improvement in the overall underwater image quality metric UIQM. The implementation code can be found at the following repository: https://github.com/bachzz/UW-DiffPhys
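
The abstract credits DDIM's non-Markovian sampling with accelerating inference. Below is a minimal sketch of the generic deterministic DDIM update (eta = 0), not the UW-DiffPhys implementation. The scalar samples, the linear beta schedule, the 10-step stride, and the oracle noise predictor `oracle_eps` are illustrative assumptions; in the actual framework the noise prediction would come from the trained denoising U-Net.

```python
import math

# Assumed linear beta schedule over T = 1000 steps (illustrative, not taken from the paper).
T = 1000
betas = [1e-4 + (0.02 - 1e-4) * s / (T - 1) for s in range(T)]
abar = []  # abar[t] = product over s <= t of (1 - beta_s)
running = 1.0
for b in betas:
    running *= 1.0 - b
    abar.append(running)


def ddim_step(x_t, eps_pred, abar_t, abar_prev):
    """One deterministic DDIM update (eta = 0) from timestep t to an earlier timestep."""
    # Estimate the clean sample x_0 from x_t and the predicted noise.
    x0_pred = (x_t - math.sqrt(1.0 - abar_t) * eps_pred) / math.sqrt(abar_t)
    # Move to the earlier noise level, reusing the predicted noise (no fresh noise is drawn).
    return math.sqrt(abar_prev) * x0_pred + math.sqrt(1.0 - abar_prev) * eps_pred


def ddim_sample(x_start, eps_model, timesteps):
    """Denoise along a strided subsequence of timesteps, skipping those in between."""
    x = x_start
    for i in range(len(timesteps) - 1, -1, -1):
        t = timesteps[i]
        abar_prev = abar[timesteps[i - 1]] if i > 0 else 1.0  # abar = 1 means noise-free
        x = ddim_step(x, eps_model(x, t), abar[t], abar_prev)
    return x


# Toy check with a hypothetical oracle that always returns the true noise.
x0_true, eps_true = 0.5, 1.3
timesteps = list(range(0, T, 100))  # 10 sampling steps instead of 1000
t_start = timesteps[-1]
x_start = math.sqrt(abar[t_start]) * x0_true + math.sqrt(1.0 - abar[t_start]) * eps_true
oracle_eps = lambda x, t: eps_true
x0_hat = ddim_sample(x_start, oracle_eps, timesteps)
print(round(x0_hat, 6))  # 0.5
```

Because each update reuses the predicted noise instead of sampling fresh noise, the trajectory can jump across skipped timesteps, which is what lets DDIM run with far fewer steps than the training schedule.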

21.4 · MA · May 26
Constitutional Arms Races in the Public Goods Game: Co-Evolving LLM Constitutions Under Cooperation-Defection Pressure

Ujwal Kumar, Arth Singh, Hershraj Niranjani et al.

Frontier LLM agents engage in blackmail, sabotage, and document leaks under goal conflicts in agentic settings, exposing limitations of alignment methods built around single-agent or cooperative assumptions. Recent work shows LLM-guided evolutionary search can discover effective cooperative constitutions, but two properties of the adversarial setting remain uncharacterized: whether the fitness function actually induces adversarial pressure, and whether the LLM mutation operator behaves reliably under adversarial-specialist objectives. We study adversarial constitutional co-evolution (Blue cooperators vs. Red free-riders, 30 generations) across a Public Goods Game (PGG) and a spatial grid-world. Three findings: (1) in the PGG, both factions converge to a near-parity equilibrium at S ≈ 0.78, robust across tested multipliers m ∈ {1.2, 1.5, 2.0, 3.0}; (2) in independently scored environments, per-faction scoring leaves outcomes statistically uncoupled, with corr(S_B, S_R) = +0.088, and produces no adversarial pressure; a score-advantage fitness target S_own − S_opp restores it; (3) under pure-adversary fitness, the evaluation seed count K controls mode regression: K = 2 regresses, while K = 5 sustains a strong specialist for all 30 generations. Adversarial co-evolution of natural-language constitutions is feasible, but only under coupled fitness and an adequate evaluation budget; the evolved Red constitutions serve as interpretable red-team artifacts for testing future cooperative designs.
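
To make the PGG setting concrete, here is a sketch of one round of a standard linear Public Goods Game and of the score-advantage fitness S_own − S_opp. Raw per-round payoffs stand in for the paper's score S, whose normalization is not given in the abstract; the endowment of 10, the four-player group, and the function names are hypothetical.

```python
def pgg_payoffs(contributions, endowment, m):
    """Per-player payoffs for one round of a linear Public Goods Game."""
    n = len(contributions)
    share = m * sum(contributions) / n  # pool is multiplied by m and split equally
    return [endowment - c + share for c in contributions]


def mean(xs):
    return sum(xs) / len(xs)


def score_advantage(own_scores, opp_scores):
    """Coupled fitness target: own faction's mean score minus the opponent's."""
    return mean(own_scores) - mean(opp_scores)


# Three Blue cooperators contribute everything; one Red free-rider contributes nothing.
payoffs = pgg_payoffs([10, 10, 10, 0], endowment=10, m=2.0)
blue, red = payoffs[:3], payoffs[3:]
print(payoffs)                     # [15.0, 15.0, 15.0, 25.0]
print(score_advantage(red, blue))  # 10.0
```

Under per-faction scoring, Red's fitness would be its own mean payoff, which does not reward outscoring Blue; the score-advantage target rises only when Red beats Blue, which is the kind of coupling the abstract reports is needed for adversarial pressure.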

LG · Jan 29
Memorization Control in Diffusion Models from Denoising-centric Perspective

Thuy Phuong Vu, Mai Viet Hoang Do, Minhhuy Le et al.

Controlling memorization in diffusion models is critical for applications that require generated data to closely match the training distribution. Existing approaches mainly focus on data-centric or model-centric modifications, treating the diffusion model as an isolated predictor. In this paper, we study memorization in diffusion models from a denoising-centric perspective. We show that uniform timestep sampling leads to unequal learning contributions across denoising steps because of differences in signal-to-noise ratio, which biases training toward memorization. To address this, we propose a timestep sampling strategy that explicitly controls where learning occurs along the denoising trajectory. By adjusting the width of the confidence interval, our method provides direct control over the memorization-generalization trade-off. Experiments on image and 1D signal generation tasks demonstrate that shifting learning emphasis toward later denoising steps consistently reduces memorization and improves distributional alignment with the training data, validating the generality and effectiveness of our approach.
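
The abstract does not spell out the sampling distribution, so the following is a hypothetical reading: training timesteps are drawn from a normal distribution whose central 95% interval spans `ci_width` steps around a chosen `center`, instead of uniformly. The function name, the Gaussian shape, and the z = 1.96 mapping are assumptions; which end of the index range counts as a "later" denoising step depends on convention, so `center` is left as a parameter.

```python
import random


def sample_timesteps(n, T, center, ci_width, z=1.96, rng=None):
    """Hypothetical non-uniform timestep sampler for diffusion training.

    Draws integer timesteps from N(center, sigma^2), with sigma chosen so the
    central interval of +/- z * sigma spans ci_width steps; draws outside
    [0, T - 1] are rejected. Assumes 0 <= center <= T - 1.
    """
    rng = rng or random.Random()
    sigma = ci_width / (2.0 * z)
    samples = []
    while len(samples) < n:
        t = round(rng.gauss(center, sigma))
        if 0 <= t <= T - 1:
            samples.append(t)
    return samples


rng = random.Random(0)
uniform = [rng.randrange(1000) for _ in range(10000)]  # baseline: uniform sampling
focused = sample_timesteps(10000, 1000, center=200, ci_width=100, rng=rng)
# Fraction of training draws that land inside the chosen interval.
inside = sum(150 <= t <= 250 for t in focused) / len(focused)
print(round(inside, 2))
```

A narrower `ci_width` concentrates training on fewer denoising steps, while a wider one moves back toward uniform coverage, which is one way a single width parameter could steer the memorization-generalization trade-off.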

CV · Oct 19, 2025 · Code
Region in Context: Text-condition Image editing with Human-like semantic reasoning

Thuy Phuong Vu, Dinh-Cuong Hoang, Minhhuy Le et al.

Recent research has made significant progress in localizing and editing image regions based on text. However, most approaches treat these regions in isolation, relying solely on local cues without accounting for how each part contributes to the overall visual and semantic composition. This often results in inconsistent edits, unnatural transitions, or loss of coherence across the image. In this work, we propose Region in Context, a novel framework for text-conditioned image editing that performs multilevel semantic alignment between vision and language, inspired by the human ability to reason about edits in relation to the whole scene. Our method encourages each region to understand its role within the global image context, enabling precise and harmonized changes. At its core, the framework introduces a dual-level guidance mechanism: regions are represented with full-image context and aligned with detailed region-level descriptions, while the entire image is simultaneously matched to a comprehensive scene-level description generated by a large vision-language model. These descriptions serve as explicit verbal references to the intended content, guiding both local modifications and global structure. Experiments show that the method produces more coherent and instruction-aligned results. Code is available at: https://github.com/thuyvuphuong/Region-in-Context.git
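
The dual-level guidance can be illustrated with a toy alignment objective: each region embedding is pulled toward its region-level description, and the whole-image embedding toward the scene-level description. This is a hypothetical sketch, not the paper's loss; the cosine-distance form, the weights `w_region` and `w_scene`, and the 2-D toy embeddings are assumptions, and it omits how regions are encoded with full-image context.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def dual_level_loss(region_embs, region_text_embs, image_emb, scene_text_emb,
                    w_region=1.0, w_scene=1.0):
    """Hypothetical loss: mean region-level misalignment plus scene-level misalignment."""
    region_term = sum(1.0 - cosine(r, t)
                      for r, t in zip(region_embs, region_text_embs)) / len(region_embs)
    scene_term = 1.0 - cosine(image_emb, scene_text_emb)
    return w_region * region_term + w_scene * scene_term


regions = [[1.0, 0.0], [0.0, 1.0]]
region_texts = [[1.0, 0.0], [0.0, 1.0]]  # each region matches its description
aligned = dual_level_loss(regions, region_texts, [1.0, 1.0], [2.0, 2.0])
off_scene = dual_level_loss(regions, region_texts, [1.0, 0.0], [0.0, 1.0])
print(round(aligned, 6), round(off_scene, 6))
```

In the second case every region matches its description, yet the loss is still 1.0 because the whole image disagrees with the scene description, which is the intuition behind pairing the two levels.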

40.1 · MA · May 9
Internal vs. External: Comparing Deliberation and Evolution for Multi-Agent Constitutional Design

Hershraj Niranjani, Ujwal Kumar, Phan Xuan Tan

Multi-agent AI systems need behavioral constitutions, but it is unresolved whether such rules should emerge internally through agent self-governance or be discovered externally through optimization. We present the first controlled comparison of internal deliberation and external evolution across three social environments: a coordination grid-world, an iterated public goods game, and a bilateral trading market. Across 180 simulation runs, evolution significantly outperforms deliberation in collective-action settings (p < 0.01), while neither method improves outcomes in bilateral trading. A multiplier ablation reveals that evolution's advantage inverts when incentives shift: at a pool multiplier of m = 0.75, the evolved constitution forces value-destroying cooperation and becomes the worst-performing method. Notably, no deliberation run across thirty trials ever proposed punishment, the canonical cooperation-sustaining mechanism that evolution reliably discovers, suggesting that external optimization wins on peak outcomes while internal self-governance trades peak outcomes for structural responsiveness.
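
Why m = 0.75 makes cooperation value-destroying follows from a one-line welfare calculation. Assuming the standard linear public goods game (N players, endowment E, contributions c_i, pooled contributions multiplied by m and split equally), which may differ in detail from the paper's environment, total group payoff W is:

```latex
W = \sum_{i=1}^{N}\Big(E - c_i + \frac{m}{N}\sum_{j=1}^{N} c_j\Big)
  = NE - \sum_{i=1}^{N} c_i + m\sum_{j=1}^{N} c_j
  = NE + (m - 1)\sum_{i=1}^{N} c_i
```

With m = 0.75, every unit contributed lowers W by 0.25, so a constitution that enforces full contribution minimizes group welfare, whereas any m > 1 rewards contribution.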

LG · Feb 1, 2025
How Effective Is Constitutional AI in Small LLMs? A Study on DeepSeek-R1 and Its Peers

Antonio-Gabriel Chacón Menke, Phan Xuan Tan

Recent incidents highlight safety risks in Large Language Models (LLMs), motivating research into alignment methods like Constitutional AI (CAI). This paper explores CAI's self-critique mechanism on small, uncensored models with 7-9B parameters: DeepSeek-R1-8B, Gemma-2-9B, Llama 3.1-8B, and Qwen2.5-7B. We show that Llama-based models exhibited significant harm reduction through self-critique, while other architectures showed less improvement in harm detection after abliteration. These results suggest CAI's effectiveness may vary depending on model architecture and reasoning capabilities.
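
The self-critique mechanism studied here follows a critique-then-revise loop. A minimal sketch, assuming a generic text-in/text-out `model` callable; the prompt wording, the `principle` string, and the stub model are hypothetical, and the paper's exact prompts and number of rounds are not given in the abstract.

```python
from typing import Callable

Model = Callable[[str], str]


def critique_and_revise(model: Model, prompt: str, principle: str, rounds: int = 1) -> str:
    """Draft a response, then repeatedly critique it against a principle and revise it."""
    response = model(prompt)
    for _ in range(rounds):
        critique = model(
            f"Critique the response below against this principle: {principle}\n"
            f"Prompt: {prompt}\nResponse: {response}"
        )
        response = model(
            "Rewrite the response so that it addresses the critique.\n"
            f"Prompt: {prompt}\nResponse: {response}\nCritique: {critique}"
        )
    return response


# Hypothetical stub standing in for an LLM, so the control flow can run offline.
calls = []


def stub_model(text: str) -> str:
    calls.append(text)
    if text.startswith("Critique"):
        return "The draft gives harmful detail."
    if text.startswith("Rewrite"):
        return "Revised, safer answer."
    return "Draft answer."


final = critique_and_revise(stub_model, "<user prompt>", "Avoid facilitating harm.")
print(final, len(calls))
```

With one round, the loop makes three model calls (draft, critique, revision); the study's finding is about how much that revision step actually reduces harm for each model family.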

AI · Oct 20, 2025
Annotating the Chain-of-Thought: A Behavior-Labeled Dataset for AI Safety

Antonio-Gabriel Chacón Menke, Phan Xuan Tan, Eiji Kamioka

Recent work has highlighted the importance of monitoring chain-of-thought reasoning for AI safety; however, current approaches that analyze textual reasoning steps can miss subtle harmful patterns and may be circumvented by models that hide unsafe reasoning. We present a sentence-level labeled dataset that enables activation-based monitoring of safety behaviors during LLM reasoning. Our dataset contains reasoning sequences with sentence-level annotations of safety behaviors such as expression of safety concerns or speculation on user intent, which we use to extract steering vectors for detecting and influencing these behaviors within model activations. The dataset fills a key gap in safety research: while existing datasets label reasoning holistically, effective application of steering vectors for safety monitoring could be improved by identifying precisely when specific behaviors occur within reasoning chains. We demonstrate the dataset's utility by extracting representations that both detect and steer safety behaviors in model activations, showcasing the potential of activation-level techniques for improving safety oversight on reasoning. Content Warning: This paper discusses AI safety in the context of harmful prompts and may contain references to potentially harmful content.
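
Sentence-level labels make it possible to contrast activations from sentences that exhibit a behavior with those that do not. One common way to turn such contrasts into a steering vector is a difference of means; the sketch below assumes that method (the paper's extraction procedure may differ), uses toy 2-D activations in place of real hidden states, and treats the detection threshold and steering strength `alpha` as hypothetical.

```python
import math


def mean_vector(vectors):
    """Element-wise mean of equal-length activation vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def steering_vector(pos_acts, neg_acts):
    """Difference of means: behavior-labeled activations minus unlabeled ones."""
    return [p - q for p, q in zip(mean_vector(pos_acts), mean_vector(neg_acts))]


def behavior_score(activation, direction):
    """Scalar projection of an activation onto the steering direction (for detection)."""
    norm = math.sqrt(sum(d * d for d in direction))
    return sum(a * d for a, d in zip(activation, direction)) / norm


def steer(activation, direction, alpha):
    """Add the scaled steering vector to an activation (for influencing behavior)."""
    return [a + alpha * d for a, d in zip(activation, direction)]


# Toy activations: sentences labeled with a behavior (e.g. a safety concern) vs. not.
labeled = [[2.0, 0.0], [4.0, 0.0]]
unlabeled = [[0.0, 0.0], [0.0, 2.0]]
v = steering_vector(labeled, unlabeled)        # [3.0, -1.0]
flagged = behavior_score([3.0, 0.0], v) > 1.0  # hypothetical detection threshold
steered = steer([0.0, 0.0], v, alpha=0.5)      # [1.5, -0.5]
```

The same vector serves both uses named in the abstract: projecting onto it detects the behavior, and adding it shifts activations toward the behavior.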

MM · Mar 20, 2020
Continuous QoE Prediction Based on WaveNet

Phan Xuan Tan, Tho Nguyen Duc, Chanh Minh Tran et al.

Continuous QoE prediction is crucial for maximizing viewer satisfaction, which in turn allows video service providers to improve revenue. Continuously predicting QoE is challenging because it requires QoE models capable of capturing the complex dependencies among QoE influence factors. Existing approaches that use Long Short-Term Memory (LSTM) networks successfully model such long-term dependencies and provide superior QoE prediction performance. However, the inherently sequential computation of LSTM results in high computational cost in both training and prediction. Recently, WaveNet, a deep neural network for generating raw audio waveforms, was introduced and quickly gained wide attention because it leverages the parallel computation of causal and dilated convolutions to handle time-series data (e.g., audio signals). Inspired by the success of WaveNet, in this paper we propose a WaveNet-based QoE model for continuous QoE prediction in video streaming services. The model is trained and tested on two publicly available databases, namely LFOVIA Video QoE and LIVE Mobile Stall Video II. The experimental results demonstrate that the proposed model outperforms the baseline models in terms of processing time while maintaining sufficient accuracy.
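
The key WaveNet ingredient named in the abstract is the stack of dilated causal convolutions. Below is a plain-Python sketch of that operation only (no gating, residual connections, or learned weights, all of which a real WaveNet-based QoE model would add), using a kernel of size 2 with dilations 1, 2, 4, 8 as an assumed configuration. Each output depends only on current and past inputs, and the receptive field grows exponentially with depth.

```python
def causal_dilated_conv1d(x, weights, dilation):
    """y[t] = sum_k weights[k] * x[t - k * dilation], treating inputs before t = 0 as zero."""
    y = []
    for t in range(len(x)):
        acc = 0.0
        for k, w in enumerate(weights):
            idx = t - k * dilation
            if idx >= 0:  # causal: never read future samples
                acc += w * x[idx]
        y.append(acc)
    return y


def receptive_field(kernel_size, dilations):
    """Number of past-and-present inputs that can influence one output of the stack."""
    return 1 + (kernel_size - 1) * sum(dilations)


dilations = [1, 2, 4, 8]
signal = [1.0] + [0.0] * 19  # unit impulse at t = 0
out = signal
for d in dilations:
    out = causal_dilated_conv1d(out, [1.0, 1.0], d)

print(receptive_field(2, dilations))  # 16
print(out)  # the impulse reaches t = 0..15 and no further
```

Every output position can be computed independently of the others, which is what lets deep-learning frameworks evaluate the whole sequence in parallel, unlike an LSTM's step-by-step recurrence; the Python loop here is only for readability.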

MM · Mar 19, 2020
FAURAS: A Proxy-based Framework for Ensuring the Fairness of Adaptive Video Streaming over HTTP/2 Server Push

Chanh Minh Tran, Tho Nguyen Duc, Phan Xuan Tan et al.

HTTP/2 video streaming has attracted considerable attention in the development of multimedia technologies over the last few years. In HTTP/2, the server push mechanism allows the server to deliver multiple video segments to the client in response to a single request, addressing the request explosion problem. As a result, recent research has focused on using this feature to enhance the streaming experience while reducing request-related overhead. However, current works optimize the performance of only a single client, without considering possible effects on other clients in the same network. When multiple streaming clients compete for shared bandwidth in HTTP/1.1, they are likely to suffer from unfairness, defined as inequality in their bitrate selections. For HTTP/1.1, existing works have shown that network-assisted solutions are effective in solving the unfairness problem. However, the feasibility of applying such an approach to HTTP/2 server push has not been investigated. Therefore, this paper proposes a novel proxy-based framework to overcome the unfairness problem in adaptive streaming over HTTP/2 with server push. Experimental results confirm the superior performance of the proposed framework in ensuring fairness, helping clients avoid rebuffering events and reducing the amplitude of bitrate degradation, while preserving the server push mechanism.
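
The abstract defines unfairness as inequality in the clients' bitrate selections but does not name a metric. Jain's fairness index is a widely used measure of such inequality and is swapped in here as an assumption; FAURAS may quantify fairness differently, and the bitrates below are hypothetical.

```python
def jain_fairness(bitrates):
    """Jain's fairness index: (sum x)^2 / (n * sum x^2).

    Equals 1.0 when all clients get the same bitrate and 1/n when one client
    gets everything. Assumes at least one non-zero bitrate.
    """
    n = len(bitrates)
    total = sum(bitrates)
    squares = sum(b * b for b in bitrates)
    return total * total / (n * squares)


# Hypothetical bitrate selections (kbps) for three clients sharing one bottleneck.
print(jain_fairness([2000, 2000, 2000]))  # 1.0
print(jain_fairness([4500, 1000, 500]))
```

A network-assisted proxy aims to push this kind of index toward 1.0 by steering competing clients to similar bitrates instead of letting each optimize in isolation.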

MM · Mar 19, 2020
Convolutional Neural Networks for Continuous QoE Prediction in Video Streaming Services

Tho Nguyen Duc, Chanh Minh Tran, Phan Xuan Tan et al.

In video streaming services, continuously predicting the user's quality of experience (QoE) plays a crucial role in delivering high-quality streaming content to the user. However, the temporal dependencies in QoE data and the nonlinear relationships among QoE influence factors make continuous QoE prediction challenging. To deal with this, existing studies have used the Long Short-Term Memory (LSTM) model to effectively capture such complex dependencies, achieving excellent QoE prediction accuracy. However, the high computational complexity of LSTM, caused by the sequential processing in its architecture, raises serious questions about its performance on devices with limited computational power. Meanwhile, the Temporal Convolutional Network (TCN), a variant of convolutional neural networks, has recently been proposed for sequence modeling tasks (e.g., speech enhancement), outperforming baseline methods including LSTM in both prediction accuracy and computational complexity. Inspired by this, in this paper we propose an improved TCN-based model, namely CNN-QoE, for continuously predicting QoE, which has the characteristics of sequential data. The proposed model leverages the advantages of TCN to overcome the computational complexity drawbacks of LSTM-based QoE models, while also introducing architectural improvements to increase QoE prediction accuracy. Based on a comprehensive evaluation, we demonstrate that the proposed CNN-QoE model reaches state-of-the-art performance on both personal computers and mobile devices, outperforming existing approaches.
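
TCN-style models reach far into the past with few layers because dilation grows exponentially with depth. Assuming the generic TCN layout (two dilated causal convolutions per residual block, dilation doubling from block to block), the sketch below computes the receptive field and the number of blocks needed to cover a given QoE history length. CNN-QoE's actual kernel size, depth, and block structure are not stated in the abstract, so the numbers are illustrative only.

```python
def tcn_receptive_field(kernel_size, num_blocks, convs_per_block=2):
    """Receptive field of stacked residual blocks whose dilation doubles per block (1, 2, 4, ...)."""
    total_dilation = sum(2 ** i for i in range(num_blocks))
    return 1 + convs_per_block * (kernel_size - 1) * total_dilation


def blocks_needed(history_len, kernel_size, convs_per_block=2):
    """Smallest number of blocks whose receptive field covers history_len time steps."""
    if history_len > 1 and (kernel_size < 2 or convs_per_block < 1):
        raise ValueError("receptive field cannot grow with these settings")
    n = 0
    while tcn_receptive_field(kernel_size, n, convs_per_block) < history_len:
        n += 1
    return n


print(tcn_receptive_field(3, 4))                      # 61
print(blocks_needed(100, 3), blocks_needed(1000, 3))  # 5 8
```

Going from 100 to 1000 steps of history adds only three blocks; this logarithmic scaling is part of why TCN-style models can stay compact enough for mobile devices.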