Wei-An Lin

CV
h-index29
15papers
769citations
Novelty61%
AI Score48

15 Papers

CVAug 25, 2025Code
SpotEdit: Evaluating Visually-Guided Image Editing Methods

Sara Ghazanfari, Wei-An Lin, Haitong Tian et al.

Visually-guided image editing, where edits are conditioned on both visual cues and textual prompts, has emerged as a powerful paradigm for fine-grained, controllable content generation. Although recent generative models have shown remarkable capabilities, existing evaluations remain simple and insufficiently representative of real-world editing challenges. We present SpotEdit, a comprehensive benchmark designed to systematically assess visually-guided image editing methods across diverse diffusion, autoregressive, and hybrid generative models, uncovering substantial performance disparities. To address a critical yet underexplored challenge, our benchmark includes a dedicated component on hallucination, highlighting how leading models, such as GPT-4o, often hallucinate the existence of a visual cue and erroneously perform the editing task. Our code and benchmark are publicly released at https://github.com/SaraGhazanfari/SpotEdit.

LGNov 23, 2019Code
Invert and Defend: Model-based Approximate Inversion of Generative Adversarial Networks for Secure Inference

Wei-An Lin, Yogesh Balaji, Pouya Samangouei et al.

Inferring the latent variable generating a given test sample is a challenging problem in Generative Adversarial Networks (GANs). In this paper, we propose InvGAN - a novel framework for solving the inference problem in GANs, which involves training an encoder network capable of inverting a pre-trained generator network without access to any training data. Under mild assumptions, we theoretically show that using InvGAN, we can approximately invert the generations of any latent code of a trained GAN model. Furthermore, we empirically demonstrate the superiority of our inference scheme by quantitative and qualitative comparisons with other methods that perform a similar task. We also show the effectiveness of our framework in the problem of adversarial defenses where InvGAN can successfully be used as a projection-based defense mechanism. Additionally, we show how InvGAN can be used to implement reparameterization white-box attacks on projection-based defense mechanisms. Experimental validation on several benchmark datasets demonstrate the efficacy of our method in achieving improved performance on several white-box and black-box attacks. Our code is available at https://github.com/yogeshbalaji/InvGAN.

IVAug 3, 2019Code
ADN: Artifact Disentanglement Network for Unsupervised Metal Artifact Reduction

Haofu Liao, Wei-An Lin, S. Kevin Zhou et al.

Current deep neural network based approaches to computed tomography (CT) metal artifact reduction (MAR) are supervised methods that rely on synthesized metal artifacts for training. However, as synthesized data may not accurately simulate the underlying physical mechanisms of CT imaging, the supervised methods often generalize poorly to clinical applications. To address this problem, we propose, to the best of our knowledge, the first unsupervised learning approach to MAR. Specifically, we introduce a novel artifact disentanglement network that disentangles the metal artifacts from CT images in the latent space. It supports different forms of generations (artifact reduction, artifact transfer, and self-reconstruction, etc.) with specialized loss functions to obviate the need for supervision with synthesized data. Extensive experiments show that when applied to a synthesized dataset, our method addresses metal artifacts significantly better than the existing unsupervised models designed for natural image-to-image translation problems, and achieves comparable performance to existing supervised models for MAR. When applied to clinical datasets, our method demonstrates better generalization ability over the supervised models. The source code of this paper is publicly available at https://github.com/liaohaofu/adn.

IVJun 5, 2019Code
Artifact Disentanglement Network for Unsupervised Metal Artifact Reduction

Haofu Liao, Wei-An Lin, Jianbo Yuan et al.

Current deep neural network based approaches to computed tomography (CT) metal artifact reduction (MAR) are supervised methods which rely heavily on synthesized data for training. However, as synthesized data may not perfectly simulate the underlying physical mechanisms of CT imaging, the supervised methods often generalize poorly to clinical applications. To address this problem, we propose, to the best of our knowledge, the first unsupervised learning approach to MAR. Specifically, we introduce a novel artifact disentanglement network that enables different forms of generations and regularizations between the artifact-affected and artifact-free image domains to support unsupervised learning. Extensive experiments show that our method significantly outperforms the existing unsupervised models for image-to-image translation problems, and achieves comparable performance to existing supervised models on a synthesized dataset. When applied to clinical datasets, our method achieves considerable improvements over the supervised models. The source code of this paper is publicly available at https://github.com/liaohaofu/adn.

CVNov 25, 2025
MapReduce LoRA: Advancing the Pareto Front in Multi-Preference Optimization for Generative Models

Chieh-Yun Chen, Zhonghao Wang, Qi Chen et al.

Reinforcement learning from human feedback (RLHF) with reward models has advanced alignment of generative models to human aesthetic and perceptual preferences. However, jointly optimizing multiple rewards often incurs an alignment tax, improving one dimension while degrading others. To address this, we introduce two complementary methods: MapReduce LoRA and Reward-aware Token Embedding (RaTE). MapReduce LoRA trains preference-specific LoRA experts in parallel and iteratively merges them to refine a shared base model; RaTE learns reward-specific token embeddings that compose at inference for flexible preference control. Experiments on Text-to-Image generation (Stable Diffusion 3.5 Medium and FLUX.1-dev) show improvements of 36.1%, 4.6%, and 55.7%, and 32.7%, 4.3%, and 67.1% on GenEval, PickScore, and OCR, respectively. On Text-to-Video generation (HunyuanVideo), visual and motion quality improve by 48.1% and 90.0%, respectively. On the language task, Helpful Assistant, with Llama-2 7B, helpful and harmless improve by 43.4% and 136.7%, respectively. Our framework sets a new state-of-the-art multi-preference alignment recipe across modalities.

CVJun 4, 2024
Plug-and-Play Diffusion Distillation

Yi-Ting Hsiao, Siavash Khodadadeh, Kevin Duarte et al.

Diffusion models have shown tremendous results in image generation. However, due to the iterative nature of the diffusion process and its reliance on classifier-free guidance, inference times are slow. In this paper, we propose a new distillation approach for guided diffusion models in which an external lightweight guide model is trained while the original text-to-image model remains frozen. We show that our method reduces the inference computation of classifier-free guided latent-space diffusion models by almost half, and only requires 1\% trainable parameters of the base model. Furthermore, once trained, our guide model can be applied to various fine-tuned, domain-specific versions of the base diffusion model without the need for additional training: this "plug-and-play" functionality drastically improves inference computation while maintaining the visual fidelity of generated images. Empirically, we show that our approach is able to produce visually appealing results and achieve a comparable FID score to the teacher with as few as 8 to 16 steps.

CVDec 12, 2021
Interpolated Joint Space Adversarial Training for Robust and Generalizable Defenses

Chun Pong Lau, Jiang Liu, Hossein Souri et al.

Adversarial training (AT) is considered to be one of the most reliable defenses against adversarial attacks. However, models trained with AT sacrifice standard accuracy and do not generalize well to novel attacks. Recent works show generalization improvement with adversarial samples under novel threat models such as on-manifold threat model or neural perceptual threat model. However, the former requires exact manifold information while the latter requires algorithm relaxation. Motivated by these considerations, we exploit the underlying manifold information with Normalizing Flow, ensuring that exact manifold assumption holds. Moreover, we propose a novel threat model called Joint Space Threat Model (JSTM), which can serve as a special case of the neural perceptual threat model that does not require additional relaxation to craft the corresponding adversarial attacks. Under JSTM, we develop novel adversarial attacks and defenses. The mixup strategy improves the standard accuracy of neural networks but sacrifices robustness when combined with AT. To tackle this issue, we propose the Robust Mixup strategy in which we maximize the adversity of the interpolated images and gain robustness and prevent overfitting. Our experiments show that Interpolated Joint Space Adversarial Training (IJSAT) achieves good performance in standard accuracy, robustness, and generalization in CIFAR-10/100, OM-ImageNet, and CIFAR-10-C datasets. IJSAT is also flexible and can be used as a data augmentation method to improve standard accuracy and combine with many existing AT approaches to improve robustness.

CVSep 5, 2020
Dual Manifold Adversarial Robustness: Defense against Lp and non-Lp Adversarial Attacks

Wei-An Lin, Chun Pong Lau, Alexander Levine et al.

Adversarial training is a popular defense strategy against attack threat models with bounded Lp norms. However, it often degrades the model performance on normal images and the defense does not generalize well to novel attacks. Given the success of deep generative models such as GANs and VAEs in characterizing the underlying manifold of images, we investigate whether or not the aforementioned problems can be remedied by exploiting the underlying manifold information. To this end, we construct an "On-Manifold ImageNet" (OM-ImageNet) dataset by projecting the ImageNet samples onto the manifold learned by StyleGSN. For this dataset, the underlying manifold information is exact. Using OM-ImageNet, we first show that adversarial training in the latent space of images improves both standard accuracy and robustness to on-manifold attacks. However, since no out-of-manifold perturbations are realized, the defense can be broken by Lp adversarial attacks. We further propose Dual Manifold Adversarial Training (DMAT) where adversarial perturbations in both latent and image spaces are used in robustifying the model. Our DMAT improves performance on normal images, and achieves comparable robustness to the standard adversarial training against Lp attacks. In addition, we observe that models defended by DMAT achieve improved robustness against novel attacks which manipulate images by global color shifts or various types of image filtering. Interestingly, similar improvements are also achieved when the defended models are tested on out-of-manifold natural images. These results demonstrate the potential benefits of using manifold information in enhancing robustness of deep learning models against various types of novel adversarial attacks.

IVJan 2, 2020
Encoding Metal Mask Projection for Metal Artifact Reduction in Computed Tomography

Yuanyuan Lyu, Wei-An Lin, Haofu Liao et al.

Metal artifact reduction (MAR) in computed tomography (CT) is a notoriously challenging task because the artifacts are structured and non-local in the image domain. However, they are inherently local in the sinogram domain. Thus, one possible approach to MAR is to exploit the latter characteristic by learning to reduce artifacts in the sinogram. However, if we directly treat the metal-affected regions in sinogram as missing and replace them with the surrogate data generated by a neural network, the artifact-reduced CT images tend to be over-smoothed and distorted since fine-grained details within the metal-affected regions are completely ignored. In this work, we provide analytical investigation to the issue and propose to address the problem by (1) retaining the metal-affected regions in sinogram and (2) replacing the binarized metal trace with the metal mask projection such that the geometry information of metal implants is encoded. Extensive experiments on simulated datasets and expert evaluations on clinical images demonstrate that our novel network yields anatomically more precise artifact-reduced images than the state-of-the-art approaches, especially when metallic objects are large.

IVAug 15, 2019
Towards multi-sequence MR image recovery from undersampled k-space data

Cheng Peng, Wei-An Lin, Rama Chellappa et al.

Undersampled MR image recovery has been widely studied for accelerated MR acquisition. However, it has been mostly studied under a single sequence scenario, despite the fact that multi-sequence MR scan is common in practice. In this paper, we aim to optimize multi-sequence MR image recovery from undersampled k-space data under an overall time constraint while considering the difference in acquisition time for various sequences. We first formulate it as a constrained optimization problem and then show that finding the optimal sampling strategy for all sequences and the best recovery model at the same time is combinatorial and hence computationally prohibitive. To solve this problem, we propose a blind recovery model that simultaneously recovers multiple sequences, and an efficient approach to find proper combination of sampling strategy and recovery model. Our experiments demonstrate that the proposed method outperforms sequence-wise recovery, and sheds light on how to decide the undersampling strategy for sequences within an overall time budget.

IVAug 15, 2019
Deep Slice Interpolation via Marginal Super-Resolution, Fusion and Refinement

Cheng Peng, Wei-An Lin, Haofu Liao et al.

We propose a marginal super-resolution (MSR) approach based on 2D convolutional neural networks (CNNs) for interpolating an anisotropic brain magnetic resonance scan along the highly under-sampled direction, which is assumed to axial without loss of generality. Previous methods for slice interpolation only consider data from pairs of adjacent 2D slices. The possibility of fusing information from the direction orthogonal to the 2D slices remains unexplored. Our approach performs MSR in both sagittal and coronal directions, which provides an initial estimate for slice interpolation. The interpolated slices are then fused and refined in the axial direction for improved consistency. Since MSR consists of only 2D operations, it is more feasible in terms of GPU memory consumption and requires fewer training samples compared to 3D CNNs. Our experiments demonstrate that the proposed method outperforms traditional linear interpolation and baseline 2D/3D CNN-based approaches. We conclude by showcasing the method's practical utility in estimating brain volumes from under-sampled brain MR scans through semantic segmentation.

IVJun 29, 2019
Generative Mask Pyramid Network for CT/CBCT Metal Artifact Reduction with Joint Projection-Sinogram Correction

Haofu Liao, Wei-An Lin, Zhimin Huo et al.

A conventional approach to computed tomography (CT) or cone beam CT (CBCT) metal artifact reduction is to replace the X-ray projection data within the metal trace with synthesized data. However, existing projection or sinogram completion methods cannot always produce anatomically consistent information to fill the metal trace, and thus, when the metallic implant is large, significant secondary artifacts are often introduced. In this work, we propose to replace metal artifact affected regions with anatomically consistent content through joint projection-sinogram correction as well as adversarial learning. To handle the metallic implants of diverse shapes and large sizes, we also propose a novel mask pyramid network that enforces the mask information across the network's encoding layers and a mask fusion loss that reduces early saturation of adversarial training. Our experimental results show that the proposed projection-sinogram correction designs are effective and our method recovers information from the metal traces better than the state-of-the-art methods.

IVJun 29, 2019
DuDoNet: Dual Domain Network for CT Metal Artifact Reduction

Wei-An Lin, Haofu Liao, Cheng Peng et al.

Computed tomography (CT) is an imaging modality widely used for medical diagnosis and treatment. CT images are often corrupted by undesirable artifacts when metallic implants are carried by patients, which creates the problem of metal artifact reduction (MAR). Existing methods for reducing the artifacts due to metallic implants are inadequate for two main reasons. First, metal artifacts are structured and non-local so that simple image domain enhancement approaches would not suffice. Second, the MAR approaches which attempt to reduce metal artifacts in the X-ray projection (sinogram) domain inevitably lead to severe secondary artifact due to sinogram inconsistency. To overcome these difficulties, we propose an end-to-end trainable Dual Domain Network (DuDoNet) to simultaneously restore sinogram consistency and enhance CT images. The linkage between the sigogram and image domains is a novel Radon inversion layer that allows the gradients to back-propagate from the image domain to the sinogram domain during training. Extensive experiments show that our method achieves significant improvements over other single domain MAR approaches. To the best of our knowledge, it is the first end-to-end dual-domain network for MAR.

CVMar 10, 2019
Multiview 2D/3D Rigid Registration via a Point-Of-Interest Network for Tracking and Triangulation ($\text{POINT}^2$)

Haofu Liao, Wei-An Lin, Jiarui Zhang et al.

We propose to tackle the problem of multiview 2D/3D rigid registration for intervention via a Point-Of-Interest Network for Tracking and Triangulation ($\text{POINT}^2$). $\text{POINT}^2$ learns to establish 2D point-to-point correspondences between the pre- and intra-intervention images by tracking a set of random POIs. The 3D pose of the pre-intervention volume is then estimated through a triangulation layer. In $\text{POINT}^2$, the unified framework of the POI tracker and the triangulation layer enables learning informative 2D features and estimating 3D pose jointly. In contrast to existing approaches, $\text{POINT}^2$ only requires a single forward-pass to achieve a reliable 2D/3D registration. As the POI tracker is shift-invariant, $\text{POINT}^2$ is more robust to the initial pose of the 3D pre-intervention image. Extensive experiments on a large-scale clinical cone-beam CT (CBCT) dataset show that the proposed $\text{POINT}^2$ method outperforms the existing learning-based method in terms of accuracy, robustness and running time. Furthermore, when used as an initial pose estimator, our method also improves the robustness and speed of the state-of-the-art optimization-based approaches by ten folds.

CVMar 14, 2017
A Proximity-Aware Hierarchical Clustering of Faces

Wei-An Lin, Jun-Cheng Chen, Rama Chellappa

In this paper, we propose an unsupervised face clustering algorithm called "Proximity-Aware Hierarchical Clustering" (PAHC) that exploits the local structure of deep representations. In the proposed method, a similarity measure between deep features is computed by evaluating linear SVM margins. SVMs are trained using nearest neighbors of sample data, and thus do not require any external training data. Clusters are then formed by thresholding the similarity scores. We evaluate the clustering performance using three challenging unconstrained face datasets, including Celebrity in Frontal-Profile (CFP), IARPA JANUS Benchmark A (IJB-A), and JANUS Challenge Set 3 (JANUS CS3) datasets. Experimental results demonstrate that the proposed approach can achieve significant improvements over state-of-the-art methods. Moreover, we also show that the proposed clustering algorithm can be applied to curate a set of large-scale and noisy training dataset while maintaining sufficient amount of images and their variations due to nuisance factors. The face verification performance on JANUS CS3 improves significantly by finetuning a DCNN model with the curated MS-Celeb-1M dataset which contains over three million face images.