Williem

CV
4papers
25citations
Novelty61%
AI Score28

4 Papers

CVNov 1, 2023
On Manipulating Scene Text in the Wild with Diffusion Models

Joshua Santoso, Christian Simon, Williem

Diffusion models have gained attention for image editing yielding impressive results in text-to-image tasks. On the downside, one might notice that generated images of stable diffusion models suffer from deteriorated details. This pitfall impacts image editing tasks that require information preservation e.g., scene text editing. As a desired result, the model must show the capability to replace the text on the source image to the target text while preserving the details e.g., color, font size, and background. To leverage the potential of diffusion models, in this work, we introduce Diffusion-BasEd Scene Text manipulation Network so-called DBEST. Specifically, we design two adaptation strategies, namely one-shot style adaptation and text-recognition guidance. In experiments, we thoroughly assess and compare our proposed method against state-of-the-arts on various scene text datasets, then provide extensive ablation studies for each granularity to analyze our performance gain. Also, we demonstrate the effectiveness of our proposed method to synthesize scene text indicated by competitive Optical Character Recognition (OCR) accuracy. Our method achieves 94.15% and 98.12% on COCO-text and ICDAR2013 datasets for character-level evaluation.

CVFeb 28, 2024
Ef-QuantFace: Streamlined Face Recognition with Small Data and Low-Bit Precision

William Gazali, Jocelyn Michelle Kho, Joshua Santoso et al.

In recent years, model quantization for face recognition has gained prominence. Traditionally, compressing models involved vast datasets like the 5.8 million-image MS1M dataset as well as extensive training times, raising the question of whether such data enormity is essential. This paper addresses this by introducing an efficiency-driven approach, fine-tuning the model with just up to 14,000 images, 440 times smaller than MS1M. We demonstrate that effective quantization is achievable with a smaller dataset, presenting a new paradigm. Moreover, we incorporate an evaluation-based metric loss and achieve an outstanding 96.15% accuracy on the IJB-C dataset, establishing a new state-of-the-art compressed model training for face recognition. The subsequent analysis delves into potential applications, emphasizing the transformative power of this approach. This paper advances model quantization by highlighting the efficiency and optimal results with small data and training time.

IVNov 23, 2019
Joint Spatial and Angular Super-Resolution from a Single Image

Andre Ivan, Williem, In Kyu Park

Synthesizing a densely sampled light field from a single image is highly beneficial for many applications. Moreover, jointly solving both angular and spatial super-resolution problem also introduces new possibilities in light field imaging. The conventional method relies on physical-based rendering and a secondary network to solve the angular super-resolution problem. In addition, pixel-based loss limits the network capability to infer scene geometry globally. In this paper, we show that both super-resolution problems can be solved jointly from a single image by proposing a single end-to-end deep neural network that does not require a physical-based approach. Two novel loss functions based on known light field domain knowledge are proposed to enable the network to preserve the spatio-angular consistency between sub-aperture images. Experimental results show that the proposed model successfully synthesizes dense high resolution light field and it outperforms the state-of-the-art method in both quantitative and qualitative criteria. The method can be generalized to arbitrary scenes, rather than focusing on a particular subject. The synthesized light field can be used for various applications, such as depth estimation and refocusing.

CVMar 29, 2019
Synthesizing a 4D Spatio-Angular Consistent Light Field from a Single Image

Andre Ivan, Williem, In Kyu Park

Synthesizing a densely sampled light field from a single image is highly beneficial for many applications. The conventional method reconstructs a depth map and relies on physical-based rendering and a secondary network to improve the synthesized novel views. Simple pixel-based loss also limits the network by making it rely on pixel intensity cue rather than geometric reasoning. In this study, we show that a different geometric representation, namely, appearance flow, can be used to synthesize a light field from a single image robustly and directly. A single end-to-end deep neural network that does not require a physical-based approach nor a post-processing subnetwork is proposed. Two novel loss functions based on known light field domain knowledge are presented to enable the network to preserve the spatio-angular consistency between sub-aperture images effectively. Experimental results show that the proposed model successfully synthesizes dense light fields and qualitatively and quantitatively outperforms the previous model . The method can be generalized to arbitrary scenes, rather than focusing on a particular class of object. The synthesized light field can be used for various applications, such as depth estimation and refocusing.