Bo Zhang

CV
h-index 44
5 papers
342 citations
Novelty 48%
AI Score 34

5 Papers

22.2 · CV · Aug 19, 2023 · Code
ControlCom: Controllable Image Composition using Diffusion Model

Bo Zhang, Yuxuan Duan, Jun Lan et al.

Image composition aims to synthesize a realistic composite image from a pair of foreground and background images. Recently, generative composition methods have been built on large pretrained diffusion models to generate composite images, considering their great potential in image generation. However, they suffer from a lack of controllability over foreground attributes and poor preservation of foreground identity. To address these challenges, we propose a controllable image composition method that unifies four tasks in one diffusion model: image blending, image harmonization, view synthesis, and generative composition. Meanwhile, we design a self-supervised training framework coupled with a tailored pipeline of training data preparation. Moreover, we propose a local enhancement module to enhance the foreground details in the diffusion model, improving the foreground fidelity of composite images. The proposed method is evaluated on both a public benchmark and real-world data, which demonstrates that our method can generate more faithful and controllable composite images than existing approaches. The code and model will be available at https://github.com/bcmi/ControlCom-Image-Composition.

13.0 · LG · Nov 16, 2023
A Speed Odyssey for Deployable Quantization of LLMs

Qingyuan Li, Ran Meng, Yiduo Li et al.

The large language model era urges faster and less costly inference. Prior model compression works on LLMs tend to take a software-centric approach primarily focused on simulated quantization performance. By neglecting the feasibility of deployment, these approaches are typically unusable in real practice. They tend to drastically push down the quantization bit range for reduced computation, which might not be supported by mainstream hardware, or involve sophisticated algorithms that introduce extra computation or memory access overhead. We argue that pursuing a hardware-centric approach in the construction of quantization algorithms is crucial. In this regard, we are driven to build our compression method on top of hardware awareness, eliminating impractical algorithm choices while maximizing the benefit of hardware acceleration. Our method, OdysseyLLM, comes with a novel W4A8 kernel implementation called FastGEMM and a combined recipe of quantization strategies. Extensive experiments demonstrate the superiority of our W4A8 method, which delivers an actual speedup of up to 4× compared to Hugging Face FP16 inference, 2.23× vs. the state-of-the-art inference engine TensorRT-LLM in FP16, and 1.45× vs. TensorRT-LLM in INT8, without substantially harming the performance.

38.5 · CV · Feb 6, 2024 · Code
MobileVLM V2: Faster and Stronger Baseline for Vision Language Model

Xiangxiang Chu, Limeng Qiao, Xinyu Zhang et al.

We introduce MobileVLM V2, a family of vision language models that significantly improves upon MobileVLM, which proves that a delicate orchestration of novel architectural design, an improved training scheme tailored for mobile VLMs, and rich high-quality dataset curation can substantially benefit VLMs' performance. Specifically, MobileVLM V2 1.7B achieves better or on-par performance on standard VLM benchmarks compared with much larger VLMs at the 3B scale. Notably, our 3B model outperforms a large variety of VLMs at the 7B+ scale. Our models will be released at https://github.com/Meituan-AutoML/MobileVLM.

31.2 · CV · Dec 28, 2023 · Code
MobileVLM: A Fast, Strong and Open Vision Language Assistant for Mobile Devices

Xiangxiang Chu, Limeng Qiao, Xinyang Lin et al.

We present MobileVLM, a competent multimodal vision language model (MMVLM) targeted to run on mobile devices. It combines a range of mobile-oriented architectural designs and techniques, comprising a set of language models at the scale of 1.4B and 2.7B parameters trained from scratch, a multimodal vision model pre-trained in the CLIP fashion, and cross-modality interaction via an efficient projector. We evaluate MobileVLM on several typical VLM benchmarks. Our models demonstrate on-par performance compared with a few much larger models. More importantly, we measure the inference speed on both a Qualcomm Snapdragon 888 CPU and an NVIDIA Jetson Orin GPU, and we obtain state-of-the-art performance of 21.5 tokens and 65.3 tokens per second, respectively. Our code will be made available at: https://github.com/Meituan-AutoML/MobileVLM.

1.2 · NA · Jun 1, 2011
Optimal error estimates and energy conservation identities of the ADI-FDTD scheme on staggered grids for 3D Maxwell's equations

Liping Gao, Bo Zhang

This paper is concerned with the optimal error estimates and energy conservation properties of the alternating direction implicit finite-difference time-domain (ADI-FDTD) method, which is a popular scheme for solving the 3D Maxwell equations. Precisely, for the case with a perfectly electric conducting (PEC) boundary condition, we establish the optimal second-order error estimates in both space and time in the discrete $H^1$-norm for the ADI-FDTD scheme and prove the approximate divergence-preserving property: if the divergences of the initial electric and magnetic fields are zero, then the discrete $L^2$-norm of the discrete divergence of the ADI-FDTD solution is approximately zero with second-order accuracy in both space and time. A key ingredient is two new discrete energy norms, which are second-order-in-time perturbations of two new energy conservation laws for the Maxwell equations introduced in this paper. Furthermore, we prove that, in addition to two known discrete energy identities, which are second-order-in-time perturbations of two known energy conservation laws, the ADI-FDTD scheme also satisfies two new discrete energy identities, which are second-order-in-time perturbations of the two new energy conservation laws. This means that the ADI-FDTD scheme is unconditionally stable under the four discrete energy norms. Experimental results are presented which confirm the theoretical results.