Jiaojiao Ye

CV
h-index: 24
4 papers
18 citations
Novelty: 43%
AI Score: 32

4 Papers

CV · Oct 7, 2025
Data Factory with Minimal Human Effort Using VLMs

Jiaojiao Ye, Jiaxing Zhong, Qian Xie et al.

Generating sufficient and diverse data through augmentation offers an efficient alternative to the time-consuming and labour-intensive process of collecting images and annotating them at the pixel level. Traditional data augmentation techniques often struggle to manipulate high-level semantic attributes, such as materials and textures. In contrast, diffusion models offer a robust alternative through text-to-image or image-to-image transformation. However, existing diffusion-based methods are either computationally expensive or compromise on performance. To address this issue, we introduce a novel training-free pipeline that integrates pretrained ControlNet and Vision-Language Models (VLMs) to generate synthetic images paired with pixel-level labels. This approach eliminates the need for manual annotations and significantly improves downstream tasks. To improve fidelity and diversity, we add a Multi-way Prompt Generator, a Mask Generator, and a High-quality Image Selection module. Our results on PASCAL-5i and COCO-20i are promising and outperform concurrent work on one-shot semantic segmentation.

CV · Dec 30, 2024
PQD: Post-training Quantization for Efficient Diffusion Models

Jiaojiao Ye, Zhen Wang, Linnan Jiang

Diffusion models (DMs) have demonstrated remarkable achievements in synthesizing images of high fidelity and diversity. However, the extensive computational requirements and slow generation speed of diffusion models have limited their widespread adoption. In this paper, we propose post-training quantization for diffusion models (PQD), a novel time-aware optimization framework based on post-training quantization. The proposed framework optimizes the inference process by selecting representative samples and conducting time-aware calibration. Experimental results show that our proposed method can directly quantize full-precision diffusion models into 8-bit or 4-bit models in a training-free manner while maintaining comparable performance, with only a small change in FID on ImageNet for unconditional image generation. Our approach is also compatible with 512x512 text-guided image generation, to which it is applied for the first time.
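As a rough illustration of what post-training quantization to 8-bit or 4-bit involves, the sketch below shows generic uniform affine quantization with min/max calibration. It is not the PQD method: the abstract does not describe how representative samples are selected or how time-aware calibration is performed, and the function names `calibrate`, `quantize`, and `dequantize` are hypothetical, introduced only for this example.

```python
def calibrate(samples, num_bits=8):
    """Derive a scale and zero point from the min/max of calibration samples.

    Assumption: plain min/max calibration, not the time-aware scheme of PQD.
    """
    lo = min(min(samples), 0.0)  # make sure 0.0 is inside the range
    hi = max(max(samples), 0.0)
    qmax = 2 ** num_bits - 1
    scale = (hi - lo) / qmax if hi > lo else 1.0
    zero_point = round(-lo / scale)
    return scale, zero_point


def quantize(x, scale, zero_point, num_bits=8):
    """Map a float to an unsigned integer code in [0, 2**num_bits - 1]."""
    qmax = 2 ** num_bits - 1
    q = round(x / scale) + zero_point
    return max(0, min(qmax, q))  # clip to the representable range


def dequantize(q, scale, zero_point):
    """Map an integer code back to an approximate float value."""
    return (q - zero_point) * scale


if __name__ == "__main__":
    calibration = [-1.0, 0.0, 0.5, 1.0]  # stand-in for sampled activations
    for bits in (8, 4):
        scale, zp = calibrate(calibration, num_bits=bits)
        codes = [quantize(x, scale, zp, num_bits=bits) for x in calibration]
        restored = [dequantize(q, scale, zp) for q in codes]
        print(bits, codes, restored)
```

A time-aware variant would presumably keep separate calibration statistics for different denoising timesteps, since activation ranges in diffusion models change over the sampling trajectory; that detail is an assumption here, not something the abstract specifies.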

CV · Dec 18, 2021
An effective coaxiality measurement for twist drill based on line structured light sensor

Ailing Cheng, Jiaojiao Ye, Fei Yang et al.

To measure the coaxiality of twist drills with irregular surfaces accurately and efficiently, an optical measurement mechanism is proposed in this paper. First, a high-precision rotation instrument based on four core units is designed, which can acquire full-angle 3-D point cloud data of the twist drill. Second, in the data processing stage, an improved robust Gaussian mixture model is established for accurate and rapid blade back segmentation. To improve measurement efficiency, a rapid reconstruction method for the twist drill axis based on orthogonal synthesis is provided, which uses the extracted blade back data to locate the axial position of the maximum deviation from the benchmark. Finally, by calculating the maximum radial Euclidean distance from the benchmark, the coaxiality error of the twist drill is obtained. Compared with other measurement methods, experimental results show that our proposed method achieves a high precision of 3 μm and a high efficiency of less than 3 s/pc. The results demonstrate that the proposed method is effective, robust, and automatic, and that it can be applied in many real industrial settings.
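The final step described in the abstract above, taking the maximum radial Euclidean distance from the benchmark, can be sketched as follows. This is a minimal sketch under stated assumptions: the benchmark is modelled as a straight line given by a point and a direction, the reconstructed drill axis is given as a list of sampled 3-D points, and the names `radial_distance` and `coaxiality_error` are hypothetical. The segmentation and axis reconstruction stages of the paper are not reproduced.

```python
import math


def radial_distance(point, axis_point, axis_dir):
    """Perpendicular distance from a 3-D point to the line through
    axis_point with direction axis_dir (the assumed benchmark axis)."""
    norm = math.sqrt(sum(c * c for c in axis_dir))
    u = [c / norm for c in axis_dir]                  # unit direction
    v = [p - a for p, a in zip(point, axis_point)]    # vector to the point
    t = sum(vi * ui for vi, ui in zip(v, u))          # axial component
    perp = [vi - t * ui for vi, ui in zip(v, u)]      # radial component
    return math.sqrt(sum(c * c for c in perp))


def coaxiality_error(axis_samples, axis_point, axis_dir):
    """Maximum radial distance of the sampled axis points from the benchmark."""
    return max(radial_distance(p, axis_point, axis_dir) for p in axis_samples)


if __name__ == "__main__":
    # Benchmark: the z-axis through the origin. Units are millimetres (assumed).
    samples = [(0.0, 0.0, 0.0), (0.001, 0.0, 1.0), (0.0, 0.003, 2.0)]
    print(coaxiality_error(samples, (0.0, 0.0, 0.0), (0.0, 0.0, 1.0)))
```

Some metrology conventions report coaxiality as the diameter of the tolerance zone, which is twice the maximum radial deviation; the sketch follows the wording of the abstract and returns the radial distance itself.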

CV · Feb 18, 2019
A Generative Map for Image-based Camera Localization

Mingpan Guo, Stefan Matthes, Jiaojiao Ye et al.

In image-based camera localization systems, information about the environment is usually stored in some representation, which can be referred to as a map. Conventionally, most maps are built upon hand-crafted features. Recently, neural networks have attracted attention as a data-driven map representation and have shown promising results in visual localization. However, these neural network maps are generally hard for humans to interpret. A readable map is not only accessible to humans but also provides a way to verify the map when the ground-truth pose is unavailable. To tackle this problem, we propose Generative Map, a new framework for learning human-readable neural network maps by combining a generative model with the Kalman filter, which also allows it to incorporate additional sensor information such as stereo visual odometry. For evaluation, we use real-world images from the 7-Scenes and Oxford RobotCar datasets. We demonstrate that our Generative Map can be queried with a pose of interest from the test sequence to predict an image that closely resembles the true scene. For localization, we show that Generative Map achieves performance comparable to that of current regression models. Moreover, our framework is trained completely from scratch, unlike regression models, which rely on large ImageNet-pretrained networks.