Jingwen Dai

4 papers

148 citations

Novelty 54%

AI Score 43

Ranked #79,632 of 205,806 authors (top 39%); #27,359 in CV (top 46%)

4 Papers

97.8 NA May 27

An efficient and stable diffusion generated method for quadrilateral mesh generation in general domains

Jingwen Dai, Zhonghua Qiao, Dong Wang

This paper introduces a robust and computationally efficient framework for high-quality quadrilateral mesh generation on general two-dimensional domains. The core of the proposed approach is a novel method for computing cross fields by minimizing a modified and relaxed Ginzburg-Landau-type energy functional. A key innovation is the extension of the problem from the original, potentially complex domain to a larger regular computational domain. This extension transforms the central computational procedure into an iterative scheme that requires only two straightforward and efficient operations: linear diffusion solved globally via the Fast Fourier Transform (FFT) and point-wise normalization. Notably, our method eliminates the conventional need for generating an intermediate triangular mesh or solving complex nonlinear optimization problems on the irregular domain. We provide a rigorous theoretical analysis, proving that the proposed iterative algorithm guarantees unconditional monotonic decay of the objective functional. Comprehensive numerical experiments demonstrate the method's robustness across a wide range of complex geometries, its significant computational efficiency afforded by the FFT-based diffusion, and its consistent generation of high-quality quadrilateral meshes. This work presents a reliable and theoretically sound alternative to existing mesh generation techniques, with strong potential for practical applications in scientific computing.
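The two operations named in the abstract, FFT-based linear diffusion followed by point-wise normalization, can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a periodic unit-square grid, represents the cross field by a unit complex number per grid point (a common choice, assumed here), and omits the domain extension, boundary treatment, and the modified energy described in the paper. The function names and the parameters `tau` and `steps` are hypothetical.

```python
import numpy as np


def diffuse_fft(u, tau):
    """Apply exp(tau * Laplacian) to u on a periodic unit square via the FFT."""
    n0, n1 = u.shape
    # Angular wavenumbers for a unit-length periodic domain (assumption).
    k0 = 2.0 * np.pi * np.fft.fftfreq(n0, d=1.0 / n0)
    k1 = 2.0 * np.pi * np.fft.fftfreq(n1, d=1.0 / n1)
    # Fourier symbol of the heat operator: each mode decays by exp(-tau * |k|^2).
    symbol = np.exp(-tau * (k0[:, None] ** 2 + k1[None, :] ** 2))
    return np.fft.ifft2(symbol * np.fft.fft2(u))


def normalize_pointwise(u, eps=1e-12):
    """Rescale every grid value to unit modulus (eps guards against division by zero)."""
    return u / np.maximum(np.abs(u), eps)


def diffuse_and_normalize(u0, tau=1e-3, steps=50):
    """Alternate linear diffusion and point-wise normalization for a fixed number of steps."""
    u = normalize_pointwise(np.asarray(u0, dtype=complex))
    for _ in range(steps):
        u = normalize_pointwise(diffuse_fft(u, tau))
    return u
```

Starting from a random unit-modulus field, for example `diffuse_and_normalize(np.exp(1j * theta))` with `theta` an array of angles, each iteration smooths the field and then projects it back to unit length. How the paper handles the original domain inside the larger computational domain is not covered by this sketch.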

CV May 28, 2021

DeepTag: A General Framework for Fiducial Marker Design and Detection

Zhuming Zhang, Yongtao Hu, Guoxing Yu et al.

A fiducial marker system usually consists of markers, a detection algorithm, and a coding system. The appearance of markers and the detection robustness are generally limited by the existing detection algorithms, which are hand-crafted with traditional low-level image processing techniques. Furthermore, a carefully designed coding system is required to overcome the shortcomings of both markers and detection algorithms. To improve flexibility and robustness in various applications, we propose a general deep-learning-based framework, DeepTag, for fiducial marker design and detection. DeepTag not only supports detection of a wide variety of existing marker families, but also makes it possible to design new marker families with customized local patterns. Moreover, we propose an effective procedure to synthesize training data on the fly without manual annotations. Thus, DeepTag can easily adapt to existing and newly designed marker families. To validate DeepTag and existing methods, in addition to using existing datasets, we collect a new large and challenging dataset in which markers are placed at different viewing distances and angles. Experiments show that DeepTag supports different marker families well and greatly outperforms the existing methods in terms of both detection robustness and pose accuracy. Both code and dataset are available at https://herohuyongtao.github.io/research/publications/deep-tag/.

CV Aug 5, 2019

TopoTag: A Robust and Scalable Topological Fiducial Marker System

Guoxing Yu, Yongtao Hu, Jingwen Dai

Fiducial markers have been playing an important role in augmented reality (AR), robot navigation, and general applications where the relative pose between a camera and an object is required. Here we introduce TopoTag, a robust and scalable topological fiducial marker system, which supports reliable and accurate pose estimation from a single image. TopoTag uses topological and geometrical information in marker detection to achieve higher robustness. Topological information is used extensively for 2D marker detection, and the corresponding geometrical information is then used for ID decoding. Robust 3D pose estimation is achieved by taking advantage of all TopoTag vertices. Unlike previous systems, which sacrifice bits for higher recall and precision, TopoTag can use all bits for ID encoding. TopoTag supports tens of thousands of unique IDs and easily extends to millions of unique tags, resulting in massive scalability. We collected a large test dataset of 169,713 images in total for evaluation, covering in-plane and out-of-plane rotation, image blur, different distances, various backgrounds, etc. Experiments on the dataset, together with real indoor and outdoor scene tests with a rolling shutter camera, show that TopoTag significantly outperforms previous fiducial marker systems on various metrics, including detection accuracy, vertex jitter, pose jitter and accuracy, etc. In addition, TopoTag tolerates occlusion as long as the main tag topological structure is maintained, and it allows for flexible shape design in which users can customize internal and external marker shapes. The code for marker design/generation and marker detection, along with the dataset, is available at http://herohuyongtao.github.io/research/publications/topo-tag/.

CV Jul 17, 2015

Deep Multimodal Speaker Naming

Yongtao Hu, Jimmy Ren, Jingwen Dai et al.

Automatic speaker naming is the problem of localizing as well as identifying each speaking character in a TV/movie/live show video. The problem is challenging mainly because of its multimodal nature: the face cue alone is insufficient to achieve good performance. Previous multimodal approaches to this problem usually process the data of different modalities individually and merge them using handcrafted heuristics. Such approaches work well for simple scenes, but fail to achieve high performance for speakers with large appearance variations. In this paper, we propose a novel convolutional neural network (CNN) based learning framework to automatically learn the fusion function of both face and audio cues. We show that, without using face tracking, facial landmark localization, or subtitles/transcripts, our system with robust multimodal feature extraction is able to achieve state-of-the-art speaker naming performance evaluated on two diverse TV series. The dataset and implementation of our algorithm are publicly available online.