AINov 11, 2022Code
What's the Situation with Intelligent Mesh Generation: A Survey and PerspectivesNa Lei, Zezeng Li, Zebin Xu et al.
Intelligent Mesh Generation (IMG) represents a novel and promising field of research, utilizing machine learning techniques to generate meshes. Despite its relative infancy, IMG has significantly broadened the adaptability and practicality of mesh generation techniques, delivering numerous breakthroughs and unveiling potential future pathways. However, a noticeable void exists in the contemporary literature concerning comprehensive surveys of IMG methods. This paper endeavors to fill this gap by providing a systematic and thorough survey of the current IMG landscape. With a focus on 113 preliminary IMG methods, we undertake a meticulous analysis from various angles, encompassing core algorithm techniques and their application scope, agent learning objectives, data types, targeted challenges, as well as advantages and limitations. We have curated and categorized the literature, proposing three unique taxonomies based on key techniques, output mesh unit elements, and relevant input data types. This paper also underscores several promising future research directions and challenges in IMG. To augment reader accessibility, a dedicated IMG project page is available at \url{https://github.com/xzb030/IMG_Survey}.
CVJul 21, 2023Code
DPM-OT: A New Diffusion Probabilistic Model Based on Optimal TransportZezeng Li, ShengHao Li, Zhanpeng Wang et al.
Sampling from diffusion probabilistic models (DPMs) can be viewed as a piecewise distribution transformation, which generally requires hundreds or thousands of steps of the inverse diffusion trajectory to get a high-quality image. Recent progress in designing fast samplers for DPMs achieves a trade-off between sampling speed and sample quality by knowledge distillation or adjusting the variance schedule or the denoising equation. However, it can't be optimal in both aspects and often suffer from mode mixture in short steps. To tackle this problem, we innovatively regard inverse diffusion as an optimal transport (OT) problem between latents at different stages and propose the DPM-OT, a unified learning framework for fast DPMs with a direct expressway represented by OT map, which can generate high-quality samples within around 10 function evaluations. By calculating the semi-discrete optimal transport map between the data latents and the white noise, we obtain an expressway from the prior distribution to the data distribution, while significantly alleviating the problem of mode mixture. In addition, we give the error bound of the proposed method, which theoretically guarantees the stability of the algorithm. Extensive experiments validate the effectiveness and advantages of DPM-OT in terms of speed and quality (FID and mode mixture), thus representing an efficient solution for generative modeling. Source codes are available at https://github.com/cognaclee/DPM-OT
CVJul 19, 2024
HOTS3D: Hyper-Spherical Optimal Transport for Semantic Alignment of Text-to-3D GenerationZezeng Li, Weimin Wang, Yuming Zhao et al.
Recent CLIP-guided 3D generation methods have achieved promising results but struggle with generating faithful 3D shapes that conform with input text due to the gap between text and image embeddings. To this end, this paper proposes HOTS3D which makes the first attempt to effectively bridge this gap by aligning text features to the image features with spherical optimal transport(SOT). However, in high-dimensional situations, solving the SOT remains a challenge. To obtain the SOT map for high-dimensional features obtained from CLIP encoding of two modalities, we mathematically formulate and derive the solution based on Villani's theorem, which can directly align two hyper-sphere distributions without manifold exponential maps. Furthermore, we implement it by leveraging input convex neural networks (ICNNs) for the optimal Kantorovich potential. With the optimally mapped features, a diffusion-based generator is utilized to decode them into 3D shapes. Extensive quantitative and qualitative comparisons with state-of-the-art methods demonstrate the superiority of HOTS3D for text-to-3D generation, especially in the consistency with text semantics.
CEDec 16, 2025
A Survey of AI Methods for Geometry Preparation and Mesh Generation in Engineering SimulationSteven Owen, Nathan Brown, Nikos Chrisochoides et al.
Artificial intelligence is beginning to reduce the manual effort in the CAD-to-mesh pipeline. Written for meshing and geometry practitioners with limited AI background, this survey organizes recent work by workflow step. We cover part classification and segmentation, mesh quality prediction, and defeaturing. We review AI guidance for unstructured meshing, block-structured meshing in 2D and 3D, and volumetric parameterization, including reconstruction from implicit or sampled geometry. We also discuss parallel mesh generation and scripting automation via reinforcement learning and large language models. Across these topics, AI complements established geometry and meshing algorithms rather than replacing them. We conclude with practical lessons and open challenges in data, benchmarks, and trustworthy integration.
CVOct 31, 2025
Hyperbolic Optimal TransportYan Bin Ng, Xianfeng Gu
The optimal transport (OT) problem aims to find the most efficient mapping between two probability distributions under a given cost function, and has diverse applications in many fields such as machine learning, computer vision and computer graphics. However, existing methods for computing optimal transport maps are primarily developed for Euclidean spaces and the sphere. In this paper, we explore the problem of computing the optimal transport map in hyperbolic space, which naturally arises in contexts involving hierarchical data, networks, and multi-genus Riemann surfaces. We propose a novel and efficient algorithm for computing the optimal transport map in hyperbolic space using a geometric variational technique by extending methods for Euclidean and spherical geometry to the hyperbolic setting. We also perform experiments on synthetic data and multi-genus surface models to validate the efficacy of the proposed method.
GRMay 3, 2025
OT-Talk: Animating 3D Talking Head with Optimal TransportationXinmu Wang, Xiang Gao, Xiyun Song et al.
Animating 3D head meshes using audio inputs has significant applications in AR/VR, gaming, and entertainment through 3D avatars. However, bridging the modality gap between speech signals and facial dynamics remains a challenge, often resulting in incorrect lip syncing and unnatural facial movements. To address this, we propose OT-Talk, the first approach to leverage optimal transportation to optimize the learning model in talking head animation. Building on existing learning frameworks, we utilize a pre-trained Hubert model to extract audio features and a transformer model to process temporal sequences. Unlike previous methods that focus solely on vertex coordinates or displacements, we introduce Chebyshev Graph Convolution to extract geometric features from triangulated meshes. To measure mesh dissimilarities, we go beyond traditional mesh reconstruction errors and velocity differences between adjacent frames. Instead, we represent meshes as probability measures and approximate their surfaces. This allows us to leverage the sliced Wasserstein distance for modeling mesh variations. This approach facilitates the learning of smooth and accurate facial motions, resulting in coherent and natural facial animations. Our experiments on two public audio-mesh datasets demonstrate that our method outperforms state-of-the-art techniques both quantitatively and qualitatively in terms of mesh reconstruction accuracy and temporal alignment. In addition, we conducted a user perception study with 20 volunteers to further assess the effectiveness of our approach.
CVJun 6, 2024
Global Parameterization-based Texture Space OptimizationWei Chen, Yuxue Ren, Na Lei et al.
Texture mapping is a common technology in the area of computer graphics, it maps the 3D surface space onto the 2D texture space. However, the loose texture space will reduce the efficiency of data storage and GPU memory addressing in the rendering process. Many of the existing methods focus on repacking given textures, but they still suffer from high computational cost and hardly produce a wholly tight texture space. In this paper, we propose a method to optimize the texture space and produce a new texture mapping which is compact based on global parameterization. The proposed method is computationally robust and efficient. Experiments show the effectiveness of the proposed method and the potency in improving the storage and rendering efficiency.
CRMar 12, 2024
Backdoor Attack with Mode Mixture Latent ModificationHongwei Zhang, Xiaoyin Xu, Dongsheng An et al.
Backdoor attacks become a significant security concern for deep neural networks in recent years. An image classification model can be compromised if malicious backdoors are injected into it. This corruption will cause the model to function normally on clean images but predict a specific target label when triggers are present. Previous research can be categorized into two genres: poisoning a portion of the dataset with triggered images for users to train the model from scratch, or training a backdoored model alongside a triggered image generator. Both approaches require significant amount of attackable parameters for optimization to establish a connection between the trigger and the target label, which may raise suspicions as more people become aware of the existence of backdoor attacks. In this paper, we propose a backdoor attack paradigm that only requires minimal alterations (specifically, the output layer) to a clean model in order to inject the backdoor under the guise of fine-tuning. To achieve this, we leverage mode mixture samples, which are located between different modes in latent space, and introduce a novel method for conducting backdoor attacks. We evaluate the effectiveness of our method on four popular benchmark datasets: MNIST, CIFAR-10, GTSRB, and TinyImageNet.
LGMay 26, 2023
Large language models improve Alzheimer's disease diagnosis using multi-modality dataYingjie Feng, Jun Wang, Xianfeng Gu et al.
In diagnosing challenging conditions such as Alzheimer's disease (AD), imaging is an important reference. Non-imaging patient data such as patient information, genetic data, medication information, cognitive and memory tests also play a very important role in diagnosis. Effect. However, limited by the ability of artificial intelligence models to mine such information, most of the existing models only use multi-modal image data, and cannot make full use of non-image data. We use a currently very popular pre-trained large language model (LLM) to enhance the model's ability to utilize non-image data, and achieved SOTA results on the ADNI dataset.
LGApr 12, 2021
Efficient Optimal Transport Algorithm by Accelerated Gradient descentDongsheng An, Na Lei, Xianfeng Gu
Optimal transport (OT) plays an essential role in various areas like machine learning and deep learning. However, computing discrete optimal transport plan for large scale problems with adequate accuracy and efficiency is still highly challenging. Recently, methods based on the Sinkhorn algorithm add an entropy regularizer to the prime problem and get a trade off between efficiency and accuracy. In this paper, we propose a novel algorithm to further improve the efficiency and accuracy based on Nesterov's smoothing technique. Basically, the non-smooth c-transform of the Kantorovich potential is approximated by the smooth Log-Sum-Exp function, which finally smooths the original non-smooth Kantorovich dual functional (energy). The smooth Kantorovich functional can be optimized by the fast proximal gradient algorithm (FISTA) efficiently. Theoretically, the computational complexity of the proposed method is given by $O(n^{\frac{5}{2}} \sqrt{\log n} /ε)$, which is lower than that of the Sinkhorn algorithm. Empirically, compared with the Sinkhorn algorithm, our experimental results demonstrate that the proposed method achieves faster convergence and better accuracy with the same parameter.
CVJan 11, 2020
AE-OT-GAN: Training GANs from data specific latent distributionDongsheng An, Yang Guo, Min Zhang et al.
Though generative adversarial networks (GANs) areprominent models to generate realistic and crisp images,they often encounter the mode collapse problems and arehard to train, which comes from approximating the intrinsicdiscontinuous distribution transform map with continuousDNNs. The recently proposed AE-OT model addresses thisproblem by explicitly computing the discontinuous distribu-tion transform map through solving a semi-discrete optimaltransport (OT) map in the latent space of the autoencoder.However the generated images are blurry. In this paper, wepropose the AE-OT-GAN model to utilize the advantages ofthe both models: generate high quality images and at thesame time overcome the mode collapse/mixture problems.Specifically, we first faithfully embed the low dimensionalimage manifold into the latent space by training an autoen-coder (AE). Then we compute the optimal transport (OT)map that pushes forward the uniform distribution to the la-tent distribution supported on the latent manifold. Finally,our GAN model is trained to generate high quality imagesfrom the latent distribution, the distribution transform mapfrom which to the empirical data distribution will be con-tinuous. The paired data between the latent code and thereal images gives us further constriction about the generator.Experiments on simple MNIST dataset and complex datasetslike Cifar-10 and CelebA show the efficacy and efficiency ofour proposed method.
LGFeb 8, 2019
Mode Collapse and Regularity of Optimal Transportation MapsNa Lei, Yang Guo, Dongsheng An et al.
This work builds the connection between the regularity theory of optimal transportation map, Monge-Ampère equation and GANs, which gives a theoretic understanding of the major drawbacks of GANs: convergence difficulty and mode collapse. According to the regularity theory of Monge-Ampère equation, if the support of the target measure is disconnected or just non-convex, the optimal transportation mapping is discontinuous. General DNNs can only approximate continuous mappings. This intrinsic conflict leads to the convergence difficulty and mode collapse in GANs. We test our hypothesis that the supports of real data distribution are in general non-convex, therefore the discontinuity is unavoidable using an Autoencoder combined with discrete optimal transportation map (AE-OT framework) on the CelebA data set. The testing result is positive. Furthermore, we propose to approximate the continuous Brenier potential directly based on discrete Brenier theory to tackle mode collapse. Comparing with existing method, this method is more accurate and effective.
CVOct 20, 2018
Corresponding Supine and Prone Colon Visualization Using Eigenfunction Analysis and Fold ModelingSaad Nadeem, Joseph Marino, Xianfeng Gu et al.
We present a method for registration and visualization of corresponding supine and prone virtual colonoscopy scans based on eigenfunction analysis and fold modeling. In virtual colonoscopy, CT scans are acquired with the patient in two positions, and their registration is desirable so that physicians can corroborate findings between scans. Our algorithm performs this registration efficiently through the use of Fiedler vector representation (the second eigenfunction of the Laplace-Beltrami operator). This representation is employed to first perform global registration of the two colon positions. The registration is then locally refined using the haustral folds, which are automatically segmented using the 3D level sets of the Fiedler vector. The use of Fiedler vectors and the segmented folds presents a precise way of visualizing corresponding regions across datasets and visual modalities. We present multiple methods of visualizing the results, including 2D flattened rendering and the corresponding 3D endoluminal views. The precise fold modeling is used to automatically find a suitable cut for the 2D flattening, which provides a less distorted visualization. Our approach is robust, and we demonstrate its efficiency and efficacy by showing matched views on both the 2D flattened colons and in the 3D endoluminal view. We analytically evaluate the results by measuring the distance between features on the registered colons, and we also assess our fold segmentation against 20 manually labeled datasets. We have compared our results analytically to previous methods, and have found our method to achieve superior results. We also prove the hot spots conjecture for modeling cylindrical topology using Fiedler vector representation, which allows our approach to be used for general cylindrical geometry modeling and feature extraction.
CVSep 17, 2018
LMap: Shape-Preserving Local Mappings for Biomedical VisualizationSaad Nadeem, Xianfeng Gu, Arie Kaufman
Visualization of medical organs and biological structures is a challenging task because of their complex geometry and the resultant occlusions. Global spherical and planar mapping techniques simplify the complex geometry and resolve the occlusions to aid in visualization. However, while resolving the occlusions these techniques do not preserve the geometric context, making them less suitable for mission-critical biomedical visualization tasks. In this paper, we present a shape-preserving local mapping technique for resolving occlusions locally while preserving the overall geometric context. More specifically, we present a novel visualization algorithm, LMap, for conformally parameterizing and deforming a selected local region-of-interest (ROI) on an arbitrary surface. The resultant shape-preserving local mappings help to visualize complex surfaces while preserving the overall geometric context. The algorithm is based on the robust and efficient extrinsic Ricci flow technique, and uses the dynamic Ricci flow algorithm to guarantee the existence of a local map for a selected ROI on an arbitrary surface. We show the effectiveness and efficacy of our method in three challenging use cases: (1) multimodal brain visualization, (2) optimal coverage of virtual colonoscopy centerline flythrough, and (3) molecular surface visualization.
LGSep 16, 2018
Latent Space Optimal Transport for Generative ModelsHuidong Liu, Yang Guo, Na Lei et al.
Variational Auto-Encoders enforce their learned intermediate latent-space data distribution to be a simple distribution, such as an isotropic Gaussian. However, this causes the posterior collapse problem and loses manifold structure which can be important for datasets such as facial images. A GAN can transform a simple distribution to a latent-space data distribution and thus preserve the manifold structure, but optimizing a GAN involves solving a Min-Max optimization problem, which is difficult and not well understood so far. Therefore, we propose a GAN-like method to transform a simple distribution to a data distribution in the latent space by solving only a minimization problem. This minimization problem comes from training a discriminator between a simple distribution and a latent-space data distribution. Then, we can explicitly formulate an Optimal Transport (OT) problem that computes the desired mapping between the two distributions. This means that we can transform a distribution without solving the difficult Min-Max optimization problem. Experimental results on an eight-Gaussian dataset show that the proposed OT can handle multi-cluster distributions. Results on the MNIST and the CelebA datasets validate the effectiveness of the proposed method.
QMJun 30, 2018
Classification of lung nodules in CT images based on Wasserstein distance in differential geometryMin Zhang, Qianli Ma, Chengfeng Wen et al.
Lung nodules are commonly detected in screening for patients with a risk for lung cancer. Though the status of large nodules can be easily diagnosed by fine needle biopsy or bronchoscopy, small nodules are often difficult to classify on computed tomography (CT). Recent works have shown that shape analysis of lung nodules can be used to differentiate benign lesions from malignant ones, though existing methods are limited in their sensitivity and specificity. In this work we introduced a new 3D shape analysis within the framework of differential geometry to calculate the Wasserstein distance between benign and malignant lung nodules to derive an accurate classification scheme. The Wasserstein distance between the nodules is calculated based on our new spherical optimal mass transport, this new algorithm works directly on sphere by using spherical metric, which is much more accurate and efficient than previous methods. In the process of deformation, the area-distortion factor gives a probability measure on the unit sphere, which forms the Wasserstein space. From known cases of benign and malignant lung nodules, we can calculate a unique optimal mass transport map between their correspondingly deformed Wasserstein spaces. This transportation cost defines the Wasserstein distance between them and can be used to classify new lung nodules into either the benign or malignant class. To the best of our knowledge, this is the first work that utilizes Wasserstein distance for lung nodule classification. The advantages of Wasserstein distance are it is invariant under rigid motions and scalings, thus it intrinsically measures shape distance even when the underlying shapes are of high complexity, making it well suited to classify lung nodules as they have different sizes, orientations, and appearances.
CVJun 23, 2018
Variational Wasserstein ClusteringLiang Mi, Wen Zhang, Xianfeng Gu et al.
We propose a new clustering method based on optimal transportation. We solve optimal transportation with variational principles, and investigate the use of power diagrams as transportation plans for aggregating arbitrary domains into a fixed number of clusters. We iteratively drive centroids through target domains while maintaining the minimum clustering energy by adjusting the power diagrams. Thus, we simultaneously pursue clustering and the Wasserstein distances between the centroids and the target domains, resulting in a measure-preserving mapping. We demonstrate the use of our method in domain adaptation, remeshing, and representation learning on synthetic and real data.
CVJan 17, 2018
Brenier approach for optimal transportation between a quasi-discrete measure and a discrete measureYing Lu, Liming Chen, Alexandre Saidi et al.
Correctly estimating the discrepancy between two data distributions has always been an important task in Machine Learning. Recently, Cuturi proposed the Sinkhorn distance which makes use of an approximate Optimal Transport cost between two distributions as a distance to describe distribution discrepancy. Although it has been successfully adopted in various machine learning applications (e.g. in Natural Language Processing and Computer Vision) since then, the Sinkhorn distance also suffers from two unnegligible limitations. The first one is that the Sinkhorn distance only gives an approximation of the real Wasserstein distance, the second one is the `divide by zero' problem which often occurs during matrix scaling when setting the entropy regularization coefficient to a small value. In this paper, we introduce a new Brenier approach for calculating a more accurate Wasserstein distance between two discrete distributions, this approach successfully avoids the two limitations shown above for Sinkhorn distance and gives an alternative way for estimating distribution discrepancy.
MMOct 18, 2012
Beltrami Representation and its applications to texture map and video compressionLok Ming Lui, Ka Chun Lam, Tsz Wai Wong et al.
Surface parameterizations and registrations are important in computer graphics and imaging, where 1-1 correspondences between meshes are computed. In practice, surface maps are usually represented and stored as 3D coordinates each vertex is mapped to, which often requires lots of storage memory. This causes inconvenience in data transmission and data storage. To tackle this problem, we propose an effective algorithm for compressing surface homeomorphisms using Fourier approximation of the Beltrami representation. The Beltrami representation is a complex-valued function defined on triangular faces of the surface mesh with supreme norm strictly less than 1. Under suitable normalization, there is a 1-1 correspondence between the set of surface homeomorphisms and the set of Beltrami representations. Hence, every bijective surface map is associated with a unique Beltrami representation. Conversely, given a Beltrami representation, the corresponding bijective surface map can be exactly reconstructed using the Linear Beltrami Solver introduced in this paper. Using the Beltrami representation, the surface homeomorphism can be easily compressed by Fourier approximation, without distorting the bijectivity of the map. The storage memory can be effectively reduced, which is useful for many practical problems in computer graphics and imaging. In this paper, we proposed to apply the algorithm to texture map compression and video compression. With our proposed algorithm, the storage requirement for the texture properties of a textured surface can be significantly reduced. Our algorithm can further be applied to compressing motion vector fields for video compression, which effectively improve the compression ratio.