NAMay 29
Lightning Plus Polynomial Approximation: Optimal Root-Exponential Convergence for Singular Functions in Corner DomainsShuhuang Xiang, Jun Xiang, Shunfeng Yang et al.
This work presents a rigorous convergence analysis for the lightning plus polynomial approximation scheme, which employs rational approximations constructed with tapered, exponentially clustered poles. This pole placement strategy was originally introduced by Trefethen and his collaborators for the resolution of corner singularities. We establish optimal root-exponential convergence for the class of prototype functions of the form $g(z)z^α$ or $g(z)z^α\log z$, where $g$ is analytic on a neighborhood of the solution domain. The results obtained in this work confirm the validity of Conjectures 3.1 and 5.3 stated in [SIAM J. Numer. Anal., 61:2580-2600, 2023], and demonstrate that the choice $σ_{\mathrm{opt}} =\frac{\sqrt{2(2 -β)}π}{\sqrtα}$ achieves the theoretically optimal convergence rate $\mathcal{O}\left(e^{-\sqrt{2(2 - β)Nα}π}\right)$. In particular, for the specific case $β= 0$, the proposed scheme achieves the same optimal convergence rate as the best rational approximation to $x^α$ on $[0,1]$ established by Stahl. Furthermore, working within the decomposition framework for corner domains proposed by Gopal and Trefethen, this paper provides a rigorous proof of optimal root-exponential convergence for lightning plus polynomial approximation problems, and explicitly derives the optimal pole clustering parameter.
GROct 12, 2022
Reconstructing Personalized Semantic Facial NeRF Models From Monocular VideoXuan Gao, Chenglai Zhong, Jun Xiang et al.
We present a novel semantic model for human head defined with neural radiance field. The 3D-consistent head model consist of a set of disentangled and interpretable bases, and can be driven by low-dimensional expression coefficients. Thanks to the powerful representation ability of neural radiance field, the constructed model can represent complex facial attributes including hair, wearings, which can not be represented by traditional mesh blendshape. To construct the personalized semantic facial model, we propose to define the bases as several multi-level voxel fields. With a short monocular RGB video as input, our method can construct the subject's semantic facial NeRF model with only ten to twenty minutes, and can render a photo-realistic human head image in tens of miliseconds with a given expression coefficient and view direction. With this novel representation, we apply it to many tasks like facial retargeting and expression editing. Experimental results demonstrate its strong representation ability and training/inference speed. Demo videos and released code are provided in our project page: https://ustc3dv.github.io/NeRFBlendShape/
ROSep 25, 2024
Data-driven Probabilistic Trajectory Learning with High Temporal Resolution in Terminal AirspaceJun Xiang, Jun Chen
Predicting flight trajectories is a research area that holds significant merit. In this paper, we propose a data-driven learning framework, that leverages the predictive and feature extraction capabilities of the mixture models and seq2seq-based neural networks while addressing prevalent challenges caused by error propagation and dimensionality reduction. After training with this framework, the learned model can improve long-step prediction accuracy significantly given the past trajectories and the context information. The accuracy and effectiveness of the approach are evaluated by comparing the predicted trajectories with the ground truth. The results indicate that the proposed method has outperformed the state-of-the-art predicting methods on a terminal airspace flight trajectory dataset. The trajectories generated by the proposed method have a higher temporal resolution(1 timestep per second vs 0.1 timestep per second) and are closer to the ground truth.
CVJul 25, 2023
Mini-PointNetPlus: a local feature descriptor in deep learning model for 3d environment perceptionChuanyu Luo, Nuo Cheng, Sikun Ma et al.
Common deep learning models for 3D environment perception often use pillarization/voxelization methods to convert point cloud data into pillars/voxels and then process it with a 2D/3D convolutional neural network (CNN). The pioneer work PointNet has been widely applied as a local feature descriptor, a fundamental component in deep learning models for 3D perception, to extract features of a point cloud. This is achieved by using a symmetric max-pooling operator which provides unique pillar/voxel features. However, by ignoring most of the points, the max-pooling operator causes an information loss, which reduces the model performance. To address this issue, we propose a novel local feature descriptor, mini-PointNetPlus, as an alternative for plug-and-play to PointNet. Our basic idea is to separately project the data points to the individual features considered, each leading to a permutation invariant. Thus, the proposed descriptor transforms an unordered point cloud to a stable order. The vanilla PointNet is proved to be a special case of our mini-PointNetPlus. Due to fully utilizing the features by the proposed descriptor, we demonstrate in experiment a considerable performance improvement for 3D perception.
CVDec 2, 2024
One Shot, One Talk: Whole-body Talking Avatar from a Single ImageJun Xiang, Yudong Guo, Leipeng Hu et al.
Building realistic and animatable avatars still requires minutes of multi-view or monocular self-rotating videos, and most methods lack precise control over gestures and expressions. To push this boundary, we address the challenge of constructing a whole-body talking avatar from a single image. We propose a novel pipeline that tackles two critical issues: 1) complex dynamic modeling and 2) generalization to novel gestures and expressions. To achieve seamless generalization, we leverage recent pose-guided image-to-video diffusion models to generate imperfect video frames as pseudo-labels. To overcome the dynamic modeling challenge posed by inconsistent and noisy pseudo-videos, we introduce a tightly coupled 3DGS-mesh hybrid avatar representation and apply several key regularizations to mitigate inconsistencies caused by imperfect labels. Extensive experiments on diverse subjects demonstrate that our method enables the creation of a photorealistic, precisely animatable, and expressive whole-body talking avatar from just a single image.
IVDec 6, 2024
Reconstructing Quantitative Cerebral Perfusion Images Directly From Measured Sinogram Data Acquired Using C-arm Cone-Beam CTHaotian Zhao, Ruifeng Chen, Jing Yan et al.
To shorten the door-to-puncture time for better treating patients with acute ischemic stroke, it is highly desired to obtain quantitative cerebral perfusion images using C-arm cone-beam computed tomography (CBCT) equipped in the interventional suite. However, limited by the slow gantry rotation speed, the temporal resolution and temporal sampling density of typical C-arm CBCT are much poorer than those of multi-detector-row CT in the diagnostic imaging suite. The current quantitative perfusion imaging includes two cascaded steps: time-resolved image reconstruction and perfusion parametric estimation. For time-resolved image reconstruction, the technical challenge imposed by poor temporal resolution and poor sampling density causes inaccurate quantification of the temporal variation of cerebral artery and tissue attenuation values. For perfusion parametric estimation, it remains a technical challenge to appropriately design the handcrafted regularization for better solving the associated deconvolution problem. These two challenges together prevent obtaining quantitatively accurate perfusion images using C-arm CBCT. The purpose of this work is to simultaneously address these two challenges by combining the two cascaded steps into a single joint optimization problem and reconstructing quantitative perfusion images directly from the measured sinogram data. In the developed direct cerebral perfusion parametric image reconstruction technique, TRAINER in short, the quantitative perfusion images have been represented as a subject-specific conditional generative model trained under the constraint of the time-resolved CT forward model, perfusion convolutional model, and the subject's own measured sinogram data. Results shown in this paper demonstrated that using TRAINER, quantitative cerebral perfusion images can be accurately obtained using C-arm CBCT in the interventional suite.
RONov 21, 2024
Landing Trajectory Prediction for UAS Based on Generative Adversarial NetworkJun Xiang, Drake Essick, Luiz Gonzalez Bautista et al.
Models for trajectory prediction are an essential component of many advanced air mobility studies. These models help aircraft detect conflict and plan avoidance maneuvers, which is especially important in Unmanned Aircraft systems (UAS) landing management due to the congested airspace near vertiports. In this paper, we propose a landing trajectory prediction model for UAS based on Generative Adversarial Network (GAN). The GAN is a prestigious neural network that has been developed for many years. In previous research, GAN has achieved many state-of-the-art results in many generation tasks. The GAN consists of one neural network generator and a neural network discriminator. Because of the learning capacity of the neural networks, the generator is capable to understand the features of the sample trajectory. The generator takes the previous trajectory as input and outputs some random status of a flight. According to the results of the experiences, the proposed model can output more accurate predictions than the baseline method(GMR) in various datasets. To evaluate the proposed model, we also create a real UAV landing dataset that includes more than 2600 trajectories of drone control manually by real pilots.
CVDec 3, 2023
FlashAvatar: High-fidelity Head Avatar with Efficient Gaussian EmbeddingJun Xiang, Xuan Gao, Yudong Guo et al.
We propose FlashAvatar, a novel and lightweight 3D animatable avatar representation that could reconstruct a digital avatar from a short monocular video sequence in minutes and render high-fidelity photo-realistic images at 300FPS on a consumer-grade GPU. To achieve this, we maintain a uniform 3D Gaussian field embedded in the surface of a parametric face model and learn extra spatial offset to model non-surface regions and subtle facial details. While full use of geometric priors can capture high-frequency facial details and preserve exaggerated expressions, proper initialization can help reduce the number of Gaussians, thus enabling super-fast rendering speed. Extensive experimental results demonstrate that FlashAvatar outperforms existing works regarding visual quality and personalized details and is almost an order of magnitude faster in rendering speed. Project page: https://ustc3dv.github.io/FlashAvatar/
CVJan 17, 2022
A Novel Framework to Jointly Compress and Index Remote Sensing Images for Efficient Content-Based RetrievalGencer Sumbul, Jun Xiang, Nimisha Thekke Madam et al.
Remote sensing (RS) images are usually stored in compressed format to reduce the storage size of the archives. Thus, existing content-based image retrieval (CBIR) systems in RS require decoding images before applying CBIR (which is computationally demanding in the case of large-scale CBIR problems). To address this problem, in this paper, we present a joint framework that simultaneously learns RS image compression and indexing. Thus, it eliminates the need for decoding RS images before applying CBIR. The proposed framework is made up of two modules. The first module compresses RS images based on an auto-encoder architecture. The second module produces hash codes with a high discrimination capability by employing soft pairwise, bit-balancing and classification loss functions. We also introduce a two stage learning strategy with gradient manipulation techniques to obtain image representations that are compatible with both RS image indexing and compression. Experimental results show the efficacy of the proposed framework when compared to widely used approaches in RS. The code of the proposed framework is available at https://git.tu-berlin.de/rsim/RS-JCIF.
CVJul 29, 2019
End-to-End Learning Deep CRF models for Multi-Object TrackingJun Xiang, Ma Chao, Guohan Xu et al.
Existing deep multi-object tracking (MOT) approaches first learn a deep representation to describe target objects and then associate detection results by optimizing a linear assignment problem. Despite demonstrated successes, it is challenging to discriminate target objects under mutual occlusion or to reduce identity switches in crowded scenes. In this paper, we propose learning deep conditional random field (CRF) networks, aiming to model the assignment costs as unary potentials and the long-term dependencies among detection results as pairwise potentials. Specifically, we use a bidirectional long short-term memory (LSTM) network to encode the long-term dependencies. We pose the CRF inference as a recurrent neural network learning process using the standard gradient descent algorithm, where unary and pairwise potentials are jointly optimized in an end-to-end manner. Extensive experimental results on the challenging MOT datasets including MOT-2015 and MOT-2016, demonstrate that our approach achieves the state of the art performances in comparison with published works on both benchmarks.
CVFeb 9, 2018
Multiple Target Tracking by Learning Feature Representation and Distance Metric JointlyJun Xiang, Guoshuai Zhang, Jianhua Hou et al.
Designing a robust affinity model is the key issue in multiple target tracking (MTT). This paper proposes a novel affinity model by learning feature representation and distance metric jointly in a unified deep architecture. Specifically, we design a CNN network to obtain appearance cue tailored towards person Re-ID, and an LSTM network for motion cue to predict target position, respectively. Both cues are combined with a triplet loss function, which performs end-to-end learning of the fused features in a desired embedding space. Experiments in the challenging MOT benchmark demonstrate, that even by a simple Linear Assignment strategy fed with affinity scores of our method, very competitive results are achieved when compared with the most recent state-of-theart approaches.