Jian Lu

CV
h-index102
28papers
801citations
Novelty51%
AI Score55

28 Papers

CVMar 1, 2023Code
OSRE: Object-to-Spot Rotation Estimation for Bike Parking Assessment

Saghir Alfasly, Zaid Al-huda, Saifullah Bello et al.

Current deep models provide remarkable object detection in terms of object classification and localization. However, estimating object rotation with respect to other visual objects in the visual context of an input image still lacks deep studies due to the unavailability of object datasets with rotation annotations. This paper tackles these two challenges to solve the rotation estimation of a parked bike with respect to its parking area. First, we leverage the power of 3D graphics to build a camera-agnostic well-annotated Synthetic Bike Rotation Dataset (SynthBRSet). Then, we propose an object-to-spot rotation estimator (OSRE) by extending the object detection task to further regress the bike rotations in two axes. Since our model is purely trained on synthetic data, we adopt image smoothing techniques when deploying it on real-world images. The proposed OSRE is evaluated on synthetic and real-world data providing promising results. Our data and code are available at \href{https://github.com/saghiralfasly/OSRE-Project}{https://github.com/saghiralfasly/OSRE-Project}.

CVMar 6, 2022
Learnable Irrelevant Modality Dropout for Multimodal Action Recognition on Modality-Specific Annotated Videos

Saghir Alfasly, Jian Lu, Chen Xu et al.

With the assumption that a video dataset is multimodality annotated in which auditory and visual modalities both are labeled or class-relevant, current multimodal methods apply modality fusion or cross-modality attention. However, effectively leveraging the audio modality in vision-specific annotated videos for action recognition is of particular challenge. To tackle this challenge, we propose a novel audio-visual framework that effectively leverages the audio modality in any solely vision-specific annotated dataset. We adopt the language models (e.g., BERT) to build a semantic audio-video label dictionary (SAVLD) that maps each video label to its most K-relevant audio labels in which SAVLD serves as a bridge between audio and video datasets. Then, SAVLD along with a pretrained audio multi-label model are used to estimate the audio-visual modality relevance during the training phase. Accordingly, a novel learnable irrelevant modality dropout (IMD) is proposed to completely drop out the irrelevant audio modality and fuse only the relevant modalities. Moreover, we present a new two-stream video Transformer for efficiently modeling the visual modalities. Results on several vision-specific annotated datasets including Kinetics400 and UCF-101 validated our framework as it outperforms most relevant action recognition methods.

AIAug 9, 2024
Revisiting Multi-Modal LLM Evaluation

Jian Lu, Shikhar Srivastava, Junyu Chen et al.

With the advent of multi-modal large language models (MLLMs), datasets used for visual question answering (VQA) and referring expression comprehension have seen a resurgence. However, the most popular datasets used to evaluate MLLMs are some of the earliest ones created, and they have many known problems, including extreme bias, spurious correlations, and an inability to permit fine-grained analysis. In this paper, we pioneer evaluating recent MLLMs (LLaVA 1.5, LLaVA-NeXT, BLIP2, InstructBLIP, GPT-4V, and GPT-4o) on datasets designed to address weaknesses in earlier ones. We assess three VQA datasets: 1) TDIUC, which permits fine-grained analysis on 12 question types; 2) TallyQA, which has simple and complex counting questions; and 3) DVQA, which requires optical character recognition for chart understanding. We also study VQDv1, a dataset that requires identifying all image regions that satisfy a given query. Our experiments reveal the weaknesses of many MLLMs that have not previously been reported. Our code is integrated into the widely used LAVIS framework for MLLM evaluation, enabling the rapid assessment of future MLLMs. Project webpage: https://kevinlujian.github.io/MLLM_Evaluations/

CVDec 26, 2025
MoFu: Scale-Aware Modulation and Fourier Fusion for Multi-Subject Video Generation

Run Ling, Ke Cao, Jian Lu et al.

Multi-subject video generation aims to synthesize videos from textual prompts and multiple reference images, ensuring that each subject preserves natural scale and visual fidelity. However, current methods face two challenges: scale inconsistency, where variations in subject size lead to unnatural generation, and permutation sensitivity, where the order of reference inputs causes subject distortion. In this paper, we propose MoFu, a unified framework that tackles both challenges. For scale inconsistency, we introduce Scale-Aware Modulation (SMO), an LLM-guided module that extracts implicit scale cues from the prompt and modulates features to ensure consistent subject sizes. To address permutation sensitivity, we present a simple yet effective Fourier Fusion strategy that processes the frequency information of reference features via the Fast Fourier Transform to produce a unified representation. Besides, we design a Scale-Permutation Stability Loss to jointly encourage scale-consistent and permutation-invariant generation. To further evaluate these challenges, we establish a dedicated benchmark with controlled variations in subject scale and reference permutation. Extensive experiments demonstrate that MoFu significantly outperforms existing methods in preserving natural scale, subject fidelity, and overall visual quality.

AIApr 29, 2024
Capabilities of Gemini Models in Medicine

Khaled Saab, Tao Tu, Wei-Hung Weng et al.

Excellence in a wide variety of medical applications poses considerable challenges for AI, requiring advanced reasoning, access to up-to-date medical knowledge and understanding of complex multimodal data. Gemini models, with strong general capabilities in multimodal and long-context reasoning, offer exciting possibilities in medicine. Building on these core strengths of Gemini, we introduce Med-Gemini, a family of highly capable multimodal models that are specialized in medicine with the ability to seamlessly use web search, and that can be efficiently tailored to novel modalities using custom encoders. We evaluate Med-Gemini on 14 medical benchmarks, establishing new state-of-the-art (SoTA) performance on 10 of them, and surpass the GPT-4 model family on every benchmark where a direct comparison is viable, often by a wide margin. On the popular MedQA (USMLE) benchmark, our best-performing Med-Gemini model achieves SoTA performance of 91.1% accuracy, using a novel uncertainty-guided search strategy. On 7 multimodal benchmarks including NEJM Image Challenges and MMMU (health & medicine), Med-Gemini improves over GPT-4V by an average relative margin of 44.5%. We demonstrate the effectiveness of Med-Gemini's long-context capabilities through SoTA performance on a needle-in-a-haystack retrieval task from long de-identified health records and medical video question answering, surpassing prior bespoke methods using only in-context learning. Finally, Med-Gemini's performance suggests real-world utility by surpassing human experts on tasks such as medical text summarization, alongside demonstrations of promising potential for multimodal medical dialogue, medical research and education. Taken together, our results offer compelling evidence for Med-Gemini's potential, although further rigorous evaluation will be crucial before real-world deployment in this safety-critical domain.

OCJul 4, 2024
Orthogonal Constrained Minimization with Tensor $\ell_{2,p}$ Regularization for HSI Denoising and Destriping

Xiaoxia Liu, Shijie Yu, Jian Lu et al.

Hyperspectral images~(HSIs) are often contaminated by a mixture of noise such as Gaussian noise, dead lines, stripes, and so on. In this paper, we propose a multi-scale low-rank tensor regularized $\ell_{2,p}$ (MLTL2p) approach for HSI denoising and destriping, which consists of an orthogonal constrained minimization model and an iterative algorithm with convergence guarantees. The model of the proposed MLTL2p approach is built based on a new sparsity-enhanced Multi-scale Low-rank Tensor regularization and a tensor $\ell_{2,p}$ norm with \(p\in (0,1)\). The multi-scale low-rank regularization for HSI denoising utilizes the global and local spectral correlation as well as the spatial nonlocal self-similarity priors of HSIs. The corresponding low-rank constraints are formulated based on independent higher-order singular value decomposition with sparsity enhancement on its core tensor to prompt more low-rankness. The tensor $\ell_{2,p}$ norm for HSI destriping is extended from the matrix $\ell_{2,p}$ norm. A proximal block coordinate descent algorithm is proposed in the MLTL2p approach to solve the resulting nonconvex nonsmooth minimization with orthogonal constraints. We show any accumulation point of the sequence generated by the proposed algorithm converges to a first-order stationary point, which is defined using three equalities of substationarity, symmetry, and feasibility for orthogonal constraints. In the numerical experiments, we compare the proposed method with state-of-the-art methods including a deep learning based method, and test the methods on both simulated and real HSI datasets. Our proposed MLTL2p method demonstrates outperformance in terms of metrics such as mean peak signal-to-noise ratio as well as visual quality.

CVJan 5
Point-SRA: Self-Representation Alignment for 3D Representation Learning

Lintong Wei, Jian Lu, Haozhe Cheng et al.

Masked autoencoders (MAE) have become a dominant paradigm in 3D representation learning, setting new performance benchmarks across various downstream tasks. Existing methods with fixed mask ratio neglect multi-level representational correlations and intrinsic geometric structures, while relying on point-wise reconstruction assumptions that conflict with the diversity of point cloud. To address these issues, we propose a 3D representation learning method, termed Point-SRA, which aligns representations through self-distillation and probabilistic modeling. Specifically, we assign different masking ratios to the MAE to capture complementary geometric and semantic information, while the MeanFlow Transformer (MFT) leverages cross-modal conditional embeddings to enable diverse probabilistic reconstruction. Our analysis further reveals that representations at different time steps in MFT also exhibit complementarity. Therefore, a Dual Self-Representation Alignment mechanism is proposed at both the MAE and MFT levels. Finally, we design a Flow-Conditioned Fine-Tuning Architecture to fully exploit the point cloud distribution learned via MeanFlow. Point-SRA outperforms Point-MAE by 5.37% on ScanObjectNN. On intracranial aneurysm segmentation, it reaches 96.07% mean IoU for arteries and 86.87% for aneurysms. For 3D object detection, Point-SRA achieves 47.3% AP@50, surpassing MaskPoint by 5.12%.

CVJul 22, 2025Code
Dyna3DGR: 4D Cardiac Motion Tracking with Dynamic 3D Gaussian Representation

Xueming Fu, Pei Wu, Yingtai Li et al.

Accurate analysis of cardiac motion is crucial for evaluating cardiac function. While dynamic cardiac magnetic resonance imaging (CMR) can capture detailed tissue motion throughout the cardiac cycle, the fine-grained 4D cardiac motion tracking remains challenging due to the homogeneous nature of myocardial tissue and the lack of distinctive features. Existing approaches can be broadly categorized into image based and representation-based, each with its limitations. Image-based methods, including both raditional and deep learning-based registration approaches, either struggle with topological consistency or rely heavily on extensive training data. Representation-based methods, while promising, often suffer from loss of image-level details. To address these limitations, we propose Dynamic 3D Gaussian Representation (Dyna3DGR), a novel framework that combines explicit 3D Gaussian representation with implicit neural motion field modeling. Our method simultaneously optimizes cardiac structure and motion in a self-supervised manner, eliminating the need for extensive training data or point-to-point correspondences. Through differentiable volumetric rendering, Dyna3DGR efficiently bridges continuous motion representation with image-space alignment while preserving both topological and temporal consistency. Comprehensive evaluations on the ACDC dataset demonstrate that our approach surpasses state-of-the-art deep learning-based diffeomorphic registration methods in tracking accuracy. The code will be available in https://github.com/windrise/Dyna3DGR.

CVMay 30, 2021Code
EPSANet: An Efficient Pyramid Squeeze Attention Block on Convolutional Neural Network

Hu Zhang, Keke Zu, Jian Lu et al.

Recently, it has been demonstrated that the performance of a deep convolutional neural network can be effectively improved by embedding an attention module into it. In this work, a novel lightweight and effective attention method named Pyramid Squeeze Attention (PSA) module is proposed. By replacing the 3x3 convolution with the PSA module in the bottleneck blocks of the ResNet, a novel representational block named Efficient Pyramid Squeeze Attention (EPSA) is obtained. The EPSA block can be easily added as a plug-and-play component into a well-established backbone network, and significant improvements on model performance can be achieved. Hence, a simple and efficient backbone architecture named EPSANet is developed in this work by stacking these ResNet-style EPSA blocks. Correspondingly, a stronger multi-scale representation ability can be offered by the proposed EPSANet for various computer vision tasks including but not limited to, image classification, object detection, instance segmentation, etc. Without bells and whistles, the performance of the proposed EPSANet outperforms most of the state-of-the-art channel attention methods. As compared to the SENet-50, the Top-1 accuracy is improved by 1.93% on ImageNet dataset, a larger margin of +2.7 box AP for object detection and an improvement of +1.7 mask AP for instance segmentation by using the Mask-RCNN on MS-COCO dataset are obtained. Our source code is available at:https://github.com/murufeng/EPSANet.

CVJul 7, 2024
EMBANet: A Flexible Efffcient Multi-branch Attention Network

Keke Zu, Hu Zhang, Jian Lu et al.

This work presents a novel module, namely multi-branch concat (MBC), to process the input tensor and obtain the multi-scale feature map. The proposed MBC module brings new degrees of freedom (DoF) for the design of attention networks by allowing the type of transformation operators and the number of branches to be flexibly adjusted. Two important transformation operators, multiplex and split, are considered in this work, both of which can represent multi-scale features at a more granular level and increase the range of receptive fields. By integrating the MBC and attention module, a multi-branch attention (MBA) module is consequently developed to capture the channel-wise interaction of feature maps for establishing the long-range channel dependency. By substituting the 3x3 convolutions in the bottleneck blocks of the ResNet with the proposed MBA, a novel block namely efficient multi-branch attention (EMBA) is obtained, which can be easily plugged into the state-of-the-art backbone CNN models. Furthermore, a new backbone network called EMBANet is established by stacking the EMBA blocks. The proposed EMBANet is extensively evaluated on representative computer vision tasks including: classification, detection, and segmentation. And it demonstrates consistently superior performance over the popular backbones.

DBNov 2, 2025
Reliable Curation of EHR Dataset via Large Language Models under Environmental Constraints

Raymond M. Xiong, Panyu Chen, Tianze Dong et al.

Electronic health records (EHRs) are central to modern healthcare delivery and research; yet, many researchers lack the database expertise necessary to write complex SQL queries or generate effective visualizations, limiting efficient data use and scientific discovery. To address this barrier, we introduce CELEC, a large language model (LLM)-powered framework for automated EHR data extraction and analytics. CELEC translates natural language queries into SQL using a prompting strategy that integrates schema information, few-shot demonstrations, and chain-of-thought reasoning, which together improve accuracy and robustness. On a subset of the EHRSQL benchmark, CELEC achieves execution accuracy comparable to prior systems while maintaining low latency, cost efficiency, and strict privacy by exposing only database metadata to the LLM. CELEC also adheres to strict privacy protocols: the LLM accesses only database metadata (e.g., table and column names), while all query execution occurs securely within the institutional environment, ensuring that no patient-level data is ever transmitted to or shared with the LLM. Ablation studies confirm that each component of the SQL generation pipeline, particularly the few-shot demonstrations, plays a critical role in performance. By lowering technical barriers and enabling medical researchers to query EHR databases directly, CELEC streamlines research workflows and accelerates biomedical discovery.

CVDec 20, 2024
CustomTTT: Motion and Appearance Customized Video Generation via Test-Time Training

Xiuli Bi, Jian Lu, Bo Liu et al.

Benefiting from large-scale pre-training of text-video pairs, current text-to-video (T2V) diffusion models can generate high-quality videos from the text description. Besides, given some reference images or videos, the parameter-efficient fine-tuning method, i.e. LoRA, can generate high-quality customized concepts, e.g., the specific subject or the motions from a reference video. However, combining the trained multiple concepts from different references into a single network shows obvious artifacts. To this end, we propose CustomTTT, where we can joint custom the appearance and the motion of the given video easily. In detail, we first analyze the prompt influence in the current video diffusion model and find the LoRAs are only needed for the specific layers for appearance and motion customization. Besides, since each LoRA is trained individually, we propose a novel test-time training technique to update parameters after combination utilizing the trained customized models. We conduct detailed experiments to verify the effectiveness of the proposed methods. Our method outperforms several state-of-the-art works in both qualitative and quantitative evaluations.

AIOct 28, 2024
Neuro-symbolic Learning Yielding Logical Constraints

Zenan Li, Yunpeng Huang, Zhaoyu Li et al. · utoronto

Neuro-symbolic systems combine the abilities of neural perception and logical reasoning. However, end-to-end learning of neuro-symbolic systems is still an unsolved challenge. This paper proposes a natural framework that fuses neural network training, symbol grounding, and logical constraint synthesis into a coherent and efficient end-to-end learning process. The capability of this framework comes from the improved interactions between the neural and the symbolic parts of the system in both the training and inference stages. Technically, to bridge the gap between the continuous neural network and the discrete logical constraint, we introduce a difference-of-convex programming technique to relax the logical constraints while maintaining their precision. We also employ cardinality constraints as the language for logical constraint learning and incorporate a trust region method to avoid the degeneracy of logical constraint in learning. Both theoretical analyses and empirical evaluations substantiate the effectiveness of the proposed framework.

IRMay 3, 2025
RAGAR: Retrieval Augmented Personalized Image Generation Guided by Recommendation

Run Ling, Wenji Wang, Yuting Liu et al.

Personalized image generation is crucial for improving the user experience, as it renders reference images into preferred ones according to user visual preferences. Although effective, existing methods face two main issues. First, existing methods treat all items in the user historical sequence equally when extracting user preferences, overlooking the varying semantic similarities between historical items and the reference item. Disproportionately high weights for low-similarity items distort users' visual preferences for the reference item. Second, existing methods heavily rely on consistency between generated and reference images to optimize the generation, which leads to underfitting user preferences and hinders personalization. To address these issues, we propose Retrieval Augment Personalized Image GenerAtion guided by Recommendation (RAGAR). Our approach uses a retrieval mechanism to assign different weights to historical items according to their similarities to the reference item, thereby extracting more refined users' visual preferences for the reference item. Then we introduce a novel rank task based on the multi-modal ranking model to optimize the personalization of the generated images instead of forcing depend on consistency. Extensive experiments and human evaluations on three real-world datasets demonstrate that RAGAR achieves significant improvements in both personalization and semantic metrics compared to five baselines.

AO-PHFeb 11
Hierarchical Testing of a Hybrid Machine Learning-Physics Global Atmosphere Model

Ziming Chen, L. Ruby Leung, Wenyu Zhou et al.

Machine learning (ML)-based models have demonstrated high skill and computational efficiency, often outperforming conventional physics-based models in weather and subseasonal predictions. While prior studies have assessed their fidelity in capturing synoptic-scale atmospheric dynamics, their performance across timescales and under out-of-distribution forcing, such as +3K or +4K uniform-warming forcings, and the sources of biases remain elusive, to establish the model reliability for Earth science. Here, we design three sets of experiments targeting synoptic-scale phenomena, interannual variability, and out-of-distribution uniform-warming forcings. We evaluate the Neural General Circulation Model (NeuralGCM), a hybrid model integrating a dynamical core with ML-based component, against observations and physics-based Earth system models (ESMs). At the synoptic scale, NeuralGCM captures the evolution and propagation of extratropical cyclones with performance comparable to ESMs. At the interannual scale, when forced by El Niño-Southern Oscillation sea surface temperature (SST) anomalies, NeuralGCM successfully reproduces associated teleconnection patterns but exhibits deficiencies in capturing nonlinear response. Under out-of-distribution uniform-warming forcings, NeuralGCM simulates similar responses in global-average temperature and precipitation and reproduces large-scale tropospheric circulation features similar to those in ESMs. Notable weaknesses include overestimating the tracks and spatial extent of extratropical cyclones, biases in the teleconnected wave train triggered by tropical SST anomalies, and differences in upper-level warming and stratospheric circulation responses to SST warming compared to physics-based ESMs. The causes of these weaknesses were explored.

LGNov 24, 2025
Periodic Asynchrony: An On-Policy Approach for Accelerating LLM Reinforcement Learning

Jian Lu, Yi Luo

Since the introduction of the GRPO algorithm, reinforcement learning (RL) has attracted increasing attention, with growing efforts to reproduce and apply it. However, training efficiency remains a critical challenge. In mainstream RL frameworks, inference and training are typically deployed on the same devices. While this approach reduces costs through resource consolidation, its synchronous execution imposes a computational coupling that prevents concurrent inference and training. In this study, we are returning to the strategy of separating inference and training deployment, and by introducing improvements in the data loader, we transform the conventional synchronous architecture into a periodically asynchronous framework, which allows for demand-driven, independent, and elastic scaling of each component, while the accuracy of the algorithm remains completely equivalent to the synchronization method, with both belonging to the on-policy strategy. It is worth emphasizing that we apply a unified tri-model architecture in the training phase, and we also proposed a shared-prompt attention mask to reduce repetitive computation. In practice, our approach consistently delivers significant end-to-end training efficiency improvements on NPU platforms, indicating its potential for widespread application.

LGOct 21, 2025
Approximation Rates of Shallow Neural Networks: Barron Spaces, Activation Functions and Optimality Analysis

Jian Lu, Xiaohuang Huang

This paper investigates the approximation properties of shallow neural networks with activation functions that are powers of exponential functions. It focuses on the dependence of the approximation rate on the dimension and the smoothness of the function being approximated within the Barron function space. We examine the approximation rates of ReLU$^{k}$ activation functions, proving that the optimal rate cannot be achieved under $\ell^{1}$-bounded coefficients or insufficient smoothness conditions. We also establish optimal approximation rates in various norms for functions in Barron spaces and Sobolev spaces, confirming the curse of dimensionality. Our results clarify the limits of shallow neural networks' approximation capabilities and offer insights into the selection of activation functions and network structures.

IVJul 23, 2025
MyGO: Make your Goals Obvious, Avoiding Semantic Confusion in Prostate Cancer Lesion Region Segmentation

Zhengcheng Lin, Zuobin Ying, Zhenyu Li et al.

Early diagnosis and accurate identification of lesion location and progression in prostate cancer (PCa) are critical for assisting clinicians in formulating effective treatment strategies. However, due to the high semantic homogeneity between lesion and non-lesion areas, existing medical image segmentation methods often struggle to accurately comprehend lesion semantics, resulting in the problem of semantic confusion. To address this challenge, we propose a novel Pixel Anchor Module, which guides the model to discover a sparse set of feature anchors that serve to capture and interpret global contextual information. This mechanism enhances the model's nonlinear representation capacity and improves segmentation accuracy within lesion regions. Moreover, we design a self-attention-based Top_k selection strategy to further refine the identification of these feature anchors, and incorporate a focal loss function to mitigate class imbalance, thereby facilitating more precise semantic interpretation across diverse regions. Our method achieves state-of-the-art performance on the PI-CAI dataset, demonstrating 69.73% IoU and 74.32% Dice scores, and significantly improving prostate cancer lesion detection.

LGJul 6, 2025
Normalized Iterative Hard Thresholding for Tensor Recovery

Li Li, Yuneng Liang, Kaijie Zheng et al.

Low-rank recovery builds upon ideas from the theory of compressive sensing, which predicts that sparse signals can be accurately reconstructed from incomplete measurements. Iterative thresholding-type algorithms-particularly the normalized iterative hard thresholding (NIHT) method-have been widely used in compressed sensing (CS) and applied to matrix recovery tasks. In this paper, we propose a tensor extension of NIHT, referred to as TNIHT, for the recovery of low-rank tensors under two widely used tensor decomposition models. This extension enables the effective reconstruction of high-order low-rank tensors from a limited number of linear measurements by leveraging the inherent low-dimensional structure of multi-way data. Specifically, we consider both the CANDECOMP/PARAFAC (CP) rank and the Tucker rank to characterize tensor low-rankness within the TNIHT framework. At the same time, we establish a convergence theorem for the proposed TNIHT method under the tensor restricted isometry property (TRIP), providing theoretical support for its recovery guarantees. Finally, we evaluate the performance of TNIHT through numerical experiments on synthetic, image, and video data, and compare it with several state-of-the-art algorithms.

IVJan 7, 2022
A three-dimensional dual-domain deep network for high-pitch and sparse helical CT reconstruction

Wei Wang, Xiang-Gen Xia, Chuanjiang He et al.

In this paper, we propose a new GPU implementation of the Katsevich algorithm for helical CT reconstruction. Our implementation divides the sinograms and reconstructs the CT images pitch by pitch. By utilizing the periodic properties of the parameters of the Katsevich algorithm, our method only needs to calculate these parameters once for all the pitches and so has lower GPU-memory burdens and is very suitable for deep learning. By embedding our implementation into the network, we propose an end-to-end deep network for the high pitch helical CT reconstruction with sparse detectors. Since our network utilizes the features extracted from both sinograms and CT images, it can simultaneously reduce the streak artifacts caused by the sparsity of sinograms and preserve fine details in the CT images. Experiments show that our network outperforms the related methods both in subjective and objective evaluations.

LGMay 10, 2021
Robust Graph Learning Under Wasserstein Uncertainty

Xiang Zhang, Yinfei Xu, Qinghe Liu et al.

Graphs are playing a crucial role in different fields since they are powerful tools to unveil intrinsic relationships among signals. In many scenarios, an accurate graph structure representing signals is not available at all and that motivates people to learn a reliable graph structure directly from observed signals. However, in real life, it is inevitable that there exists uncertainty in the observed signals due to noise measurements or limited observability, which causes a reduction in reliability of the learned graph. To this end, we propose a graph learning framework using Wasserstein distributionally robust optimization (WDRO) which handles uncertainty in data by defining an uncertainty set on distributions of the observed data. Specifically, two models are developed, one of which assumes all distributions in uncertainty set are Gaussian distributions and the other one has no prior distributional assumption. Instead of using interior point method directly, we propose two algorithms to solve the corresponding models and show that our algorithms are more time-saving. In addition, we also reformulate both two models into Semi-Definite Programming (SDP), and illustrate that they are intractable in the scenario of large-scale graph. Experiments on both synthetic and real world data are carried out to validate the proposed framework, which show that our scheme can learn a reliable graph in the context of uncertainty.

IVJan 6, 2021
A New Weighting Scheme for Fan-beam and Circle Cone-beam CT Reconstructions

Wei Wang, Xiang-Gen Xia, Chuanjiang He et al.

In this paper, we first present an arc based algorithm for fan-beam computed tomography (CT) reconstruction via applying Katsevich's helical CT formula to 2D fan-beam CT reconstruction. Then, we propose a new weighting function to deal with the redundant projection data. By extending the weighted arc based fan-beam algorithm to circle cone-beam geometry, we also obtain a new FDK-similar algorithm for circle cone-beam CT reconstruction. Experiments show that our methods can obtain higher PSNR and SSIM compared to the Parker-weighted conventional fan-beam algorithm and the FDK algorithm for super-short-scan trajectories.

CVSep 28, 2020
Multi-scale Receptive Fields Graph Attention Network for Point Cloud Classification

Xi-An Li, Lei Zhang, Li-Yan Wang et al.

Understanding the implication of point cloud is still challenging to achieve the goal of classification or segmentation due to the irregular and sparse structure of point cloud. As we have known, PointNet architecture as a ground-breaking work for point cloud which can learn efficiently shape features directly on unordered 3D point cloud and have achieved favorable performance. However, this model fail to consider the fine-grained semantic information of local structure for point cloud. Afterwards, many valuable works are proposed to enhance the performance of PointNet by means of semantic features of local patch for point cloud. In this paper, a multi-scale receptive fields graph attention network (named after MRFGAT) for point cloud classification is proposed. By focusing on the local fine features of point cloud and applying multi attention modules based on channel affinity, the learned feature map for our network can well capture the abundant features information of point cloud. The proposed MRFGAT architecture is tested on ModelNet10 and ModelNet40 datasets, and results show it achieves state-of-the-art performance in shape classification tasks.

IVAug 10, 2020
A model-guided deep network for limited-angle computed tomography

Wei Wang, Xiang-Gen Xia, Chuanjiang He et al.

In this paper, we first propose a variational model for the limited-angle computed tomography (CT) image reconstruction and then convert the model into an end-to-end deep network.We use the penalty method to solve the model and divide it into three iterative subproblems, where the first subproblem completes the sinograms by utilizing the prior information of sinograms in the frequency domain and the second refines the CT images by using the prior information of CT images in the spatial domain, and the last merges the outputs of the first two subproblems. In each iteration, we use the convolutional neural networks (CNNs) to approxiamte the solutions of the first two subproblems and, thus, obtain an end-to-end deep network for the limited-angle CT image reconstruction. Our network tackles both the sinograms and the CT images, and can simultaneously suppress the artifacts caused by the incomplete data and recover fine structural information in the CT images. Experimental results show that our method outperforms the existing algorithms for the limited-angle CT image reconstruction.

IVJan 20, 2020
A deep network for sinogram and CT image reconstruction

Wei Wang, Xiang-Gen Xia, Chuanjiang He et al.

A CT image can be well reconstructed when the sampling rate of the sinogram satisfies the Nyquist criteria and the sampled signal is noise-free. However, in practice, the sinogram is usually contaminated by noise, which degrades the quality of a reconstructed CT image. In this paper, we design a deep network for sinogram and CT image reconstruction. The network consists of two cascaded blocks that are linked by a filter backprojection (FBP) layer, where the former block is responsible for denoising and completing the sinograms while the latter is used to removing the noise and artifacts of the CT images. Experimental results show that the reconstructed CT images by our methods have the highest PSNR and SSIM in average compared to state of the art methods.

SEJul 6, 2018
CoMID: Context-based Multi-Invariant Detection for Monitoring Cyber-Physical Software

Yi Qin, Tao Xie, Chang Xu et al.

Cyber-physical software continually interacts with its physical environment for adaptation in order to deliver smart services. However, the interactions can be subject to various errors when the software's assumption on its environment no longer holds, thus leading to unexpected misbehavior or even failure. To address this problem, one promising way is to conduct runtime monitoring of invariants, so as to prevent cyber-physical software from entering such errors (a.k.a. abnormal states). To effectively detect abnormal states, we in this article present an approach, named Context-based Multi-Invariant Detection (CoMID), which consists of two techniques: context-based trace grouping and multi-invariant detection. The former infers contexts to distinguish different effective scopes for CoMID's derived invariants, and the latter conducts ensemble evaluation of multiple invariants to detect abnormal states. We experimentally evaluate CoMID on real-world cyber-physical software. The results show that CoMID achieves a 5.7-28.2% higher true-positive rate and a 6.8-37.6% lower false-positive rate in detecting abnormal states, as compared with state-of-the-art approaches (i.e., Daikon and ZoomIn). When deployed in field tests, CoMID's runtime monitoring improves the success rate of cyber-physical software in its task executions by 15.3-31.7%.

DBNov 27, 2013
Want a Good Answer? Ask a Good Question First!

Yuan Yao, Hanghang Tong, Tao Xie et al.

Community Question Answering (CQA) websites have become valuable repositories which host a massive volume of human knowledge. To maximize the utility of such knowledge, it is essential to evaluate the quality of an existing question or answer, especially soon after it is posted on the CQA website. In this paper, we study the problem of inferring the quality of questions and answers through a case study of a software CQA (Stack Overflow). Our key finding is that the quality of an answer is strongly positively correlated with that of its question. Armed with this observation, we propose a family of algorithms to jointly predict the quality of questions and answers, for both quantifying numerical quality scores and differentiating the high-quality questions/answers from those of low quality. We conduct extensive experimental evaluations to demonstrate the effectiveness and efficiency of our methods.

DCOct 14, 2013
Enabling Context-awareness by Predicate Detection in Asynchronous Pervasive Computing Environments

Yiling Yang, Yu Huang, Xiaoxing Ma et al.

Pervasive applications are involving more and more autonomous computing and communicating devices, augmented with the abilities of sensing and controlling the logical / physical environment. To enable context-awareness for such applications, we are challenged by the intrinsic asynchrony among the context collecting devices. To this end, we introduce the predicate detection theory and propose the Predicate-Detection-based Context-Awareness (PD-CA) framework, in which: a) logical time is used to explicitly cope with the asynchrony; b) specification of predicates enables the applications to express contextual properties of their concerns; c) online and incremental predicate detection algorithms effectively enable context-awareness at runtime. Under the guidance of the PD-CA framework, we present the design and implementation of the MIPA middleware, which shields the applications from the burden of processing the asynchronous contexts. We also demonstrate how PD-CA simplifies the development of context-aware applications. Experimental evaluations show the performance of MIPA in supporting context-aware applications despite of the asynchrony.