CL · Apr 12, 2022
Stylized Knowledge-Grounded Dialogue Generation via Disentangled Template Rewriting
Qingfeng Sun, Can Xu, Huang Hu et al. · microsoft-research
Current Knowledge-Grounded Dialogue Generation (KDG) models specialize in producing rational and factual responses. However, to establish long-term relationships with users, a KDG model needs the capability to generate responses in a desired style or attribute. Thus, we study a new problem: Stylized Knowledge-Grounded Dialogue Generation (SKDG). It presents two challenges: (1) how to train an SKDG model when no <context, knowledge, stylized response> triples are available, and (2) how to cohere with context and preserve the knowledge when generating a stylized response. In this paper, we propose a novel disentangled template rewriting (DTR) method which generates responses by combining disentangled style templates (from a monolingual stylized corpus) and content templates (from a KDG corpus). The entire framework is end-to-end differentiable and learned without supervision. Extensive experiments on two benchmarks indicate that DTR achieves a significant improvement on all evaluation metrics compared with previous state-of-the-art stylized dialogue generation methods. Besides, DTR achieves comparable performance with the state-of-the-art KDG methods in the standard KDG evaluation setting.
NA · Feb 16, 2015
A Full Multigrid Method for Nonlinear Eigenvalue Problems
Shanghui Jia, Hehu Xie, Manting Xie et al.
This paper introduces a type of full multigrid method for the nonlinear eigenvalue problem. The main idea is to transform the solution of the nonlinear eigenvalue problem into a series of solutions of the corresponding linear boundary value problems on a sequence of finite element spaces and nonlinear eigenvalue problems on the coarsest finite element space. The linearized boundary value problems are solved by some multigrid iterations. Besides the multigrid iteration, any other efficient iteration method for solving boundary value problems can serve as the linear problem solver. We prove that the computational work of this new scheme is truly optimal, the same as solving the corresponding linear boundary value problem. This type of iteration scheme therefore improves the overall efficiency of solving nonlinear eigenvalue problems. Some numerical experiments are presented to validate the efficiency of the new method.
CV · Mar 9, 2023
CFR-ICL: Cascade-Forward Refinement with Iterative Click Loss for Interactive Image Segmentation
Shoukun Sun, Min Xian, Fei Xu et al.
Click-based interactive segmentation aims to extract the object of interest from an image with the guidance of user clicks. Recent work has achieved great overall performance by employing feedback from the output. However, in most state-of-the-art approaches, 1) the inference stage involves inflexible heuristic rules and requires a separate refinement model, and 2) the number of user clicks and model performance cannot be balanced. To address these challenges, we propose a click-based and mask-guided interactive image segmentation framework containing three novel components: Cascade-Forward Refinement (CFR), Iterative Click Loss (ICL), and SUEM image augmentation. The CFR offers a unified inference framework to generate segmentation results in a coarse-to-fine manner. The proposed ICL allows model training to improve segmentation and reduce user interactions simultaneously. The proposed SUEM augmentation is a comprehensive way to create large and diverse training sets for interactive image segmentation. Extensive experiments demonstrate the state-of-the-art performance of the proposed approach on five public datasets. Remarkably, compared with the previous state-of-the-art approach, our model reduces the number of clicks required to surpass an IoU of 0.95 by 33.2% and 15.5% on the Berkeley and DAVIS sets, respectively.
NA · Dec 11, 2017
An Efficient Multigrid Method for Ground State Solution of Bose-Einstein Condensates
Hehu Xie, Fei Xu, Ning Zhang
An efficient multigrid method is proposed to compute the ground state solution of Bose-Einstein condensations by the finite element method based on the combination of the multigrid method for nonlinear eigenvalue problem and an efficient implementation for the nonlinear iteration. The proposed numerical method not only has the optimal convergence rate, but also has the asymptotically optimal computational work which is independent from the nonlinearity of the problem. The independence from the nonlinearity means that the asymptotic estimate of the computational work can reach almost the same as that of solving the corresponding linear boundary value problem by the multigrid method. Some numerical experiments are provided to validate the efficiency of the proposed method.
NA · Mar 28, 2017
A Full Multigrid Method For Semilinear Elliptic Equation
Hehu Xie, Fei Xu
A full multigrid finite element method is proposed for semilinear elliptic equations. The main idea is to transform the solution of the semilinear problem into a series of solutions of the corresponding linear boundary value problems on a sequence of finite element spaces and semilinear problems on a very low dimensional space. The linearized boundary value problems are solved by some multigrid iterations. Besides the multigrid iteration, any other efficient numerical method can also serve as the linear solver for the boundary value problems. The optimality of the computational work is also proved. Compared with existing multigrid methods, which need bounded second-order derivatives of the nonlinear term, the proposed method only needs Lipschitz continuity, in some sense, of the nonlinear term.
MTRL-SCI · Oct 17, 2022
Advanced Characterization-Informed Framework and Quantitative Insight to Irradiated Annular U-10Zr Metallic Fuels
Fei Xu, Lu Cai, Daniele Salvato et al.
U-10Zr-based metallic nuclear fuel is a promising fuel candidate for next-generation sodium-cooled fast reactors. The research experience of the Idaho National Laboratory with this type of fuel dates back to the 1960s. Idaho National Laboratory researchers have accumulated a considerable amount of experience and knowledge regarding fuel performance at the engineering scale. The limitations of advanced characterization and the lack of proper data analysis tools have prevented a mechanistic understanding of fuel microstructure evolution and property degradation during irradiation. This paper proposes a new workflow, coupled with domain knowledge obtained by advanced post-irradiation examination methods, to provide unprecedented and quantified insights into the fission gas bubbles and pores, and the lanthanide distribution, in an annular fuel irradiated in the Advanced Test Reactor. In the study, researchers identify and confirm that the Zr-bearing secondary phases exist and generate the quantitative ratios of seven microstructures along the thermal gradient. Moreover, the distributions of fission gas bubbles on two samples of U-10Zr advanced fuels were quantitatively compared. Conclusive findings were obtained and allowed for evaluation of the lanthanide transportation through connected bubbles based on approximately 67,000 fission gas bubbles of the two advanced samples.
LG · Sep 28, 2023
Review of Machine Learning Methods for Additive Manufacturing of Functionally Graded Materials
Mohammad Karimzadeh, Deekshith Basvoju, Aleksandar Vakanski et al.
Additive Manufacturing (AM) is a transformative manufacturing technology enabling direct fabrication of complex parts layer-by-layer from 3D modeling data. Among AM applications, the fabrication of Functionally Graded Materials (FGMs) has significant importance due to the potential to enhance component performance across several industries. FGMs are manufactured with a gradient composition transition between dissimilar materials, enabling the design of new materials with location-dependent mechanical and physical properties. This study presents a comprehensive review of published literature pertaining to the implementation of Machine Learning (ML) techniques in AM, with an emphasis on ML-based methods for optimizing FGM fabrication processes. Through an extensive survey of the literature, this review article explores the role of ML in addressing the inherent challenges in FGM fabrication and encompasses parameter optimization, defect detection, and real-time monitoring. The article also provides a discussion of future research directions and challenges in employing ML-based methods in AM fabrication of FGMs.
RO · Jan 5
SingingBot: An Avatar-Driven System for Robotic Face Singing Performance
Zhuoxiong Xu, Xuanchen Li, Yuhao Cheng et al.
Equipping robotic faces with singing capabilities is crucial for empathetic Human-Robot Interaction. However, existing robotic face driving research primarily focuses on conversations or mimicking static expressions, struggling to meet the high demands for continuous emotional expression and coherence in singing. To address this, we propose a novel avatar-driven framework for appealing robotic singing. We first leverage portrait video generation models embedded with extensive human priors to synthesize vivid singing avatars, providing reliable expression and emotion guidance. Subsequently, these facial features are transferred to the robot via semantic-oriented mapping functions that span a wide expression space. Furthermore, to quantitatively evaluate the emotional richness of robotic singing, we propose the Emotion Dynamic Range metric to measure the emotional breadth within the Valence-Arousal space, revealing that a broad emotional spectrum is crucial for appealing performances. Comprehensive experiments prove that our method achieves rich emotional expressions while maintaining lip-audio synchronization, significantly outperforming existing approaches.
NA · Dec 6, 2017
Finite Element Methods For Wave Propagation With Debye Polarization In Nonlinear Dielectric Materials
Qiumei Huang, Shanghui Jia, Fei Xu et al.
In this paper, we consider wave propagation with Debye polarization in nonlinear dielectric materials. For this model, Rothe's method is employed to derive the well-posedness of the electric fields, and the existence of the polarized fields, as well as the boundedness of the two fields, is established by a monotonicity theorem. Then, time errors of order $O(\Delta t)$ are derived for the semi-discrete solutions. Subsequently, a decoupled fully discrete scheme is established, using the Euler method in time and Raviart-Thomas-Nédélec elements with $k\geq 2$ in space. Based on the truncation error, we present a convergence analysis of order $O(\Delta t+h^s)$ under an a priori $L^\infty$ assumption. For $k=1$, we employ a superconvergence technique to ensure the a priori $L^\infty$ assumption. Finally, we give some numerical examples to demonstrate our theory.
NA · Apr 18, 2016
A Multigrid Method for the Ground State Solution of Bose-Einstein Condensates Based on Newton Iteration
Hehu Xie, Fei Xu, Meiling Yue
In this paper, a new kind of multigrid method is proposed for the ground state solution of Bose-Einstein condensates based on the Newton iteration method. Instead of treating the eigenvalue $\lambda$ and eigenvector $u$ separately, we regard the eigenpair $(\lambda, u)$ as one element in the composite space $\mathbb{R} \times H_0^1(\Omega)$, and the Newton iteration method is then applied to the nonlinear problem. Thus, in this multigrid scheme, we only need to solve a linear discrete boundary value problem in every refined space, which can improve the overall efficiency of the simulation of Bose-Einstein condensates.
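The idea of treating the eigenpair as a single unknown can be illustrated in finite dimensions. The sketch below is not the paper's multigrid scheme: it assumes a plain linear matrix eigenproblem A u = λ u with the normalization u·u = 1 (an assumption chosen for illustration), and applies Newton's method to the combined residual F(u, λ) = (A u − λ u, (u·u − 1)/2). The function names `solve` and `newton_eigenpair` are hypothetical.

```python
def solve(M, b):
    """Solve M x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [b[i]] for i, row in enumerate(M)]
    for c in range(n):
        # Pick the row with the largest pivot candidate in column c.
        p = max(range(c, n), key=lambda r: abs(aug[r][c]))
        aug[c], aug[p] = aug[p], aug[c]
        for r in range(c + 1, n):
            factor = aug[r][c] / aug[c][c]
            for k in range(c, n + 1):
                aug[r][k] -= factor * aug[c][k]
    # Back substitution.
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = aug[r][n] - sum(aug[r][k] * x[k] for k in range(r + 1, n))
        x[r] = s / aug[r][r]
    return x


def newton_eigenpair(A, u, lam, steps=20, tol=1e-12):
    """Newton iteration on the pair (u, lam) for A u = lam u with u.u = 1."""
    n = len(u)
    for _ in range(steps):
        Au = [sum(A[i][j] * u[j] for j in range(n)) for i in range(n)]
        # Combined residual: eigen-equation rows plus the normalization row.
        F = [Au[i] - lam * u[i] for i in range(n)]
        F.append(0.5 * (sum(x * x for x in u) - 1.0))
        if max(abs(f) for f in F) < tol:
            break
        # Jacobian of F with respect to (u, lam): [[A - lam I, -u], [u^T, 0]].
        J = [[A[i][j] - (lam if i == j else 0.0) for j in range(n)] + [-u[i]]
             for i in range(n)]
        J.append(u[:] + [0.0])
        d = solve(J, [-f for f in F])
        u = [u[i] + d[i] for i in range(n)]
        lam += d[n]
    return lam, u
```

Each Newton step solves one linear system for the correction of both the vector and the eigenvalue, which mirrors, in a toy setting, why only linear problems need to be solved once the eigenpair is handled jointly. Convergence is local, so the starting guess must be reasonably close to an eigenpair.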
IV · Feb 8, 2023
An Efficient Instance Segmentation Approach for Extracting Fission Gas Bubbles on U-10Zr Annular Fuel
Shoukun Sun, Fei Xu, Lu Cai et al.
U-10Zr-based nuclear fuel is pursued as a primary candidate for next-generation sodium-cooled fast reactors. However, more advanced characterization and analysis are needed to form a fundamental understanding of the fuel performance and to qualify U-10Zr fuel for commercial use. The movement of lanthanides across the fuel section from the hot fuel center to the cool cladding surface is one of the key factors affecting fuel performance. In the advanced annular U-10Zr fuel, the lanthanides present as fission gas bubbles. Due to a lack of annotated data, existing literature utilized a multiple-threshold method to separate the bubbles and calculate bubble statistics on an annular fuel. However, the multiple-threshold method cannot achieve robust performance on images with different qualities and contrasts, and cannot distinguish different bubbles. This paper proposes a hybrid framework for efficient bubble segmentation. We develop a bubble annotation tool and generate the first fission gas bubble dataset, with more than 3000 bubbles from 24 images. A multi-task deep learning network integrating U-Net and ResNet is designed to accomplish instance-level bubble segmentation. Combining the segmentation results with an image processing step achieves the best recall ratio, of more than 90%, with very limited annotated data. Our model shows outstanding improvement compared with the previously proposed thresholding method. The proposed method shows promise for generating a more accurate quantitative analysis of fission gas bubbles on U-10Zr annular fuels. The results will contribute to identifying the bubbles with lanthanides and ultimately to building the relationship between the thermal gradient and lanthanide movements in U-10Zr annular fuels. Moreover, the deep learning model is applicable to other similar material microstructure segmentation tasks.
CV · Sep 12, 2024
Cross-Attention Based Influence Model for Manual and Nonmanual Sign Language Analysis
Lipisha Chaudhary, Fei Xu, Ifeoma Nwogu
Both manual (relating to the use of hands) and non-manual markers (NMM), such as facial expressions or mouthing cues, are important for providing the complete meaning of phrases in American Sign Language (ASL). Efforts have been made in advancing sign language to spoken/written language understanding, but most of these have primarily focused on manual features only. In this work, using advanced neural machine translation methods, we examine and report on the extent to which facial expressions contribute to understanding sign language phrases. We present a sign language translation architecture consisting of two-stream encoders, with one encoder handling the face and the other handling the upper body (with hands). We propose a new parallel cross-attention decoding mechanism that is useful for quantifying the influence of each input modality on the output. The two streams from the encoder are directed simultaneously to different attention stacks in the decoder. Examining the properties of the parallel cross-attention weights allows us to analyze the importance of facial markers compared to body and hand features during a translating task.
DC · Jul 9, 2024
FedClust: Tackling Data Heterogeneity in Federated Learning through Weight-Driven Client Clustering
Md Sirajul Islam, Simin Javaherian, Fei Xu et al.
Federated learning (FL) is an emerging distributed machine learning paradigm that enables collaborative training of machine learning models over decentralized devices without exposing their local data. One of the major challenges in FL is the presence of uneven data distributions across client devices, violating the well-known assumption of independent-and-identically-distributed (IID) training samples in conventional machine learning. To address the performance degradation issue incurred by such data heterogeneity, clustered federated learning (CFL) shows its promise by grouping clients into separate learning clusters based on the similarity of their local data distributions. However, state-of-the-art CFL approaches require a large number of communication rounds to learn the distribution similarities during training until the formation of clusters is stabilized. Moreover, some of these algorithms heavily rely on a predefined number of clusters, thus limiting their flexibility and adaptability. In this paper, we propose FedClust, a novel approach for CFL that leverages the correlation between local model weights and the data distribution of clients. FedClust groups clients into clusters in a one-shot manner by measuring the similarity degrees among clients based on the strategically selected partial weights of locally trained models. We conduct extensive experiments on four benchmark datasets with different non-IID data settings. Experimental results demonstrate that FedClust achieves higher model accuracy (by up to ~45%) as well as faster convergence, with communication cost reduced by up to 2.7× compared to its state-of-the-art counterparts.
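To make the one-shot grouping idea concrete, here is a minimal sketch under stated assumptions. The abstract does not specify the similarity measure or the clustering rule, so the Euclidean distance, the `threshold` parameter, and the single-linkage (connected components) rule below are all illustrative choices, and `one_shot_clusters` is a hypothetical name rather than FedClust's actual procedure.

```python
import math


def one_shot_clusters(weights, threshold):
    """Group clients whose partial weight vectors are within `threshold`.

    `weights[i]` is the (hypothetical) partial weight vector uploaded by
    client i. Clusters are the connected components of the graph that links
    any two clients closer than `threshold`, so the number of clusters is
    not fixed in advance.
    """
    n = len(weights)
    parent = list(range(n))

    def find(i):
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            if math.dist(weights[i], weights[j]) <= threshold:
                parent[find(i)] = find(j)

    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())
```

For example, `one_shot_clusters([[0.0, 0.1], [0.1, 0.0], [5.0, 5.1]], 1.0)` places the first two clients in one cluster and the third in its own. Because the grouping needs only one upload of weights per client, no extra communication rounds are spent on discovering clusters in this toy version.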
AI · Sep 30, 2024
A Knowledge-Informed Large Language Model Framework for U.S. Nuclear Power Plant Shutdown Initiating Event Classification for Probabilistic Risk Assessment
Min Xian, Tao Wang, Sai Zhang et al.
Identifying and classifying shutdown initiating events (SDIEs) is critical for developing low power shutdown probabilistic risk assessment for nuclear power plants. Existing computational approaches cannot achieve satisfactory performance due to the challenges of unavailable large, labeled datasets, imbalanced event types, and label noise. To address these challenges, we propose a hybrid pipeline that integrates a knowledge-informed machine learning model to prescreen non-SDIEs and a large language model (LLM) to classify SDIEs into four types. In the prescreening stage, we propose a set of 44 SDIE text patterns that consist of the most salient keywords and phrases from six SDIE types. Text vectorization based on the SDIE patterns generates feature vectors that are highly separable using a simple binary classifier. The second stage builds a Bidirectional Encoder Representations from Transformers (BERT)-based LLM, which learns generic English language representations from self-supervised pretraining on a large dataset and adapts to SDIE classification through fine-tuning on an SDIE dataset. The proposed approaches are evaluated on a dataset with 10,928 events using precision, recall ratio, F1 score, and average accuracy. The results demonstrate that the prescreening stage can exclude more than 97% of non-SDIEs, and the LLM achieves an average accuracy of 93.4% for SDIE classification.
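The pattern-based vectorization in the prescreening stage can be sketched as follows. This is only an illustration: the three patterns in `PATTERNS` are invented placeholders (the paper's 44 patterns are not listed in the abstract), and the "any pattern fires" rule in `prescreen` is a stand-in for whatever binary classifier the authors trained.

```python
import re

# Hypothetical example patterns; the real set of 44 patterns is not given here.
PATTERNS = [
    r"loss of offsite power",
    r"reactor trip",
    r"shutdown cooling",
]


def vectorize(text, patterns=PATTERNS):
    """Return a binary feature vector: 1 where a pattern occurs in the text."""
    return [1 if re.search(p, text, flags=re.IGNORECASE) else 0
            for p in patterns]


def prescreen(text, patterns=PATTERNS):
    """Toy stand-in classifier: keep an event if any pattern matches."""
    return any(vectorize(text, patterns))
```

With these placeholders, `vectorize("Reactor trip followed by loss of offsite power")` yields `[1, 1, 0]`, and an event description matching none of the patterns would be screened out before the second-stage classifier.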
LG · Jan 21, 2023
Soft Sensing Regression Model: from Sensor to Wafer Metrology Forecasting
Angzhi Fan, Yu Huang, Fei Xu et al.
The semiconductor industry is one of the most technology-evolving and capital-intensive market sectors. Effective inspection and metrology are necessary to improve product yield, increase product quality, and reduce costs. In recent years, much semiconductor manufacturing equipment has been fitted with sensors to facilitate real-time monitoring of the production process. These production-state and equipment-state sensor data provide an opportunity to apply machine-learning technologies in various domains, such as anomaly/fault detection, maintenance scheduling, and quality prediction. In this work, we focus on the task of soft sensing regression, which uses sensor data to predict impending inspection measurements that used to be measured in wafer inspection and metrology systems. We proposed an LSTM-based regressor and designed two loss functions for model training. Although engineers may look at our prediction errors in a subjective manner, a new piece-wise evaluation metric was proposed for assessing model accuracy in a mathematical way. The experimental results demonstrated that the proposed model can achieve accurate and early prediction of various types of inspections in complicated manufacturing processes.
CV · Dec 17, 2024 · Code
Guided and Variance-Corrected Fusion with One-shot Style Alignment for Large-Content Image Generation
Shoukun Sun, Min Xian, Tiankai Yao et al.
Producing large images using small diffusion models is gaining increasing popularity, as the cost of training large models could be prohibitive. A common approach involves jointly generating a series of overlapped image patches and obtaining large images by merging adjacent patches. However, results from existing methods often exhibit noticeable artifacts, e.g., seams and inconsistent objects and styles. To address the issues, we proposed Guided Fusion (GF), which mitigates the negative impact from distant image regions by applying a weighted average to the overlapping regions. Moreover, we proposed Variance-Corrected Fusion (VCF), which corrects data variance at post-averaging, generating more accurate fusion for the Denoising Diffusion Probabilistic Model. Furthermore, we proposed a one-shot Style Alignment (SA), which generates a coherent style for large images by adjusting the initial input noise without adding extra computational burden. Extensive experiments demonstrated that the proposed fusion methods improved the quality of the generated image significantly. The proposed method can be widely applied as a plug-and-play module to enhance other fusion-based methods for large image generation. Code: https://github.com/TitorX/GVCFDiffusion
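The idea of merging overlapping patches with a weighted average can be sketched in one dimension. The weighting below (weights that grow linearly from each patch border toward its centre) is an assumption made for illustration; the abstract does not state the actual Guided Fusion weights, and `fuse_patches` is a hypothetical name.

```python
def fuse_patches(patches, starts, length):
    """Blend overlapping 1-D patches into one signal of size `length`.

    Each sample is weighted by its distance to the nearest border of its own
    patch, so values near a patch edge contribute less in overlap regions.
    Every output position must be covered by at least one patch.
    """
    total = [0.0] * length
    weight_sum = [0.0] * length
    for patch, start in zip(patches, starts):
        n = len(patch)
        for k, value in enumerate(patch):
            # Weight is 1 at the borders and increases toward the centre.
            w = min(k + 1, n - k)
            total[start + k] += w * value
            weight_sum[start + k] += w
    return [t / w for t, w in zip(total, weight_sum)]
```

Where patches do not overlap the output equals the patch value, and inside an overlap the result moves gradually from one patch to the other instead of jumping, which is the intuition behind reducing visible seams.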
DC · Dec 16, 2023 · Code
Opara: Exploiting Operator Parallelism for Expediting DNN Inference on GPUs
Aodong Chen, Fei Xu, Li Han et al.
GPUs have become the de facto hardware devices for accelerating Deep Neural Network (DNN) inference workloads. However, the conventional sequential execution mode of DNN operators in mainstream deep learning frameworks cannot fully utilize GPU resources, even with operator fusion enabled, due to the increasing complexity of model structures and a greater diversity of operators. Moreover, an inadequate operator launch order in parallelized execution scenarios can lead to GPU resource wastage and unexpected performance interference among operators. In this paper, we propose Opara, a resource- and interference-aware DNN Operator parallel scheduling framework to accelerate DNN inference on GPUs. Specifically, Opara first employs CUDA Streams and CUDA Graph to parallelize the execution of multiple operators automatically. To further expedite DNN inference, Opara leverages the resource demands of operators to judiciously adjust the operator launch order on GPUs, overlapping the execution of compute-intensive and memory-intensive operators. We implement and open source a prototype of Opara based on PyTorch in a non-intrusive manner. Extensive prototype experiments with representative DNN and Transformer-based models demonstrate that Opara outperforms the default sequential CUDA Graph in PyTorch and the state-of-the-art operator parallelism systems by up to 1.68× and 1.29×, respectively, with acceptable runtime overhead.
94.6 · CV · Apr 20
DanceCrafter: Fine-Grained Text-Driven Controllable Dance Generation via Choreographic Syntax
Hang Yuan, Xiaolin Hu, Yan Wan et al.
Text-driven controllable dance generation remains under-explored, primarily due to the severe scarcity of high-quality datasets and the inherent difficulty of articulating complex choreographies. Characterizing dance is particularly challenging owing to its intricate spatial dynamics, strong directionality, and the highly decoupled movements of distinct body parts. To overcome these bottlenecks, we bridge principles from dance studies, human anatomy, and biomechanics to propose Choreographic Syntax, a novel theoretical framework with a tailored annotation system. Grounded in this syntax, we combine professional dance archives with high-fidelity motion capture data to construct DanceFlow, the most fine-grained dance dataset to date. It encompasses 41 hours of high-quality motions paired with 6.34 million words of detailed descriptions. At the model level, we introduce DanceCrafter, a tailored motion transformer built upon the Momentum Human Rig. To circumvent optimization instabilities, we construct a continuous manifold motion representation paired with a hybrid normalization strategy. Furthermore, we design an anatomy-aware loss to explicitly regulate the decoupled nature of body parts. Together, these adaptations empower DanceCrafter to achieve the high-fidelity and stable generation of complex dance sequences. Extensive evaluations and user studies demonstrate our state-of-the-art performance in motion quality, fine-grained controllability, and generation naturalness.
DC · Mar 7, 2024
FedClust: Optimizing Federated Learning on Non-IID Data through Weight-Driven Client Clustering
Md Sirajul Islam, Simin Javaherian, Fei Xu et al.
Federated learning (FL) is an emerging distributed machine learning paradigm enabling collaborative model training on decentralized devices without exposing their local data. A key challenge in FL is the uneven data distribution across client devices, violating the well-known assumption of independent-and-identically-distributed (IID) training samples in conventional machine learning. Clustered federated learning (CFL) addresses this challenge by grouping clients based on the similarity of their data distributions. However, existing CFL approaches require a large number of communication rounds for stable cluster formation and rely on a predefined number of clusters, thus limiting their flexibility and adaptability. This paper proposes FedClust, a novel CFL approach leveraging correlations between local model weights and client data distributions. FedClust groups clients into clusters in a one-shot manner using strategically selected partial model weights and dynamically accommodates newcomers in real-time. Experimental results demonstrate FedClust outperforms baseline approaches in terms of accuracy and communication costs.
CV · May 13, 2024
SignAvatar: Sign Language 3D Motion Reconstruction and Generation
Lu Dong, Lipisha Chaudhary, Fei Xu et al.
Achieving expressive 3D motion reconstruction and automatic generation for isolated sign words can be challenging, due to the lack of real-world 3D sign-word data, the complex nuances of signing motions, and the cross-modal understanding of sign language semantics. To address these challenges, we introduce SignAvatar, a framework capable of both word-level sign language reconstruction and generation. SignAvatar employs a transformer-based conditional variational autoencoder architecture, effectively establishing relationships across different semantic modalities. Additionally, this approach incorporates a curriculum learning strategy to enhance the model's robustness and generalization, resulting in more realistic motions. Furthermore, we contribute the ASL3DWord dataset, composed of 3D joint rotation data for the body, hands, and face, for unique sign words. We demonstrate the effectiveness of SignAvatar through extensive experiments, showcasing its superior reconstruction and automatic generation capabilities. The code and dataset are available on the project page.
CV · Apr 4, 2024
iSeg: Interactive 3D Segmentation via Interactive Attention
Itai Lang, Fei Xu, Dale Decatur et al.
We present iSeg, a new interactive technique for segmenting 3D shapes. Previous works have focused mainly on leveraging pre-trained 2D foundation models for 3D segmentation based on text. However, text may be insufficient for accurately describing fine-grained spatial segmentations. Moreover, achieving a consistent 3D segmentation using a 2D model is highly challenging, since occluded areas of the same semantic region may not be visible together from any 2D view. Thus, we design a segmentation method conditioned on fine user clicks, which operates entirely in 3D. Our system accepts user clicks directly on the shape's surface, indicating the inclusion or exclusion of regions from the desired shape partition. To accommodate various click settings, we propose a novel interactive attention module capable of processing different numbers and types of clicks, enabling the training of a single unified interactive segmentation model. We apply iSeg to a myriad of shapes from different domains, demonstrating its versatility and faithfulness to the user's specifications. Our project page is at https://threedle.github.io/iSeg/.
CL · Apr 8, 2024
Causality Extraction from Nuclear Licensee Event Reports Using a Hybrid Framework
Shahidur Rahoman Sohag, Sai Zhang, Min Xian et al.
Industry-wide nuclear power plant operating experience is a critical source of raw data for performing parameter estimations in reliability and risk models. Much operating experience information pertains to failure events and is stored as reports containing unstructured data, such as narratives. Event reports are essential for understanding how failures are initiated and propagated, including the numerous causal relations involved. Causal relation extraction using deep learning represents a significant frontier in the field of natural language processing (NLP), and is crucial since it enables the interpretation of intricate narratives and connections contained within vast amounts of written information. This paper proposed a hybrid framework for causality detection and extraction from nuclear licensee event reports. The main contributions include: (1) we compiled an LER corpus with 20,129 text samples for causality analysis, (2) developed an interactive tool for labeling cause effect pairs, (3) built a deep-learning-based approach for causal relation detection, and (4) developed a knowledge based cause-effect extraction approach.
33.4 · CV · Apr 7
Sparse Gain Radio Map Reconstruction With Geometry Priors and Uncertainty-Guided Measurement Selection
Zhihan Zeng, Ning Wei, Muhammad Baqer Mollah et al.
Radio maps are important for environment-aware wireless communication, network planning, and radio resource optimization. However, dense radio map construction remains challenging when only a limited number of measurements are available, especially in complex urban environments with strong blockages, irregular geometry, and restricted sensing accessibility. Existing methods have explored interpolation, low-rank cartography, deep completion, and channel knowledge map (CKM) construction, but many of these methods insufficiently exploit explicit geometric priors or overlook the value of predictive uncertainty for subsequent sensing. In this paper, we study sparse gain radio map reconstruction from a geometry-aware and active sensing perspective. We first construct UrbanRT-RM, a controllable ray-tracing benchmark with diverse urban layouts, multiple base-station deployments, and multiple sparse sampling modes. We then propose GeoUQ-GFNet, a lightweight network that jointly predicts a dense gain radio map and a spatial uncertainty map from sparse measurements and structured scene priors. The predicted uncertainty is further used to guide active measurement selection under limited sensing budgets. Extensive experiments show that our proposed GeoUQ-GFNet method achieves strong and consistent reconstruction performance across different scenes and transmitter placements generated using UrbanRT-RM. Moreover, uncertainty-guided querying provides more effective reconstruction improvement than non-adaptive sampling under the same additional measurement budget. These results demonstrate the effectiveness of combining geometry-aware learning, uncertainty estimation, and benchmark-driven evaluation for sparse radio map reconstruction in complex urban environments.
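Uncertainty-guided measurement selection can be illustrated with a simple greedy rule: given a predicted uncertainty map, query the unmeasured cells with the highest uncertainty until the budget is spent. The abstract does not describe the exact selection rule, so this top-k policy and the name `select_queries` are assumptions made for illustration.

```python
def select_queries(uncertainty, measured, budget):
    """Pick up to `budget` unmeasured grid cells with the highest uncertainty.

    `uncertainty` is a 2-D list of predicted uncertainty values and
    `measured` is a set of (row, col) cells that already have measurements.
    """
    candidates = [
        (uncertainty[r][c], (r, c))
        for r in range(len(uncertainty))
        for c in range(len(uncertainty[r]))
        if (r, c) not in measured
    ]
    # Highest uncertainty first; ties keep their row-major order.
    candidates.sort(key=lambda item: item[0], reverse=True)
    return [cell for _, cell in candidates[:budget]]
```

In an active sensing loop, the selected cells would be measured, added to `measured`, and the network re-run to refresh both the gain map and the uncertainty map before the next round of selection.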
CV · Sep 10, 2025
RU-Net for Automatic Characterization of TRISO Fuel Cross Sections
Lu Cai, Fei Xu, Min Xian et al.
During irradiation, phenomena such as kernel swelling and buffer densification may impact the performance of tristructural isotropic (TRISO) particle fuel. Post-irradiation microscopy is often used to identify these irradiation-induced morphologic changes. However, each fuel compact generally contains thousands of TRISO particles. Manually performing the work to get statistical information on these phenomena is cumbersome and subjective. To reduce the subjectivity inherent in that process and to accelerate data analysis, we used convolutional neural networks (CNNs) to automatically segment cross-sectional images of microscopic TRISO layers. CNNs are a class of machine-learning algorithms specifically designed for processing structured grid data. They have gained popularity in recent years due to their remarkable performance in various computer vision tasks, including image classification, object detection, and image segmentation. In this research, we generated a large irradiated TRISO layer dataset with more than 2,000 microscopic images of cross-sectional TRISO particles and the corresponding annotated images. Based on these annotated images, we used different CNNs to automatically segment different TRISO layers. These CNNs include RU-Net (developed in this study), as well as three existing architectures: U-Net, Residual Network (ResNet), and Attention U-Net. The preliminary results show that the model based on RU-Net performs best in terms of Intersection over Union (IoU). Using CNN models, we can expedite the analysis of TRISO particle cross sections, significantly reducing the manual labor involved and improving the objectivity of the segmentation results.
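For reference, Intersection over Union for a single binary mask can be computed as below. This is a generic sketch of the standard metric, not code from the study; how the authors aggregate IoU across TRISO layers or images is not stated in the abstract, and returning 1.0 for two empty masks is a convention chosen here.

```python
def iou(pred, target):
    """Intersection over Union of two binary masks given as 2-D lists of 0/1."""
    inter = 0
    union = 0
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            inter += 1 if (p and t) else 0
            union += 1 if (p or t) else 0
    # Convention (an assumption): two empty masks count as a perfect match.
    return inter / union if union else 1.0
```

For a multi-layer segmentation, one common approach is to compute this per class and average the results, though that aggregation is an assumption rather than something the abstract specifies.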
CVAug 6, 2025
Perceiving and Acting in First-Person: A Dataset and Benchmark for Egocentric Human-Object-Human InteractionsLiang Xu, Chengqun Yang, Zili Lin et al.
Learning action models from real-world human-centric interaction datasets is important for building general-purpose intelligent assistants efficiently. However, most existing datasets cover only specialist interaction categories and ignore the fact that AI assistants perceive and act from a first-person perspective. We argue that both generalist interaction knowledge and the egocentric modality are indispensable. In this paper, we embed the manual-assisted task into a vision-language-action framework, where the assistant provides services to the instructor following egocentric vision and commands. With our hybrid RGB-MoCap system, pairs of assistants and instructors engage with multiple objects and the scene following GPT-generated scripts. Under this setting, we build InterVLA, the first large-scale human-object-human interaction dataset, with 11.4 hours and 1.2M frames of multimodal data spanning 2 egocentric and 5 exocentric videos, accurate human/object motions, and verbal commands. Furthermore, we establish novel benchmarks on egocentric human motion estimation, interaction synthesis, and interaction prediction with comprehensive analysis. We believe that our InterVLA testbed and the benchmarks will foster future work on building AI agents in the physical world.
DCMar 13, 2025
Resource Heterogeneity-Aware and Utilization-Enhanced Scheduling for Deep Learning ClustersAbeda Sultana, Nabin Pakka, Fei Xu et al.
Scheduling deep learning (DL) models to train on powerful clusters with accelerators like GPUs and TPUs presently falls short, either lacking fine-grained heterogeneity awareness or leaving resources substantially under-utilized. To fill this gap, we propose a novel design of a task-level heterogeneity-aware scheduler, Hadar, based on an optimization framework that can boost resource utilization. Hadar leverages the performance traits of DL jobs on a heterogeneous DL cluster, characterizes the task-level performance heterogeneity in the optimization problem, and makes scheduling decisions across both spatial and temporal dimensions. It uses a primal-dual framework with a dual subroutine to solve the optimization problem and guide the scheduling design. Our trace-driven simulation with representative DL model training workloads demonstrates that Hadar achieves a 1.20x speed-up in total time duration compared with its state-of-the-art heterogeneity-aware counterpart, Gavel. Further, our Hadar scheduler is enhanced to HadarE by forking each job into multiple copies so that a job can train concurrently on heterogeneous GPUs residing on separate available nodes (i.e., machines or servers), enhancing resource utilization. HadarE is evaluated extensively on physical DL clusters for comparison with Hadar and Gavel. With substantial enhancement in cluster resource utilization (by 1.45x), HadarE exhibits considerable speed-ups in DL model training, reducing the total time duration by 50% (or 80%) on an Amazon AWS (or our lab) cluster, while producing trained DL models with consistently better inference quality than those trained by Hadar.
DCFeb 22, 2025
SEAFL: Enhancing Efficiency in Semi-Asynchronous Federated Learning through Adaptive Aggregation and Selective TrainingMd Sirajul Islam, Sanjeev Panta, Fei Xu et al.
Federated Learning (FL) is a promising distributed machine learning framework that allows collaborative learning of a global model across decentralized devices without uploading their local data. However, in real-world FL scenarios, the conventional synchronous FL mechanism suffers from inefficient training caused by slow devices, commonly known as stragglers, especially in heterogeneous communication environments. Though asynchronous FL effectively tackles the efficiency challenge, it induces substantial system overheads and model degradation. Striking a balance, semi-asynchronous FL has gained increasing attention, but it still suffers from the open challenge of stale models, where newly arrived updates are calculated based on outdated weights that easily hurt the convergence of the global model. In this paper, we present SEAFL, a novel FL framework designed to mitigate both the straggler and the stale model challenges in semi-asynchronous FL. SEAFL dynamically assigns weights to uploaded models during aggregation based on their staleness and importance to the current global model. We theoretically analyze the convergence rate of SEAFL and further enhance the training efficiency with an extended variant that allows partial training on slower devices, enabling them to contribute to global aggregation while reducing excessive waiting times. We evaluate the effectiveness of SEAFL through extensive experiments on three benchmark datasets. The experimental results demonstrate that SEAFL outperforms its closest counterpart by up to ~22% in terms of the wall-clock training time required to achieve target accuracy.
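The abstract does not give the weighting formula. As a rough sketch of the general idea only, the example below assumes a hypothetical rule in which each uploaded model's raw weight is its importance score divided by one plus its staleness (in rounds), with the weights then normalized to sum to one. The function name `aggregate`, the tuple layout, and the weighting rule are all illustrative assumptions, not the method defined in the paper.

```python
def aggregate(updates):
    """Weighted average of client models, discounting stale ones.

    updates: list of (params, staleness, importance) tuples, where params
             is a flat list of floats, staleness is the number of rounds
             the update lags behind, and importance is a positive score.
    Assumption: raw weight = importance / (1 + staleness); this rule is
    hypothetical and stands in for the paper's actual scheme.
    """
    raw = [importance / (1.0 + staleness) for _, staleness, importance in updates]
    total = sum(raw)
    coeffs = [w / total for w in raw]
    dim = len(updates[0][0])
    return [
        sum(c * params[i] for c, (params, _, _) in zip(coeffs, updates))
        for i in range(dim)
    ]
```

Under this rule a fresh update and an equally important update that is one round stale are mixed with coefficients 2/3 and 1/3, so stale models still contribute but pull the global model less.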
LGMay 12, 2023
AGFormer: Efficient Graph Representation with Anchor-Graph TransformerBo Jiang, Fei Xu, Ziyan Zhang et al.
To alleviate the local receptive issue of GCN, Transformers have been exploited to capture the long-range dependencies of nodes for graph data representation and learning. However, existing graph Transformers generally employ a regular self-attention module for all node-to-node message passing, which needs to learn the affinities/relationships between all pairs of nodes, leading to high computational cost. They are also usually sensitive to graph noise. To overcome these issues, we propose a novel graph Transformer architecture, termed Anchor Graph Transformer (AGFormer), by leveraging an anchor graph model. Specifically, AGFormer first obtains some representative anchors and then converts node-to-node message passing into anchor-to-anchor and anchor-to-node message passing. Thus, AGFormer performs much more efficiently and robustly than regular node-to-node Transformers. Extensive experiments on several benchmark datasets demonstrate the effectiveness and benefits of the proposed AGFormer.
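To make the efficiency argument concrete, the back-of-the-envelope sketch below counts pairwise affinity evaluations per attention layer. It assumes that full self-attention scores every ordered node pair, and that the anchor scheme scores every anchor pair plus every anchor-node pair; the abstract gives no complexity figures, so these counts and the function name `affinity_counts` are illustrative assumptions.

```python
def affinity_counts(num_nodes, num_anchors):
    """Count pairwise affinities for full vs. anchor-based message passing.

    Assumption: node-to-node attention evaluates num_nodes**2 affinities,
    while anchor-to-anchor plus anchor-to-node passing evaluates
    num_anchors**2 + num_anchors * num_nodes.
    """
    full = num_nodes * num_nodes
    anchored = num_anchors * num_anchors + num_anchors * num_nodes
    return full, anchored


full, anchored = affinity_counts(num_nodes=10_000, num_anchors=100)
print(full, anchored, round(full / anchored, 1))
```

With the number of anchors much smaller than the number of nodes, the anchored count grows roughly linearly in the number of nodes rather than quadratically.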
CLOct 16, 2021
Multimodal Dialogue Response GenerationQingfeng Sun, Yujing Wang, Can Xu et al.
Responding with images has been recognized as an important capability for an intelligent conversational agent. Yet existing works focus only on multimodal dialogue models that depend on retrieval-based methods and neglect generation methods. To fill this gap, we first present a multimodal dialogue generation model, which takes the dialogue history as input and then generates a textual sequence or an image as the response. Learning such a model often requires multimodal dialogues containing both texts and images, which are difficult to obtain. Motivated by this practical challenge, we consider multimodal dialogue generation under a natural assumption that only limited training examples are available. In such a low-resource setting, we devise a novel conversational agent, Divter, in order to isolate parameters that depend on multimodal dialogues from the entire generation model. In this way, the major part of the model can be learned from a large number of text-only dialogues and text-image pairs respectively, and then all of the parameters can be well fitted using the limited training examples. Extensive experiments demonstrate that our method achieves state-of-the-art results in both automatic and human evaluation, and can generate informative text and high-resolution image responses.
CLOct 3, 2021
Adversarial Examples Generation for Reducing Implicit Gender Bias in Pre-trained ModelsWenqian Ye, Fei Xu, Yaojia Huang et al.
Over the last few years, contextualized pre-trained neural language models, such as BERT and GPT, have shown significant gains in various NLP tasks. One way to enhance the robustness of existing pre-trained models is to generate and evaluate adversarial examples for data augmentation or adversarial learning. Meanwhile, gender bias embedded in the models appears to be a serious problem in practical applications. Many studies have covered the gender bias produced by word-level information (e.g., gender-stereotypical occupations), while few researchers have investigated sentence-level and implicit cases. In this paper, we propose a method to automatically generate implicit gender bias samples at the sentence level and a metric to measure gender bias. Samples generated by our method are evaluated in terms of accuracy. The metric is used to guide the generation of examples from pre-trained models, so that those examples can be used to attack pre-trained models. Finally, we discuss the efficacy of our generated examples in reducing gender bias for future research.
MTRL-SCIApr 12, 2021
Understanding Fission Gas Bubble Distribution, Lanthanide Transportation, and Thermal Conductivity Degradation in Neutron-irradiated α-U Using Machine LearningLu Cai, Fei Xu, Fidelma Dilemma et al.
UZr-based metallic nuclear fuel is the leading candidate for next-generation sodium-cooled fast reactors in the United States. US research reactors have been using and testing this fuel type since the 1960s and have accumulated considerable experience and knowledge about the fuel performance. However, most of this knowledge remains empirical. The lack of mechanistic understanding of fuel performance is preventing the qualification of UZr fuel for commercial use. This paper proposes a data-driven approach, coupled with advanced post-irradiation examination and powered by machine learning algorithms, to facilitate the development of such understanding by providing unprecedented quantified insights into fission gas bubbles. Specifically, based on the advanced post-irradiation examination data collected on a neutron-irradiated U-10Zr annular fuel, we developed a method to automatically detect and classify ~19,000 fission gas bubbles into different categories, and to quantitatively link the data to lanthanide transportation along the radial temperature gradient. The approach is versatile and can be modified to study different coupled irradiation effects, such as secondary phase redistribution and degradation of thermal conductivity, in irradiated nuclear fuel.
CVOct 23, 2019
Breast Anatomy Enriched Tumor Saliency EstimationFei Xu, Yingtao Zhang, Min Xian et al.
Breast cancer investigation is of great significance, and developing tumor detection methodologies is a critical need. However, tumor detection is a challenging task for breast ultrasound due to the complicated breast structure and poor image quality. In this paper, we propose a novel tumor saliency estimation (TSE) model guided by enriched breast anatomy knowledge to localize the tumor. First, the breast anatomy layers are generated by a deep neural network. Then we refine the layers by integrating a non-semantic breast anatomy model to solve the problem of incomplete mammary layers. Meanwhile, a new background map generation method weighted by the semantic probability and spatial distance is proposed to improve the performance. The experiment demonstrates that the proposed method with the new background map outperforms four state-of-the-art TSE models, with a 10% increase in F-measure on the BUS public dataset.
CVJun 18, 2019
Tumor Saliency Estimation for Breast Ultrasound Images via Breast Anatomy ModelingFei Xu, Yingtao Zhang, Min Xian et al.
Tumor saliency estimation aims to localize tumors by modeling the visual stimuli in medical images. However, it is a challenging task for breast ultrasound due to the complicated anatomic structure of the breast and poor image quality. Existing saliency estimation approaches only model generic visual stimuli, e.g., local and global contrast, location, and feature correlation, and achieve poor performance for tumor saliency estimation. In this paper, we propose a novel optimization model to estimate tumor saliency by utilizing breast anatomy. First, we model breast anatomy and decompose the breast ultrasound image into layers using Neutro-Connectedness; then we utilize the layers to generate the foreground and background maps; and finally we propose a novel objective function to estimate the tumor saliency by integrating the foreground map, background map, adaptive center bias, and region-based correlation cues. Extensive experiments demonstrate that the proposed approach obtains more accurate foreground and background maps with the assistance of the breast anatomy, especially for images with large or small tumors; meanwhile, the new objective function can handle images without tumors. The newly proposed method achieves state-of-the-art performance when compared to eight tumor saliency estimation approaches using two breast ultrasound datasets.
CVJun 27, 2018
A Hybrid Framework for Tumor Saliency EstimationFei Xu, Min Xian, Yingtao Zhang et al.
Automatic tumor segmentation of breast ultrasound (BUS) images is quite challenging due to the complicated anatomic structure of the breast and poor image quality. Most tumor segmentation approaches achieve good performance on BUS images collected in controlled settings; however, the performance degrades greatly with BUS images from different sources. Tumor saliency estimation (TSE) has attracted increasing attention as a way to solve the problem by modeling radiologists' attention mechanism. In this paper, we propose a novel hybrid framework for TSE, which integrates both high-level domain knowledge and robust low-level saliency assumptions and can overcome drawbacks caused by direct mapping in traditional TSE approaches. The new framework integrates the Neutro-Connectedness (NC) map, the adaptive center, the correlation, and the layer structure-based weighted map. The experimental results demonstrate that the proposed approach outperforms state-of-the-art TSE methods.
CVJan 9, 2018
BUSIS: A Benchmark for Breast Ultrasound Image SegmentationMin Xian, Yingtao Zhang, H. D. Cheng et al.
Breast ultrasound (BUS) image segmentation is challenging and critical for BUS Computer-Aided Diagnosis (CAD) systems. Many BUS segmentation approaches have been studied in the last two decades, but the performance of most approaches has been assessed using relatively small private datasets with different quantitative metrics, which results in discrepancies in performance comparison. Therefore, there is a pressing need to build a benchmark to compare existing methods objectively using a public dataset, to determine the performance of the best breast tumor segmentation algorithm available today, and to investigate which segmentation strategies are valuable in clinical practice and theoretical study. In this work, a benchmark for B-mode breast ultrasound image segmentation is presented. In the benchmark, 1) we collected 562 breast ultrasound images, prepared a software tool, and involved four radiologists in obtaining accurate annotations through standardized procedures; 2) we extensively compared the performance of sixteen state-of-the-art segmentation methods and discussed their advantages and disadvantages; 3) we proposed a set of valuable quantitative metrics to evaluate both semi-automatic and fully automatic segmentation approaches; and 4) we discussed the successful segmentation strategies and possible future improvements in detail.
CVApr 4, 2017
Automatic Breast Ultrasound Image Segmentation: A SurveyMin Xian, Yingtao Zhang, H. D. Cheng et al.
Breast cancer is one of the leading causes of cancer death among women worldwide. In clinical routine, automatic breast ultrasound (BUS) image segmentation is very challenging and essential for cancer diagnosis and treatment planning. Many BUS segmentation approaches have been studied in the last two decades and have been proven effective on private datasets. Currently, the advancement of BUS image segmentation seems to have reached a bottleneck. Improving performance is increasingly challenging, and only a few new approaches have been published in the last several years. It is time to look at the field by reviewing previous approaches comprehensively and to investigate future directions. In this paper, we study the basic ideas, theories, and pros and cons of the approaches, group them into categories, and extensively review each category in depth by discussing the principles, application issues, and advantages/disadvantages.
CVDec 19, 2015
Neutro-Connectedness CutMin Xian, Yingtao Zhang, H. D. Cheng et al.
Interactive image segmentation is a challenging task that has received increasing attention recently; however, two major drawbacks exist in interactive segmentation approaches. First, the segmentation performance of ROI-based methods is sensitive to the initial ROI: different ROIs may produce greatly different results. Second, most seed-based methods need intense interactions and are not applicable in many cases. In this work, we generalize the Neutro-Connectedness (NC) to be independent of top-down priors of objects and to model image topology with indeterminacy measurement on image regions; propose a novel method for determining object and background regions, which is applied to exclude isolated background regions and enforce label consistency; and put forward a hybrid interactive segmentation method, Neutro-Connectedness Cut (NC-Cut), which can overcome the above two problems by utilizing both pixel-wise appearance information and region-based NC properties. We evaluate the proposed NC-Cut on two image datasets (265 images) and demonstrate that the proposed approach outperforms state-of-the-art interactive image segmentation methods (Grabcut, MILCut, One-Cut, MGC_max^sum and pPBC).
CVAug 24, 2015
An algorithm for Left Atrial Thrombi detection using Transesophageal EchocardiographyJianrui Ding, Min Xian, H. D. Cheng et al.
Transesophageal echocardiography (TEE) is widely used to detect left atrium (LA)/left atrial appendage (LAA) thrombi. In this paper, local binary pattern variance (LBPV) features are extracted from the region of interest (ROI), and dynamic features are formed by using the information from neighboring frames in the sequence. The sequence is viewed as a bag, and the images in the sequence are considered as the instances. A multiple-instance learning (MIL) method is employed to solve the LAA thrombi detection problem. The experimental results show that the proposed method achieves better performance than other methods.