LGMar 21, 2023
Indeterminate Probability TheoryTao Yang, Chuang Liu, Xiaofeng Ma et al.
Complex continuous or mixed joint distributions (e.g., P(Y | z_1, z_2, ..., z_N)) generally lack closed-form solutions, often necessitating approximations such as MCMC. This paper proposes Indeterminate Probability Theory (IPT), which makes the following contributions: (1) An observer-centered framework in which experimental outcomes are represented as distributions combining ground truth with observation error; (2) The introduction of three independence candidate axioms that enable a two-phase probabilistic inference framework; (3) The derivation of closed-form solutions for arbitrary complex joint distributions under this framework. Both the Indeterminate Probability Neural Network (IPNN) model and the non-neural multivariate time series forecasting application demonstrate IPT's effectiveness in modeling high-dimensional distributions, with successful validation up to 1000 dimensions. Importantly, IPT is consistent with classical probability theory and subsumes the frequentist equation in the limit of vanishing observation error.
CVSep 6, 2024
BFA-YOLO: A balanced multiscale object detection network for building façade attachments detectionYangguang Chen, Tong Wang, Guanzhou Chen et al.
The detection of façade elements on buildings, such as doors, windows, balconies, air conditioning units, billboards, and glass curtain walls, is a critical step in automating the creation of Building Information Modeling (BIM). Yet, this field faces significant challenges, including the uneven distribution of façade elements, the presence of small objects, and substantial background noise, which hamper detection accuracy. To address these issues, we develop the BFA-YOLO model and the BFA-3D dataset in this study. The BFA-YOLO model is an advanced architecture designed specifically for analyzing multi-view images of façade attachments. It integrates three novel components: the Feature Balanced Spindle Module (FBSM) that tackles the issue of uneven object distribution; the Target Dynamic Alignment Task Detection Head (TDATH) that enhances the detection of small objects; and the Position Memory Enhanced Self-Attention Mechanism (PMESA), aimed at reducing the impact of background noise. These elements collectively enable BFA-YOLO to effectively address each challenge, thereby improving model robustness and detection precision. The BFA-3D dataset, offers multi-view images with precise annotations across a wide range of façade attachment categories. This dataset is developed to address the limitations present in existing façade detection datasets, which often feature a single perspective and insufficient category coverage. Through comparative analysis, BFA-YOLO demonstrated improvements of 1.8\% and 2.9\% in mAP$_{50}$ on the BFA-3D dataset and the public Façade-WHU dataset, respectively, when compared to the baseline YOLOv8 model. These results highlight the superior performance of BFA-YOLO in façade element detection and the advancement of intelligent BIM technologies.
CVDec 27, 2023Code
Segment Change Model (SCM) for Unsupervised Change detection in VHR Remote Sensing Images: a Case Study of BuildingsXiaoliang Tan, Guanzhou Chen, Tong Wang et al.
The field of Remote Sensing (RS) widely employs Change Detection (CD) on very-high-resolution (VHR) images. A majority of extant deep-learning-based methods hinge on annotated samples to complete the CD process. Recently, the emergence of Vision Foundation Model (VFM) enables zero-shot predictions in particular vision tasks. In this work, we propose an unsupervised CD method named Segment Change Model (SCM), built upon the Segment Anything Model (SAM) and Contrastive Language-Image Pre-training (CLIP). Our method recalibrates features extracted at different scales and integrates them in a top-down manner to enhance discriminative change edges. We further design an innovative Piecewise Semantic Attention (PSA) scheme, which can offer semantic representation without training, thereby minimize pseudo change phenomenon. Through conducting experiments on two public datasets, the proposed SCM increases the mIoU from 46.09% to 53.67% on the LEVIR-CD dataset, and from 47.56% to 52.14% on the WHU-CD dataset. Our codes are available at https://github.com/StephenApX/UCD-SCM.
CVMay 7, 2022
Automatic segmentation of meniscus based on MAE self-supervision and point-line weak supervision paradigmYuhan Xie, Kexin Jiang, Zhiyong Zhang et al.
Medical image segmentation based on deep learning is often faced with the problems of insufficient datasets and long time-consuming labeling. In this paper, we introduce the self-supervised method MAE(Masked Autoencoders) into knee joint images to provide a good initial weight for the segmentation model and improve the adaptability of the model to small datasets. Secondly, we propose a weakly supervised paradigm for meniscus segmentation based on the combination of point and line to reduce the time of labeling. Based on the weak label ,we design a region growing algorithm to generate pseudo-label. Finally we train the segmentation network based on pseudo-labels with weight transfer from self-supervision. Sufficient experimental results show that our proposed method combining self-supervision and weak supervision can almost approach the performance of purely fully supervised models while greatly reducing the required labeling time and dataset size.
CVFeb 17, 2024Code
Semi-supervised Medical Image Segmentation Method Based on Cross-pseudo Labeling Leveraging Strong and Weak Data Augmentation StrategiesYifei Chen, Chenyan Zhang, Yifan Ke et al.
Traditional supervised learning methods have historically encountered certain constraints in medical image segmentation due to the challenging collection process, high labeling cost, low signal-to-noise ratio, and complex features characterizing biomedical images. This paper proposes a semi-supervised model, DFCPS, which innovatively incorporates the Fixmatch concept. This significantly enhances the model's performance and generalizability through data augmentation processing, employing varied strategies for unlabeled data. Concurrently, the model design gives appropriate emphasis to the generation, filtration, and refinement processes of pseudo-labels. The novel concept of cross-pseudo-supervision is introduced, integrating consistency learning with self-training. This enables the model to fully leverage pseudo-labels from multiple perspectives, thereby enhancing training diversity. The DFCPS model is compared with both baseline and advanced models using the publicly accessible Kvasir-SEG dataset. Across all four subdivisions containing different proportions of unlabeled data, our model consistently exhibits superior performance. Our source code is available at https://github.com/JustlfC03/DFCPS.
CVJan 3, 2024Code
S3Net: Innovating Stereo Matching and Semantic Segmentation with a Single-Branch Semantic Stereo Network in Satellite Epipolar ImageryQingyuan Yang, Guanzhou Chen, Xiaoliang Tan et al.
Stereo matching and semantic segmentation are significant tasks in binocular satellite 3D reconstruction. However, previous studies primarily view these as independent parallel tasks, lacking an integrated multitask learning framework. This work introduces a solution, the Single-branch Semantic Stereo Network (S3Net), which innovatively combines semantic segmentation and stereo matching using Self-Fuse and Mutual-Fuse modules. Unlike preceding methods that utilize semantic or disparity information independently, our method dentifies and leverages the intrinsic link between these two tasks, leading to a more accurate understanding of semantic information and disparity estimation. Comparative testing on the US3D dataset proves the effectiveness of our S3Net. Our model improves the mIoU in semantic segmentation from 61.38 to 67.39, and reduces the D1-Error and average endpoint error (EPE) in disparity estimation from 10.051 to 9.579 and 1.439 to 1.403 respectively, surpassing existing competitive methods. Our codes are available at:https://github.com/CVEO/S3Net.
CVJun 11, 2025Code
MSSDF: Modality-Shared Self-supervised Distillation for High-Resolution Multi-modal Remote Sensing Image LearningTong Wang, Guanzhou Chen, Xiaodong Zhang et al.
Remote sensing image interpretation plays a critical role in environmental monitoring, urban planning, and disaster assessment. However, acquiring high-quality labeled data is often costly and time-consuming. To address this challenge, we proposes a multi-modal self-supervised learning framework that leverages high-resolution RGB images, multi-spectral data, and digital surface models (DSM) for pre-training. By designing an information-aware adaptive masking strategy, cross-modal masking mechanism, and multi-task self-supervised objectives, the framework effectively captures both the correlations across different modalities and the unique feature structures within each modality. We evaluated the proposed method on multiple downstream tasks, covering typical remote sensing applications such as scene classification, semantic segmentation, change detection, object detection, and depth estimation. Experiments are conducted on 15 remote sensing datasets, encompassing 26 tasks. The results demonstrate that the proposed method outperforms existing pretraining approaches in most tasks. Specifically, on the Potsdam and Vaihingen semantic segmentation tasks, our method achieved mIoU scores of 78.30\% and 76.50\%, with only 50\% train-set. For the US3D depth estimation task, the RMSE error is reduced to 0.182, and for the binary change detection task in SECOND dataset, our method achieved mIoU scores of 47.51\%, surpassing the second CS-MAE by 3 percentage points. Our pretrain code, checkpoints, and HR-Pairs dataset can be found in https://github.com/CVEO/MSSDF.
CVApr 21, 2024
LMFNet: An Efficient Multimodal Fusion Approach for Semantic Segmentation in High-Resolution Remote SensingTong Wang, Guanzhou Chen, Xiaodong Zhang et al.
Despite the rapid evolution of semantic segmentation for land cover classification in high-resolution remote sensing imagery, integrating multiple data modalities such as Digital Surface Model (DSM), RGB, and Near-infrared (NIR) remains a challenge. Current methods often process only two types of data, missing out on the rich information that additional modalities can provide. Addressing this gap, we propose a novel \textbf{L}ightweight \textbf{M}ultimodal data \textbf{F}usion \textbf{Net}work (LMFNet) to accomplish the tasks of fusion and semantic segmentation of multimodal remote sensing images. LMFNet uniquely accommodates various data types simultaneously, including RGB, NirRG, and DSM, through a weight-sharing, multi-branch vision transformer that minimizes parameter count while ensuring robust feature extraction. Our proposed multimodal fusion module integrates a \textit{Multimodal Feature Fusion Reconstruction Layer} and \textit{Multimodal Feature Self-Attention Fusion Layer}, which can reconstruct and fuse multimodal features. Extensive testing on public datasets such as US3D, ISPRS Potsdam, and ISPRS Vaihingen demonstrates the effectiveness of LMFNet. Specifically, it achieves a mean Intersection over Union ($mIoU$) of 85.09\% on the US3D dataset, marking a significant improvement over existing methods. Compared to unimodal approaches, LMFNet shows a 10\% enhancement in $mIoU$ with only a 0.5M increase in parameter count. Furthermore, against bimodal methods, our approach with trilateral inputs enhances $mIoU$ by 0.46 percentage points.
LGApr 18, 2025
Variational Autoencoder Framework for Hyperspectral Retrievals (Hyper-VAE) of Phytoplankton Absorption and Chlorophyll a in Coastal Waters for NASA's EMIT and PACE MissionsJiadong Lou, Bingqing Liu, Yuanheng Xiong et al.
Phytoplankton absorb and scatter light in unique ways, subtly altering the color of water, changes that are often minor for human eyes to detect but can be captured by sensitive ocean color instruments onboard satellites from space. Hyperspectral sensors, paired with advanced algorithms, are expected to significantly enhance the characterization of phytoplankton community composition, especially in coastal waters where ocean color remote sensing applications have historically encountered significant challenges. This study presents novel machine learning-based solutions for NASA's hyperspectral missions, including EMIT and PACE, tackling high-fidelity retrievals of phytoplankton absorption coefficient and chlorophyll a from their hyperspectral remote sensing reflectance. Given that a single Rrs spectrum may correspond to varied combinations of inherent optical properties and associated concentrations, the Variational Autoencoder (VAE) is used as a backbone in this study to handle such multi-distribution prediction problems. We first time tailor the VAE model with innovative designs to achieve hyperspectral retrievals of aphy and of Chl-a from hyperspectral Rrs in optically complex estuarine-coastal waters. Validation with extensive experimental observation demonstrates superior performance of the VAE models with high precision and low bias. The in-depth analysis of VAE's advanced model structures and learning designs highlights the improvement and advantages of VAE-based solutions over the mixture density network (MDN) approach, particularly on high-dimensional data, such as PACE. Our study provides strong evidence that current EMIT and PACE hyperspectral data as well as the upcoming Surface Biology Geology mission will open new pathways toward a better understanding of phytoplankton community dynamics in aquatic ecosystems when integrated with AI technologies.
ROMar 7, 2024
LitSim: A Conflict-aware Policy for Long-term Interactive Traffic SimulationHaojie Xin, Xiaodong Zhang, Renzhi Tang et al.
Simulation is pivotal in evaluating the performance of autonomous driving systems due to the advantages of high efficiency and low cost compared to on-road testing. Bridging the gap between simulation and the real world requires realistic agent behaviors. However, the existing works have the following shortcomings in achieving this goal: (1) log replay offers realistic scenarios but often leads to collisions due to the absence of dynamic interactions, and (2) both heuristic-based and data-based solutions, which are parameterized and trained on real-world datasets, encourage interactions but often deviate from real-world data over long horizons. In this work, we propose LitSim, a long-term interactive simulation approach that maximizes realism by minimizing the interventions in the log. Specifically, our approach primarily uses log replay to ensure realism and intervenes only when necessary to prevent potential conflicts. We then encourage interactions among the agents and resolve the conflicts, thereby reducing the risk of unrealistic behaviors. We train and validate our model on the real-world dataset NGSIM, and the experimental results demonstrate that LitSim outperforms the currently popular approaches in terms of realism and reactivity.
CVFeb 5
ROMAN: Reward-Orchestrated Multi-Head Attention Network for Autonomous Driving System TestingJianlei Chi, Yuzhen Wu, Jiaxuan Hou et al.
Automated Driving System (ADS) acts as the brain of autonomous vehicles, responsible for their safety and efficiency. Safe deployment requires thorough testing in diverse real-world scenarios and compliance with traffic laws like speed limits, signal obedience, and right-of-way rules. Violations like running red lights or speeding pose severe safety risks. However, current testing approaches face significant challenges: limited ability to generate complex and high-risk law-breaking scenarios, and failing to account for complex interactions involving multiple vehicles and critical situations. To address these challenges, we propose ROMAN, a novel scenario generation approach for ADS testing that combines a multi-head attention network with a traffic law weighting mechanism. ROMAN is designed to generate high-risk violation scenarios to enable more thorough and targeted ADS evaluation. The multi-head attention mechanism models interactions among vehicles, traffic signals, and other factors. The traffic law weighting mechanism implements a workflow that leverages an LLM-based risk weighting module to evaluate violations based on the two dimensions of severity and occurrence. We have evaluated ROMAN by testing the Baidu Apollo ADS within the CARLA simulation platform and conducting extensive experiments to measure its performance. Experimental results demonstrate that ROMAN surpassed state-of-the-art tools ABLE and LawBreaker by achieving 7.91% higher average violation count than ABLE and 55.96% higher than LawBreaker, while also maintaining greater scenario diversity. In addition, only ROMAN successfully generated violation scenarios for every clause of the input traffic laws, enabling it to identify more high-risk violations than existing approaches.
IVAug 4, 2025
Scaling Artificial Intelligence for Prostate Cancer Detection on MRI towards Organized Screening and Primary Diagnosis in a Global, Multiethnic Population (Study Protocol)Anindo Saha, Joeran S. Bosma, Jasper J. Twilt et al.
In this intercontinental, confirmatory study, we include a retrospective cohort of 22,481 MRI examinations (21,288 patients; 46 cities in 22 countries) to train and externally validate the PI-CAI-2B model, i.e., an efficient, next-generation iteration of the state-of-the-art AI system that was developed for detecting Gleason grade group $\geq$2 prostate cancer on MRI during the PI-CAI study. Of these examinations, 20,471 cases (19,278 patients; 26 cities in 14 countries) from two EU Horizon projects (ProCAncer-I, COMFORT) and 12 independent centers based in Europe, North America, Asia and Africa, are used for training and internal testing. Additionally, 2010 cases (2010 patients; 20 external cities in 12 countries) from population-based screening (STHLM3-MRI, IP1-PROSTAGRAM trials) and primary diagnostic settings (PRIME trial) based in Europe, North and South Americas, Asia and Australia, are used for external testing. Primary endpoint is the proportion of AI-based assessments in agreement with the standard of care diagnoses (i.e., clinical assessments made by expert uropathologists on histopathology, if available, or at least two expert urogenital radiologists in consensus; with access to patient history and peer consultation) in the detection of Gleason grade group $\geq$2 prostate cancer within the external testing cohorts. Our statistical analysis plan is prespecified with a hypothesis of diagnostic interchangeability to the standard of care at the PI-RADS $\geq$3 (primary diagnosis) or $\geq$4 (screening) cut-off, considering an absolute margin of 0.05 and reader estimates derived from the PI-CAI observer study (62 radiologists reading 400 cases). Secondary measures comprise the area under the receiver operating characteristic curve (AUROC) of the AI system stratified by imaging quality, patient age and patient ethnicity to identify underlying biases (if any).
CVAug 4, 2025
SGAD: Semantic and Geometric-aware Descriptor for Local Feature MatchingXiangzeng Liu, Chi Wang, Guanglu Shi et al.
Local feature matching remains a fundamental challenge in computer vision. Recent Area to Point Matching (A2PM) methods have improved matching accuracy. However, existing research based on this framework relies on inefficient pixel-level comparisons and complex graph matching that limit scalability. In this work, we introduce the Semantic and Geometric-aware Descriptor Network (SGAD), which fundamentally rethinks area-based matching by generating highly discriminative area descriptors that enable direct matching without complex graph optimization. This approach significantly improves both accuracy and efficiency of area matching. We further improve the performance of area matching through a novel supervision strategy that decomposes the area matching task into classification and ranking subtasks. Finally, we introduce the Hierarchical Containment Redundancy Filter (HCRF) to eliminate overlapping areas by analyzing containment graphs. SGAD demonstrates remarkable performance gains, reducing runtime by 60x (0.82s vs. 60.23s) compared to MESA. Extensive evaluations show consistent improvements across multiple point matchers: SGAD+LoFTR reduces runtime compared to DKM, while achieving higher accuracy (0.82s vs. 1.51s, 65.98 vs. 61.11) in outdoor pose estimation, and SGAD+ROMA delivers +7.39% AUC@5° in indoor pose estimation, establishing a new state-of-the-art.
CVJun 24, 2025
AMF-MedIT: An Efficient Align-Modulation-Fusion Framework for Medical Image-Tabular DataCongjing Yu, Jing Ye, Yang Liu et al.
Multimodal medical analysis combining image and tabular data has gained increasing attention. However, effective fusion remains challenging due to cross-modal discrepancies in feature dimensions and modality contributions, as well as the noise from high-dimensional tabular inputs. To address these problems, we present AMF-MedIT, an efficient Align-Modulation-Fusion framework for medical image and tabular data integration, particularly under data-scarce conditions. Built upon a self-supervised learning strategy, we introduce the Adaptive Modulation and Fusion (AMF) module, a novel, streamlined fusion paradigm that harmonizes dimension discrepancies and dynamically balances modality contributions. It integrates prior knowledge to guide the allocation of modality contributions in the fusion and employs feature masks together with magnitude and leakage losses to adjust the dimensionality and magnitude of unimodal features. Additionally, we develop FT-Mamba, a powerful tabular encoder leveraging a selective mechanism to handle noisy medical tabular data efficiently. Extensive experiments, including simulations of clinical noise, demonstrate that AMF-MedIT achieves superior accuracy, robustness, and data efficiency across multimodal classification tasks. Interpretability analyses further reveal how FT-Mamba shapes multimodal pretraining and enhances the image encoder's attention, highlighting the practical value of our framework for reliable and efficient clinical artificial intelligence applications.
AIJun 15, 2025
The Safety Reminder: A Soft Prompt to Reactivate Delayed Safety Awareness in Vision-Language ModelsPeiyuan Tang, Haojie Xin, Xiaodong Zhang et al.
As Vision-Language Models (VLMs) demonstrate increasing capabilities across real-world applications such as code generation and chatbot assistance, ensuring their safety has become paramount. Unlike traditional Large Language Models (LLMs), VLMs face unique vulnerabilities due to their multimodal nature, allowing adversaries to modify visual or textual inputs to bypass safety guardrails and trigger the generation of harmful content. Through systematic analysis of VLM behavior under attack, we identify a novel phenomenon termed ``delayed safety awareness''. Specifically, we observe that safety-aligned VLMs may initially be compromised to produce harmful content, but eventually recognize the associated risks and attempt to self-correct. This pattern suggests that VLMs retain their underlying safety awareness but experience a temporal delay in their activation. Building on this insight, we hypothesize that VLMs' safety awareness can be proactively reactivated through carefully designed prompts. To this end, we introduce ``The Safety Reminder'', a soft prompt tuning approach that optimizes learnable prompt tokens, which are periodically injected during the text generation process to enhance safety awareness, effectively preventing harmful content generation. Additionally, our safety reminder only activates when harmful content is detected, leaving normal conversations unaffected and preserving the model's performance on benign tasks. Through comprehensive evaluation across three established safety benchmarks and one adversarial attacks, we demonstrate that our approach significantly reduces attack success rates while maintaining model utility, offering a practical solution for deploying safer VLMs in real-world applications.
CVMay 9, 2025
BrainSegDMlF: A Dynamic Fusion-enhanced SAM for Brain Lesion SegmentationHongming Wang, Yifeng Wu, Huimin Huang et al.
The segmentation of substantial brain lesions is a significant and challenging task in the field of medical image segmentation. Substantial brain lesions in brain imaging exhibit high heterogeneity, with indistinct boundaries between lesion regions and normal brain tissue. Small lesions in single slices are difficult to identify, making the accurate and reproducible segmentation of abnormal regions, as well as their feature description, highly complex. Existing methods have the following limitations: 1) They rely solely on single-modal information for learning, neglecting the multi-modal information commonly used in diagnosis. This hampers the ability to comprehensively acquire brain lesion information from multiple perspectives and prevents the effective integration and utilization of multi-modal data inputs, thereby limiting a holistic understanding of lesions. 2) They are constrained by the amount of data available, leading to low sensitivity to small lesions and difficulty in detecting subtle pathological changes. 3) Current SAM-based models rely on external prompts, which cannot achieve automatic segmentation and, to some extent, affect diagnostic efficiency.To address these issues, we have developed a large-scale fully automated segmentation model specifically designed for brain lesion segmentation, named BrainSegDMLF. This model has the following features: 1) Dynamic Modal Interactive Fusion (DMIF) module that processes and integrates multi-modal data during the encoding process, providing the SAM encoder with more comprehensive modal information. 2) Layer-by-Layer Upsampling Decoder, enabling the model to extract rich low-level and high-level features even with limited data, thereby detecting the presence of small lesions. 3) Automatic segmentation masks, allowing the model to generate lesion masks automatically without requiring manual prompts.
LGDec 12, 2024
Integrated trucks assignment and scheduling problem with mixed service mode docks: A Q-learning based adaptive large neighborhood search algorithmYueyi Li, Mehrdad Mohammadi, Xiaodong Zhang et al.
Mixed service mode docks enhance efficiency by flexibly handling both loading and unloading trucks in warehouses. However, existing research often predetermines the number and location of these docks prior to planning truck assignment and sequencing. This paper proposes a new model integrating dock mode decision, truck assignment, and scheduling, thus enabling adaptive dock mode arrangements. Specifically, we introduce a Q-learning-based adaptive large neighborhood search (Q-ALNS) algorithm to address the integrated problem. The algorithm adjusts dock modes via perturbation operators, while truck assignment and scheduling are solved using destroy and repair local search operators. Q-learning adaptively selects these operators based on their performance history and future gains, employing the epsilon-greedy strategy. Extensive experimental results and statistical analysis indicate that the Q-ALNS benefits from efficient operator combinations and its adaptive mechanism, consistently outperforming benchmark algorithms in terms of optimality gap and Pareto front discovery. In comparison to the predetermined service mode, our adaptive strategy results in lower average tardiness and makespan, highlighting its superior adaptability to varying demands.
CLMay 6, 2024
GOVERN: Gradient Orientation Vote Ensemble for Multi-Teacher Reinforced DistillationWenjie Zhou, Zhenxin Ding, Xiaodong Zhang et al.
Pre-trained language models have become an integral component of question-answering systems, achieving remarkable performance. However, for practical deployment, it is crucial to perform knowledge distillation to maintain high performance while operating under computational constraints. In this paper, we address a key question: given the importance of unsupervised distillation for student model performance, how can knowledge from multiple teacher models be effectively ensemble during this stage without the guidance of labels? We propose a novel algorithm, GOVERN, to tackle this issue. GOVERN has demonstrated significant improvements in both offline and online experiments, enabling the student model to achieve results comparable to that of teacher ensembles. Our experiments show that GOVERN remarkably requires a mere 1\% of the ensemble method's inference budget to achieve 99.5\% of performance. The proposed algorithm has been successfully deployed in a real-world commercial question-answering system, demonstrating its real-world applicability.
CLFeb 17, 2021
First Target and Opinion then Polarity: Enhancing Target-opinion Correlation for Aspect Sentiment Triplet ExtractionLianzhe Huang, Peiyi Wang, Sujian Li et al.
Aspect Sentiment Triplet Extraction (ASTE) aims to extract triplets from a sentence, including target entities, associated sentiment polarities, and opinion spans which rationalize the polarities. Existing methods are short on building correlation between target-opinion pairs, and neglect the mutual interference among different sentiment triplets. To address these issues, we utilize a two-stage framework to enhance the correlation between targets and opinions: at stage one, we extract targets and opinions through sequence tagging; then we append a group of artificial tags named Perceivable Pair, which indicate the span of a specific target-opinion tuple, to the input sentence to obtain closer correlated target-opinion pair representation. Meanwhile, we reduce the negative interference between triplets by restricting tokens' attention field. Finally, the polarity is identified according to the representation of the Perceivable Pair. We conduct experiments on four datasets, and the experimental results show the effectiveness of our model.
CLOct 6, 2019
Text Level Graph Neural Network for Text ClassificationLianzhe Huang, Dehong Ma, Sujian Li et al.
Recently, researches have explored the graph neural network (GNN) techniques on text classification, since GNN does well in handling complex structures and preserving global information. However, previous methods based on GNN are mainly faced with the practical problems of fixed corpus level graph structure which do not support online testing and high memory consumption. To tackle the problems, we propose a new GNN based model that builds graphs for each input text with global parameters sharing instead of a single graph for the whole corpus. This method removes the burden of dependence between an individual text and entire corpus which support online testing, but still preserve global information. Besides, we build graphs by much smaller windows in the text, which not only extract more local features but also significantly reduce the edge numbers as well as memory consumption. Experiments show that our model outperforms existing models on several text classification datasets even with consuming less memory.
IVAug 5, 2019
Knee menisci segmentation and relaxometry of 3D ultrashort echo time (UTE) cones MR imaging using attention U-Net with transfer learningMichal Byra, Mei Wu, Xiaodong Zhang et al.
The purpose of this work is to develop a deep learning-based method for knee menisci segmentation in 3D ultrashort echo time (UTE) cones magnetic resonance (MR) imaging, and to automatically determine MR relaxation times, namely the T1, T1$ρ$, and T2* parameters, which can be used to assess knee osteoarthritis (OA). Whole knee joint imaging was performed using 3D UTE cones sequences to collect data from 61 human subjects. Regions of interest (ROIs) were outlined by two experienced radiologists based on subtracted T1$ρ$-weighted MR images. Transfer learning was applied to develop 2D attention U-Net convolutional neural networks for the menisci segmentation based on each radiologist's ROIs separately. Dice scores were calculated to assess segmentation performance. Next, the T1, T1$ρ$, T2* relaxations, and ROI areas were determined for the manual and automatic segmentations, then compared.The models developed using ROIs provided by two radiologists achieved high Dice scores of 0.860 and 0.833, while the radiologists' manual segmentations achieved a Dice score of 0.820. Linear correlation coefficients for the T1, T1$ρ$, and T2* relaxations calculated using the automatic and manual segmentations ranged between 0.90 and 0.97, and there were no associated differences between the estimated average meniscal relaxation parameters. The deep learning models achieved segmentation performance equivalent to the inter-observer variability of two radiologists. The proposed deep learning-based approach can be used to efficiently generate automatic segmentations and determine meniscal relaxations times. The method has the potential to help radiologists with the assessment of meniscal diseases, such as OA.
CLMay 14, 2018
Unpaired Sentiment-to-Sentiment Translation: A Cycled Reinforcement Learning ApproachJingjing Xu, Xu Sun, Qi Zeng et al.
The goal of sentiment-to-sentiment "translation" is to change the underlying sentiment of a sentence while keeping its content. The main challenge is the lack of parallel data. To solve this problem, we propose a cycled reinforcement learning method that enables training on unpaired data by collaboration between a neutralization module and an emotionalization module. We evaluate our approach on two review datasets, Yelp and Amazon. Experimental results show that our approach significantly outperforms the state-of-the-art systems. Especially, the proposed method substantially improves the content preservation performance. The BLEU score is improved from 1.64 to 22.46 and from 0.56 to 14.06 on the two datasets, respectively.
LGApr 17, 2018
A Multi-task Selected Learning Approach for Solving 3D Flexible Bin Packing ProblemLu Duan, Haoyuan Hu, Yu Qian et al.
A 3D flexible bin packing problem (3D-FBPP) arises from the process of warehouse packing in e-commerce. An online customer's order usually contains several items and needs to be packed as a whole before shipping. In particular, 5% of tens of millions of packages are using plastic wrapping as outer packaging every day, which brings pressure on the plastic surface minimization to save traditional logistics costs. Because of the huge practical significance, we focus on the issue of packing cuboid-shaped items orthogonally into a least-surface-area bin. The existing heuristic methods for classic 3D bin packing don't work well for this particular NP-hard problem and designing a good problem-specific heuristic is non-trivial. In this paper, rather than designing heuristics, we propose a novel multi-task framework based on Selected Learning to learn a heuristic-like policy that generates the sequence and orientations of items to be packed simultaneously. Through comprehensive experiments on a large scale real-world transaction order dataset and online AB tests, we show: 1) our selected learning method trades off the imbalance and correlation among the tasks and significantly outperforms the single task Pointer Network and the multi-task network without selected learning; 2) our method obtains an average 5.47% cost reduction than the well-designed greedy algorithm which is previously used in our online production system.
AISep 4, 2017
Interactive Attention Networks for Aspect-Level Sentiment ClassificationDehong Ma, Sujian Li, Xiaodong Zhang et al.
Aspect-level sentiment classification aims at identifying the sentiment polarity of specific target in its context. Previous approaches have realized the importance of targets in sentiment classification and developed various methods with the goal of precisely modeling their contexts via generating target-specific representations. However, these studies always ignore the separate modeling of targets. In this paper, we argue that both targets and contexts deserve special treatment and need to be learned their own representations via interactive learning. Then, we propose the interactive attention networks (IAN) to interactively learn attentions in the contexts and targets, and generate the representations for targets and contexts separately. With this design, the IAN model can well represent a target and its collocative context, which is helpful to sentiment classification. Experimental results on SemEval 2014 Datasets demonstrate the effectiveness of our model.
AIAug 20, 2017
Solving a New 3D Bin Packing Problem with Deep Reinforcement Learning MethodHaoyuan Hu, Xiaodong Zhang, Xiaowei Yan et al.
In this paper, a new type of 3D bin packing problem (BPP) is proposed, in which a number of cuboid-shaped items must be put into a bin one by one orthogonally. The objective is to find a way to place these items that can minimize the surface area of the bin. This problem is based on the fact that there is no fixed-sized bin in many real business scenarios and the cost of a bin is proportional to its surface area. Our research shows that this problem is NP-hard. Based on previous research on 3D BPP, the surface area is determined by the sequence, spatial locations and orientations of items. Among these factors, the sequence of items plays a key role in minimizing the surface area. Inspired by recent achievements of deep reinforcement learning (DRL) techniques, especially Pointer Network, on combinatorial optimization problems such as TSP, a DRL-based method is applied to optimize the sequence of items to be packed into the bin. Numerical results show that the method proposed in this paper achieve about 5% improvement than heuristic method.
CLMar 31, 2016
System Combination for Short Utterance Speaker RecognitionLantian Li, Dong Wang, Xiaodong Zhang et al.
For text-independent short-utterance speaker recognition (SUSR), the performance often degrades dramatically. This paper presents a combination approach to the SUSR tasks with two phonetic-aware systems: one is the DNN-based i-vector system and the other is our recently proposed subregion-based GMM-UBM system. The former employs phone posteriors to construct an i-vector model in which the shared statistics offers stronger robustness against limited test data, while the latter establishes a phone-dependent GMM-UBM system which represents speaker characteristics with more details. A score-level fusion is implemented to integrate the respective advantages from the two systems. Experimental results show that for the text-independent SUSR task, both the DNN-based i-vector system and the subregion-based GMM-UBM system outperform their respective baselines, and the score-level system combination delivers performance improvement.
CLMar 9, 2016
Implicit Discourse Relation Classification via Multi-Task Neural NetworksYang Liu, Sujian Li, Xiaodong Zhang et al.
Without discourse connectives, classifying implicit discourse relations is a challenging task and a bottleneck for building a practical discourse parser. Previous research usually makes use of one kind of discourse framework such as PDTB or RST to improve the classification performance on discourse relations. Actually, under different discourse annotation frameworks, there exist multiple corpora which have internal connections. To exploit the combination of different discourse corpora, we design related discourse classification tasks specific to a corpus, and propose a novel Convolutional Neural Network embedded multi-task learning system to synthesize these tasks by learning both unique and shared representations for each task. The experimental results on the PDTB implicit discourse relation classification task demonstrate that our model achieves significant gains over baseline systems.
ROJan 25, 2014
Kinematics analysis and three-dimensional simulation of the rehabilitation lower extremity exoskeleton robotJiangcheng Chen, Xiaodong Zhang, Lei Zhu
The kinematics recursive equation was built by using the modified D-H method after the structure of rehabilitation lower extremity exoskeleton was analyzed. The numerical algorithm of inverse kinematics was given too. Then the three-dimensional simulation model of the exoskeleton robot was built using MATLAB software, based on the model, 3D reappearance of a complete gait was achieved. Finally, the reliability of numerical algorithm of inverse kinematics was verified by the simulation result. All jobs above lay a foundation for developing a three-dimensional simulation platform of exoskeleton robot.