CV · Dec 19, 2022
ColoristaNet for Photorealistic Video Style Transfer
Xiaowen Qiu, Ruize Xu, Boan He et al.
Photorealistic style transfer aims to transfer the artistic style of an image onto an input image or video while keeping photorealism. In this paper, we argue that the summary-statistics matching scheme in existing algorithms is what leads to unrealistic stylization. To avoid employing the popular Gram loss, we propose a self-supervised style transfer framework, which contains a style removal part and a style restoration part. The style removal network removes the original image styles, and the style restoration network recovers image styles in a supervised manner. Meanwhile, to address the problems in current feature transformation methods, we propose decoupled instance normalization to decompose feature transformation into style whitening and restylization. It works well in ColoristaNet and can transfer image styles efficiently while keeping photorealism. To ensure temporal coherency, we also incorporate optical flow methods and ConvLSTM to embed contextual information. Experiments demonstrate that ColoristaNet achieves better stylization effects than state-of-the-art algorithms.
LG · Jan 31, 2025 · Code
Pivoting Factorization: A Compact Meta Low-Rank Representation of Sparsity for Efficient Inference in Large Language Models
Jialin Zhao, Yingtao Zhang, Carlo Vittorio Cannistraci
The rapid growth of Large Language Models has driven demand for effective model compression techniques to reduce memory and computation costs. Low-rank pruning has gained attention for its GPU compatibility across all densities. However, low-rank pruning struggles to match the performance of semi-structured pruning, often doubling perplexity at similar densities. In this paper, we propose Pivoting Factorization (PIFA), a novel lossless meta low-rank representation that unsupervisedly learns a compact form of any low-rank representation, effectively eliminating redundant information. PIFA identifies pivot rows (linearly independent rows) and expresses non-pivot rows as linear combinations, achieving 24.2% additional memory savings and 24.6% faster inference over low-rank layers at rank = 50% of dimension. To mitigate the performance degradation caused by low-rank pruning, we introduce a novel, retraining-free reconstruction method that minimizes error accumulation (M). MPIFA, combining M and PIFA into an end-to-end framework, significantly outperforms existing low-rank pruning methods, and achieves performance comparable to semi-structured pruning, while surpassing it in GPU efficiency and compatibility. Our code is available at https://github.com/biomedical-cybernetics/pivoting-factorization.
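The pivot-row idea in the abstract above can be illustrated with a small sketch. This is not the authors' implementation: it assumes exact rational arithmetic, plain Gaussian elimination, and a "first independent row wins" pivot choice, and the names `pivoting_factorization` and `reconstruct` are hypothetical. It only shows that a rank-r matrix can be stored as r pivot rows plus a coefficient matrix for the remaining rows, and counts the stored values.

```python
from fractions import Fraction


def pivoting_factorization(W):
    """Split the rows of W into pivot rows P and coefficients C such that
    every non-pivot row equals C[i] @ P exactly (hypothetical helper)."""
    rows = [[Fraction(x) for x in row] for row in W]
    # Each basis entry: (reduced vector, index of its leading nonzero,
    # coefficients expressing that vector in terms of the pivot rows).
    basis = []
    pivot_idx, nonpivot_idx, coeff_rows = [], [], []
    for i, row in enumerate(rows):
        residual = list(row)
        acc = [Fraction(0)] * len(basis)  # row - residual = sum(acc[k] * P[k])
        for vec, lead, combo in basis:
            f = residual[lead] / vec[lead]
            if f != 0:
                residual = [a - f * b for a, b in zip(residual, vec)]
                for k, c in enumerate(combo):
                    acc[k] += f * c
        if any(residual):
            # Linearly independent of earlier pivots: becomes a new pivot row.
            lead = next(j for j, v in enumerate(residual) if v != 0)
            basis.append((residual, lead, [-a for a in acc] + [Fraction(1)]))
            pivot_idx.append(i)
        else:
            # Dependent row: store only its coefficients over the pivots.
            nonpivot_idx.append(i)
            coeff_rows.append(acc)
    rank = len(pivot_idx)
    C = [c + [Fraction(0)] * (rank - len(c)) for c in coeff_rows]
    P = [rows[i] for i in pivot_idx]
    return pivot_idx, nonpivot_idx, P, C


def reconstruct(pivot_idx, nonpivot_idx, P, C, n_rows):
    """Rebuild the full matrix from pivot rows and coefficients."""
    out = [None] * n_rows
    for i, p in zip(pivot_idx, P):
        out[i] = list(p)
    for i, c in zip(nonpivot_idx, C):
        out[i] = [sum(ck * pk[j] for ck, pk in zip(c, P))
                  for j in range(len(P[0]))]
    return out


# Toy 4x4 matrix of rank 2: row 1 = 2 * row 0, row 3 = row 0 + 3 * row 2.
W = [[1, 2, 3, 4],
     [2, 4, 6, 8],
     [0, 1, 1, 0],
     [1, 5, 6, 4]]
pivot_idx, nonpivot_idx, P, C = pivoting_factorization(W)
restored = reconstruct(pivot_idx, nonpivot_idx, P, C, len(W))

m, n, r = len(W), len(W[0]), len(pivot_idx)
lowrank_params = r * (m + n)        # generic U (m x r) times V (r x n)
pifa_params = r * n + (m - r) * r   # pivot rows plus coefficient matrix
print(pivot_idx, nonpivot_idx, lowrank_params, pifa_params)
```

In this toy case the rank is half the dimension, and the pivot-based form stores 12 values against 16 for a generic two-factor low-rank form. That 25% reduction in value count is of the same order as the 24.2% memory saving the abstract reports at rank = 50% of dimension; index storage is ignored here.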
CL · Jun 3, 2024 · Code
DuQuant: Distributing Outliers via Dual Transformation Makes Stronger Quantized LLMs
Haokun Lin, Haobo Xu, Yichen Wu et al.
Quantization of large language models (LLMs) faces significant challenges, particularly due to the presence of outlier activations that impede efficient low-bit representation. Traditional approaches predominantly address Normal Outliers, which are activations across all tokens with relatively large magnitudes. However, these methods struggle with smoothing Massive Outliers that display significantly larger values, which leads to significant performance degradation in low-bit quantization. In this paper, we introduce DuQuant, a novel approach that utilizes rotation and permutation transformations to more effectively mitigate both massive and normal outliers. First, DuQuant constructs the rotation matrix, using specific outlier dimensions as prior knowledge, to redistribute outliers to adjacent channels by block-wise rotation. Second, we employ a zigzag permutation to balance the distribution of outliers across blocks, thereby reducing block-wise variance. A subsequent rotation further smooths the activation landscape, enhancing model performance. DuQuant simplifies the quantization process and excels in managing outliers, outperforming the state-of-the-art baselines across various sizes and types of LLMs on multiple tasks, even with 4-bit weight-activation quantization. Our code is available at https://github.com/Hsu1023/DuQuant.
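The permutation step can be pictured with a short sketch. It rests on one assumption about what "zigzag" means: channels are ranked by magnitude and dealt to blocks in serpentine order (left to right, then right to left). The function name `zigzag_permutation` and the toy magnitudes are hypothetical, and the rotation steps are omitted.

```python
def zigzag_permutation(channel_magnitudes, n_blocks):
    """Assign channels to blocks in serpentine order of decreasing magnitude
    (illustrative assumption, not the reference implementation)."""
    order = sorted(range(len(channel_magnitudes)),
                   key=lambda i: -channel_magnitudes[i])
    blocks = [[] for _ in range(n_blocks)]
    for rank, ch in enumerate(order):
        pass_idx, pos = divmod(rank, n_blocks)
        # Even passes go left to right, odd passes right to left.
        b = pos if pass_idx % 2 == 0 else n_blocks - 1 - pos
        blocks[b].append(ch)
    return blocks


# Toy per-channel peak magnitudes; channel 0 plays the role of a large outlier.
mags = [100, 1, 50, 2, 40, 3, 30, 4]
blocks = zigzag_permutation(mags, n_blocks=2)

contiguous_sums = [sum(mags[0:4]), sum(mags[4:8])]
zigzag_sums = [sum(mags[c] for c in blk) for blk in blocks]
print(blocks, contiguous_sums, zigzag_sums)
```

With the original contiguous grouping the two block totals are 153 and 77; after the serpentine assignment they are 135 and 95, so the spread between blocks shrinks, which is the balancing effect the abstract describes.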
LG · Jan 31, 2025
Brain network science modelling of sparse neural networks enables Transformers and LLMs to perform as fully connected
Yingtao Zhang, Diego Cerretti, Jialin Zhao et al.
Dynamic sparse training (DST) can reduce the computational demands in ANNs, but faces difficulties in keeping peak performance at high sparsity levels. The Cannistraci-Hebb training (CHT) is a brain-inspired method for growing connectivity in DST. CHT leverages a gradient-free, topology-driven link regrowth, which has shown ultra-sparse (less than 1% connectivity) advantage across various tasks compared to fully connected networks. Yet, CHT suffers from two main drawbacks: (i) its time complexity is $O(Nd^3)$ (N: network size in nodes; d: node degree), restricting it to ultra-sparse regimes; (ii) it selects top link prediction scores, which is inappropriate for the early training epochs, when the network presents unreliable connections. Here, we design the first brain-inspired network model, termed bipartite receptive field (BRF), to initialize the connectivity of sparse artificial neural networks. We further introduce a GPU-friendly matrix-based approximation of CH link prediction, reducing complexity to $O(N^3)$. We introduce the Cannistraci-Hebb training soft rule (CHTs), which adopts a flexible strategy for sampling connections in both link removal and regrowth, balancing the exploration and exploitation of network topology. Additionally, we integrate CHTs with a sigmoid gradual density decay (CHTss). Empirical results show that BRF offers performance advantages over previous network science models. Using 1% of connections, CHTs outperforms fully connected networks in MLP architectures on image classification tasks, compressing some networks to less than 30% of the nodes. Using 5% of the connections, CHTss outperforms fully connected networks in two Transformer-based machine translation tasks. Finally, at 30% connectivity, both CHTs and CHTss outperform other DST methods in language modeling and even exceed fully connected baselines in zero-shot tasks.
LG · May 24, 2024
Sparse Spectral Training and Inference on Euclidean and Hyperbolic Neural Networks
Jialin Zhao, Yingtao Zhang, Xinghang Li et al.
The growing demands on GPU memory posed by the increasing number of neural network parameters call for training approaches that are more memory-efficient. Previous memory reduction training techniques, such as Low-Rank Adaptation (LoRA) and ReLoRA, face challenges, with LoRA being constrained by its low-rank structure, particularly during intensive tasks like pre-training, and ReLoRA suffering from saddle point issues. In this paper, we propose Sparse Spectral Training (SST) to optimize memory usage for pre-training. SST updates all singular values and selectively updates singular vectors through a multinomial sampling method weighted by the magnitude of the singular values. Furthermore, SST employs singular value decomposition to initialize and periodically reinitialize low-rank parameters, reducing distortion relative to full-rank training compared to other low-rank methods. Through comprehensive testing on both Euclidean and hyperbolic neural networks across various tasks, SST demonstrates its ability to outperform existing memory reduction training methods and is comparable to full-rank training in various cases. On LLaMA-1.3B, with only 18.7% of the parameters trainable compared to full-rank training (using a rank equivalent to 6% of the embedding dimension), SST reduces the perplexity gap between other low-rank methods and full-rank training by 97.4%. This result highlights SST as an effective parameter-efficient technique for model pre-training.
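The selective update step can be sketched as weighted sampling over singular-vector indices. The abstract only states that sampling is multinomial and weighted by singular-value magnitude; sampling without replacement, the function name `sample_singular_vector_indices`, and the toy values below are assumptions made for illustration.

```python
import random


def sample_singular_vector_indices(singular_values, k, rng):
    """Pick k distinct indices with probability proportional to |sigma_i|
    (sampling without replacement is an assumption of this sketch)."""
    remaining = list(range(len(singular_values)))
    chosen = []
    for _ in range(k):
        weights = [abs(singular_values[i]) for i in remaining]
        pick = rng.choices(remaining, weights=weights, k=1)[0]
        chosen.append(pick)
        remaining.remove(pick)
    return sorted(chosen)


rng = random.Random(0)
sigmas = [10.0, 5.0, 1.0, 0.5]  # toy singular values, all positive

# Indices of the singular vectors that would be updated in one step.
selected = sample_singular_vector_indices(sigmas, k=2, rng=rng)

# Over many single draws, larger singular values are selected more often.
counts = [0] * len(sigmas)
for _ in range(2000):
    counts[sample_singular_vector_indices(sigmas, k=1, rng=rng)[0]] += 1
print(selected, counts)
```

Under this scheme every singular vector keeps a nonzero chance of being updated, while directions with larger singular values are refreshed more frequently.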
CL · Aug 9, 2021
The HW-TSC's Offline Speech Translation Systems for IWSLT 2021 Evaluation
Minghan Wang, Yuxia Wang, Chang Su et al.
This paper describes our participation in the IWSLT-2021 offline speech translation task. Our system was built in a cascade form, including a speaker diarization module, an Automatic Speech Recognition (ASR) module and a Machine Translation (MT) module. We directly use the LIUM SpkDiarization tool as the diarization module. The ASR module is trained with three ASR datasets from different sources, by multi-source training, using a modified Transformer encoder. The MT module is pretrained on the large-scale WMT news translation dataset and fine-tuned on the TED corpus. Our method achieves a BLEU score of 24.6 on the 2021 test set.
CV · Oct 23, 2019
Breast Anatomy Enriched Tumor Saliency Estimation
Fei Xu, Yingtao Zhang, Min Xian et al.
Breast cancer investigation is of great significance, and developing tumor detection methodologies is a critical need. However, it is a challenging task for breast ultrasound due to the complicated breast structure and poor quality of the images. In this paper, we propose a novel tumor saliency estimation (TSE) model guided by enriched breast anatomy knowledge to localize the tumor. First, the breast anatomy layers are generated by a deep neural network. Then we refine the layers by integrating a non-semantic breast anatomy model to solve the problem of incomplete mammary layers. Meanwhile, a new background map generation method weighted by the semantic probability and spatial distance is proposed to improve the performance. The experiment demonstrates that the proposed method with the new background map outperforms four state-of-the-art TSE models, with a 10% increase in F-measure on the public BUS dataset.
CV · Sep 18, 2019
CrackGAN: Pavement Crack Detection Using Partially Accurate Ground Truths Based on Generative Adversarial Learning
Kaige Zhang, Yingtao Zhang, Heng-Da Cheng
The fully convolutional network is a powerful tool for per-pixel semantic segmentation/detection. However, it is problematic when coping with crack detection using partially accurate ground truths (GTs): the network may easily converge to a state that treats all pixels as background (BG) and still achieves a very good loss, named the "All Black" phenomenon, due to the unavailability of accurate GTs and the data imbalance. To tackle this problem, we propose crack-patch-only (CPO) supervised generative adversarial learning for end-to-end training, which forces the network to always produce crack-GT images while retaining both crack and BG-image translation abilities by feeding a larger-size crack image into an asymmetric U-shape generator to overcome the "All Black" issue. The proposed approach is validated using four crack datasets and achieves state-of-the-art performance compared with recently published works in both efficiency and accuracy.
CV · Sep 14, 2019
Fuzzy Semantic Segmentation of Breast Ultrasound Image with Breast Anatomy Constraints
Kuan Huang, Yingtao Zhang, H. D. Cheng et al.
Breast cancer is one of the most serious diseases affecting women's health. Because it is low-cost, portable, radiation-free, and highly efficient, breast ultrasound (BUS) imaging is the most popular approach for diagnosing early breast cancer. However, ultrasound images have low resolution and poor quality. Thus, developing an accurate detection system is a challenging task. In this paper, we propose a fully automatic segmentation algorithm consisting of two parts: a fuzzy fully convolutional network and an accurate fine-tuning post-processing step based on breast anatomy constraints. In the first part, the image is preprocessed by contrast enhancement, and wavelet features are employed for image augmentation. A fuzzy membership function transforms the augmented BUS images into the fuzzy domain. The features from the convolutional layers are processed using fuzzy logic as well. Conditional random fields (CRFs) post-process the segmentation result. The location relation among the breast anatomy layers is utilized to improve the performance. The proposed method is applied to a dataset of 325 BUS images and achieves state-of-the-art performance compared with existing methods, with a true positive rate of 90.33%, a false positive rate of 9.00%, and an intersection over union (IoU) of 81.29% on the tumor category, and a mean intersection over union (mIoU) of 80.47% over five categories: fat layer, mammary layer, muscle layer, background, and tumor.
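For readers unfamiliar with the reported metrics, the sketch below computes true positive rate, false positive rate, and IoU for one class from two binary masks. The abstract does not define these quantities, so the standard confusion-matrix definitions are assumed; some breast ultrasound papers instead normalize the false positive area by the tumor area. The function name and the toy masks are hypothetical.

```python
def binary_segmentation_metrics(pred, truth):
    """TPR, FPR and IoU for one class, from masks given as nested 0/1 lists.
    Assumes both masks contain at least one positive and one negative pixel."""
    tp = fp = fn = tn = 0
    for p_row, t_row in zip(pred, truth):
        for p, t in zip(p_row, t_row):
            if p and t:
                tp += 1
            elif p and not t:
                fp += 1
            elif t:
                fn += 1
            else:
                tn += 1
    return {
        "tpr": tp / (tp + fn),       # fraction of true tumor pixels found
        "fpr": fp / (fp + tn),       # fraction of non-tumor pixels mislabeled
        "iou": tp / (tp + fp + fn),  # overlap divided by union
    }


# Toy 2x4 masks: 1 marks tumor pixels.
truth = [[0, 1, 1, 0],
         [0, 1, 1, 0]]
pred = [[0, 1, 1, 1],
        [0, 0, 1, 0]]
metrics = binary_segmentation_metrics(pred, truth)
print(metrics)
```

A mean IoU over several categories, such as the five listed above, would be the average of the per-class IoU values computed this way.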
CV · Jun 18, 2019
Tumor Saliency Estimation for Breast Ultrasound Images via Breast Anatomy Modeling
Fei Xu, Yingtao Zhang, Min Xian et al.
Tumor saliency estimation aims to localize tumors by modeling the visual stimuli in medical images. However, it is a challenging task for breast ultrasound due to the complicated anatomic structure of the breast and poor image quality. Moreover, existing saliency estimation approaches only model generic visual stimuli, e.g., local and global contrast, location, and feature correlation, and achieve poor performance for tumor saliency estimation. In this paper, we propose a novel optimization model to estimate tumor saliency by utilizing breast anatomy. First, we model breast anatomy and decompose the breast ultrasound image into layers using Neutro-Connectedness; then we utilize the layers to generate the foreground and background maps; and finally we propose a novel objective function to estimate the tumor saliency by integrating the foreground map, background map, adaptive center bias, and region-based correlation cues. Extensive experiments demonstrate that the proposed approach obtains more accurate foreground and background maps with the assistance of the breast anatomy, especially for images with large or small tumors; meanwhile, the new objective function can handle images without tumors. The newly proposed method achieves state-of-the-art performance when compared to eight tumor saliency estimation approaches using two breast ultrasound datasets.
CV · Jun 27, 2018
A Hybrid Framework for Tumor Saliency Estimation
Fei Xu, Min Xian, Yingtao Zhang et al.
Automatic tumor segmentation of breast ultrasound (BUS) images is quite challenging due to the complicated anatomic structure of the breast and poor image quality. Most tumor segmentation approaches achieve good performance on BUS images collected in controlled settings; however, the performance degrades greatly with BUS images from different sources. Tumor saliency estimation (TSE) has attracted increasing attention as a way to solve the problem by modeling radiologists' attention mechanism. In this paper, we propose a novel hybrid framework for TSE, which integrates both high-level domain knowledge and robust low-level saliency assumptions and can overcome drawbacks caused by direct mapping in traditional TSE approaches. The new framework integrates the Neutro-Connectedness (NC) map, the adaptive center, the correlation, and the layer structure-based weighted map. The experimental results demonstrate that the proposed approach outperforms state-of-the-art TSE methods.
CV · Jun 9, 2018
Abstaining Classification When Error Costs are Unequal and Unknown
Hongjiao Guan, Yingtao Zhang, H. D. Cheng et al.
Abstaining classification aims to reject easily misclassified examples rather than classify them, so it is an effective approach to increase classification reliability and reduce misclassification risk in cost-sensitive applications. In such applications, different types of errors (false positive or false negative) usually have unequal costs, and the error costs, which depend on the specific application, are usually unknown. However, current abstaining classification methods either do not distinguish the error types, or they need the cost information of misclassification and rejection, which are realized in the framework of cost-sensitive learning. In this paper, we propose a bounded-abstention method with two constraints of reject rates (BA2), which performs abstaining classification when error costs are unequal and unknown. BA2 aims to obtain the optimal area under the ROC curve (AUC) by constraining the reject rates of the positive and negative classes respectively. Specifically, we construct the receiver operating characteristic (ROC) curve, and search stepwise for the optimal reject thresholds from both ends of the curve, until the two constraints are satisfied. Experimental results show that BA2 obtains higher AUC and lower total cost than the state-of-the-art abstaining classification methods. Meanwhile, BA2 achieves controllable reject rates of the positive and negative classes.
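The two quantities that the method trades off, per-class reject rates and AUC on the accepted examples, can be sketched as follows. This does not reproduce the BA2 threshold search. It assumes that examples whose score falls inside a closed interval between two thresholds are rejected, and all names and data are hypothetical.

```python
def reject_rates(scores, labels, t_low, t_high):
    """Fraction of positives and of negatives whose score lies in the
    reject interval [t_low, t_high] (interval form is an assumption)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    pos_rate = sum(t_low <= s <= t_high for s in pos) / len(pos)
    neg_rate = sum(t_low <= s <= t_high for s in neg) / len(neg)
    return pos_rate, neg_rate


def auc(scores, labels):
    """AUC as the probability that a positive outranks a negative,
    counting ties as one half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


scores = [0.1, 0.2, 0.35, 0.4, 0.45, 0.55, 0.6, 0.8, 0.9]
labels = [0, 0, 0, 1, 0, 1, 0, 1, 1]
t_low, t_high = 0.4, 0.6

pos_rate, neg_rate = reject_rates(scores, labels, t_low, t_high)
accepted = [(s, y) for s, y in zip(scores, labels)
            if not (t_low <= s <= t_high)]
auc_all = auc(scores, labels)
auc_accepted = auc([s for s, _ in accepted], [y for _, y in accepted])
print(pos_rate, neg_rate, auc_all, auc_accepted)
```

In this toy case, rejecting the ambiguous middle band costs half of the positives and 40% of the negatives but raises the AUC on the remaining examples from 0.85 to 1.0. A bounded-abstention method would cap the two reject rates and look for thresholds that maximize AUC within those caps.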
CV · Feb 13, 2018
Computer-Aided Knee Joint Magnetic Resonance Image Segmentation - A Survey
Boyu Zhang, Yingtao Zhang, H. D. Cheng et al.
Osteoarthritis (OA) is one of the major health issues among the elderly population. MRI is the most popular technology to observe and evaluate the progression of OA. However, the extreme labor cost of MRI analysis makes the process inefficient and expensive. Also, due to human error and the subjective nature of the analysis, the inter- and intra-observer variability is rather high. Computer-aided knee MRI segmentation is currently an active research field because it can relieve doctors and radiologists of a time-consuming and tedious job and improve diagnostic performance, which has immense potential for both clinical practice and scientific research. In the past decades, researchers have investigated automatic/semi-automatic knee MRI segmentation methods extensively. However, to the best of our knowledge, there is no comprehensive survey paper in this field yet. In this survey paper, we classify the existing methods by their principles, discuss the current research status, and point out future research trends in depth.
CV · Jan 9, 2018
BUSIS: A Benchmark for Breast Ultrasound Image Segmentation
Min Xian, Yingtao Zhang, H. D. Cheng et al.
Breast ultrasound (BUS) image segmentation is challenging and critical for BUS Computer-Aided Diagnosis (CAD) systems. Many BUS segmentation approaches have been studied in the last two decades, but the performances of most approaches have been assessed using relatively small private datasets with different quantitative metrics, which results in a discrepancy in performance comparison. Therefore, there is a pressing need for building a benchmark to compare existing methods using a public dataset objectively, to determine the performance of the best breast tumor segmentation algorithm available today, and to investigate what segmentation strategies are valuable in clinical practice and theoretical study. In this work, a benchmark for B-mode breast ultrasound image segmentation is presented. In the benchmark, 1) we collected 562 breast ultrasound images, prepared a software tool, and involved four radiologists in obtaining accurate annotations through standardized procedures; 2) we extensively compared the performance of sixteen state-of-the-art segmentation methods and discussed their advantages and disadvantages; 3) we proposed a set of valuable quantitative metrics to evaluate both semi-automatic and fully automatic segmentation approaches; and 4) we discussed the successful segmentation strategies and possible future improvements in detail.
CV · Apr 4, 2017
Automatic Breast Ultrasound Image Segmentation: A Survey
Min Xian, Yingtao Zhang, H. D. Cheng et al.
Breast cancer is one of the leading causes of cancer death among women worldwide. In clinical routine, automatic breast ultrasound (BUS) image segmentation is very challenging and essential for cancer diagnosis and treatment planning. Many BUS segmentation approaches have been studied in the last two decades and have been shown to be effective on private datasets. Currently, the advancement of BUS image segmentation seems to have reached a bottleneck. Improving performance is increasingly challenging, and only a few new approaches have been published in the last several years. It is time to look at the field by reviewing previous approaches comprehensively and to investigate future directions. In this paper, we study the basic ideas, theories, pros and cons of the approaches, group them into categories, and extensively review each category in depth by discussing the principles, application issues, and advantages/disadvantages.
CV · Dec 19, 2015
Neutro-Connectedness Cut
Min Xian, Yingtao Zhang, H. D. Cheng et al.
Interactive image segmentation is a challenging task and has received increasing attention recently; however, two major drawbacks exist in interactive segmentation approaches. First, the segmentation performance of ROI-based methods is sensitive to the initial ROI: different ROIs may produce very different results. Second, most seed-based methods need intense interactions, and are not applicable in many cases. In this work, we generalize the Neutro-Connectedness (NC) to be independent of top-down priors of objects and to model image topology with indeterminacy measurement on image regions; propose a novel method for determining object and background regions, which is applied to exclude isolated background regions and enforce label consistency; and put forward a hybrid interactive segmentation method, Neutro-Connectedness Cut (NC-Cut), which can overcome the above two problems by utilizing both pixel-wise appearance information and region-based NC properties. We evaluate the proposed NC-Cut on two image datasets (265 images), and demonstrate that the proposed approach outperforms state-of-the-art interactive image segmentation methods (Grabcut, MILCut, One-Cut, $MGC_{max}^{sum}$ and pPBC).
CV · Aug 24, 2015
An algorithm for Left Atrial Thrombi detection using Transesophageal Echocardiography
Jianrui Ding, Min Xian, H. D. Cheng et al.
Transesophageal echocardiography (TEE) is widely used to detect left atrium (LA)/left atrial appendage (LAA) thrombi. In this paper, local binary pattern variance (LBPV) features are extracted from the region of interest (ROI), and dynamic features are formed by using the information of neighboring frames in the sequence. The sequence is viewed as a bag, and the images in the sequence are considered as the instances. A multiple-instance learning (MIL) method is employed to solve the LAA thrombi detection problem. The experimental results show that the proposed method achieves better performance than other methods.