Sen Liu

CV
h-index18
30papers
728citations
Novelty46%
AI Score50

30 Papers

CVAug 21, 2022Code
HST: Hierarchical Swin Transformer for Compressed Image Super-resolution

Bingchen Li, Xin Li, Yiting Lu et al.

Compressed Image Super-resolution has achieved great attention in recent years, where images are degraded with compression artifacts and low-resolution artifacts. Since the complex hybrid distortions, it is hard to restore the distorted image with the simple cooperation of super-resolution and compression artifacts removing. In this paper, we take a step forward to propose the Hierarchical Swin Transformer (HST) network to restore the low-resolution compressed image, which jointly captures the hierarchical feature representations and enhances each-scale representation with Swin transformer, respectively. Moreover, we find that the pretraining with Super-resolution (SR) task is vital in compressed image super-resolution. To explore the effects of different SR pretraining, we take the commonly-used SR tasks (e.g., bicubic and different real super-resolution simulations) as our pretraining tasks, and reveal that SR plays an irreplaceable role in the compressed image super-resolution. With the cooperation of HST and pre-training, our HST achieves the fifth place in AIM 2022 challenge on the low-quality compressed image super-resolution track, with the PSNR of 23.51dB. Extensive experiments and ablation studies have validated the effectiveness of our proposed methods. The code and models are available at https://github.com/USTC-IMCL/HST-for-Compressed-Image-SR.

CLNov 3, 2023Code
$R^3$-NL2GQL: A Model Coordination and Knowledge Graph Alignment Approach for NL2GQL

Yuhang Zhou, Yu He, Siyu Tian et al.

While current tasks of converting natural language to SQL (NL2SQL) using Foundation Models have shown impressive achievements, adapting these approaches for converting natural language to Graph Query Language (NL2GQL) encounters hurdles due to the distinct nature of GQL compared to SQL, alongside the diverse forms of GQL. Moving away from traditional rule-based and slot-filling methodologies, we introduce a novel approach, $R^3$-NL2GQL, integrating both small and large Foundation Models for ranking, rewriting, and refining tasks. This method leverages the interpretative strengths of smaller models for initial ranking and rewriting stages, while capitalizing on the superior generalization and query generation prowess of larger models for the final transformation of natural language queries into GQL formats. Addressing the scarcity of datasets in this emerging field, we have developed a bilingual dataset, sourced from graph database manuals and selected open-source Knowledge Graphs (KGs). Our evaluation of this methodology on this dataset demonstrates its promising efficacy and robustness.

MMJul 13, 2022
RTN: Reinforced Transformer Network for Coronary CT Angiography Vessel-level Image Quality Assessment

Yiting Lu, Jun Fu, Xin Li et al.

Coronary CT Angiography (CCTA) is susceptible to various distortions (e.g., artifacts and noise), which severely compromise the exact diagnosis of cardiovascular diseases. The appropriate CCTA Vessel-level Image Quality Assessment (CCTA VIQA) algorithm can be used to reduce the risk of error diagnosis. The primary challenges of CCTA VIQA are that the local part of coronary that determines final quality is hard to locate. To tackle the challenge, we formulate CCTA VIQA as a multiple-instance learning (MIL) problem, and exploit Transformer-based MIL backbone (termed as T-MIL) to aggregate the multiple instances along the coronary centerline into the final quality. However, not all instances are informative for final quality. There are some quality-irrelevant/negative instances intervening the exact quality assessment(e.g., instances covering only background or the coronary in instances is not identifiable). Therefore, we propose a Progressive Reinforcement learning based Instance Discarding module (termed as PRID) to progressively remove quality-irrelevant/negative instances for CCTA VIQA. Based on the above two modules, we propose a Reinforced Transformer Network (RTN) for automatic CCTA VIQA based on end-to-end optimization. Extensive experimental results demonstrate that our proposed method achieves the state-of-the-art performance on the real-world CCTA dataset, exceeding previous MIL methods by a large margin.

30.5LGApr 19Code
TransXion: A High-Fidelity Graph Benchmark for Realistic Anti-Money Laundering

Keyang Chen, Mingxuan Jiang, Yongsheng Zhao et al.

Money laundering poses severe risks to global financial systems, driving the widespread adoption of machine learning for transaction monitoring. However, progress remains stifled by the lack of realistic benchmarks. Existing transaction-graph datasets suffer from two pervasive limitations: (i) they provide sparse node-level semantics beyond anonymized identifiers, and (ii) they rely on template-driven anomaly injection, which biases benchmarks toward static structural motifs and yields overly optimistic assessments of model robustness. We propose TransXion, a benchmark ecosystem for Anti-Money Laundering (AML) research that integrates profile-aware simulation of normal activity with stochastic, non-template synthesis of illicit subgraphs.TransXion jointly models persistent entity profiles and conditional transaction behavior, enabling evaluation of "out-of-character" anomalies where observed activity contradicts an entity's socio-economic context. The resulting dataset comprises approximately 3 million transactions among 50,000 entities, each endowed with rich demographic and behavioral attributes. Empirical analyses show that TransXion reproduces key structural properties of payment networks, including heavy-tailed activity distributions and localized subgraph structure. Across a diverse array of detection models spanning multiple algorithmic paradigms, TransXion yields substantially lower detection performance than widely used benchmarks, demonstrating increased difficulty and realism. TransXion provides a more faithful testbed for developing context-aware and robust AML detection methods. The dataset and code are publicly available at https://github.com/chaos-max/TransXion.

SDJun 25, 2023
DSE-TTS: Dual Speaker Embedding for Cross-Lingual Text-to-Speech

Sen Liu, Yiwei Guo, Chenpeng Du et al.

Although high-fidelity speech can be obtained for intralingual speech synthesis, cross-lingual text-to-speech (CTTS) is still far from satisfactory as it is difficult to accurately retain the speaker timbres(i.e. speaker similarity) and eliminate the accents from their first language(i.e. nativeness). In this paper, we demonstrated that vector-quantized(VQ) acoustic feature contains less speaker information than mel-spectrogram. Based on this finding, we propose a novel dual speaker embedding TTS (DSE-TTS) framework for CTTS with authentic speaking style. Here, one embedding is fed to the acoustic model to learn the linguistic speaking style, while the other one is integrated into the vocoder to mimic the target speaker's timbre. Experiments show that by combining both embeddings, DSE-TTS significantly outperforms the state-of-the-art SANE-TTS in cross-lingual synthesis, especially in terms of nativeness.

DCJun 29, 2023
OSP: Boosting Distributed Model Training with 2-stage Synchronization

Zixuan Chen, Lei Shi, Xuandong Liu et al.

Distributed deep learning (DDL) is a promising research area, which aims to increase the efficiency of training deep learning tasks with large size of datasets and models. As the computation capability of DDL nodes continues to increase, the network connection between nodes is becoming a major bottleneck. Various methods of gradient compression and improved model synchronization have been proposed to address this bottleneck in Parameter-Server-based DDL. However, these two types of methods can result in accuracy loss due to discarded gradients and have limited enhancement on the throughput of model synchronization, respectively. To address these challenges, we propose a new model synchronization method named Overlapped Synchronization Parallel (OSP), which achieves efficient communication with a 2-stage synchronization approach and uses Local-Gradient-based Parameter correction (LGP) to avoid accuracy loss caused by stale parameters. The prototype of OSP has been implemented using PyTorch and evaluated on commonly used deep learning models and datasets with a 9-node testbed. Evaluation results show that OSP can achieve up to 50\% improvement in throughput without accuracy loss compared to popular synchronization models.

ASNov 2, 2023
Expressive TTS Driven by Natural Language Prompts Using Few Human Annotations

Hanglei Zhang, Yiwei Guo, Sen Liu et al.

Expressive text-to-speech (TTS) aims to synthesize speeches with human-like tones, moods, or even artistic attributes. Recent advancements in expressive TTS empower users with the ability to directly control synthesis style through natural language prompts. However, these methods often require excessive training with a significant amount of style-annotated data, which can be challenging to acquire. Moreover, they may have limited adaptability due to fixed style annotations. In this work, we present FreeStyleTTS (FS-TTS), a controllable expressive TTS model with minimal human annotations. Our approach utilizes a large language model (LLM) to transform expressive TTS into a style retrieval task. The LLM selects the best-matching style references from annotated utterances based on external style prompts, which can be raw input text or natural language style descriptions. The selected reference guides the TTS pipeline to synthesize speeches with the intended style. This innovative approach provides flexible, versatile, and precise style control with minimal human workload. Experiments on a Mandarin storytelling corpus demonstrate FS-TTS's proficiency in leveraging LLM's semantic inference ability to retrieve desired styles from either input text or user-defined descriptions. This results in synthetic speeches that are closely aligned with the specified styles.

NIJul 29, 2024
Rina: Enhancing Ring-AllReduce with In-network Aggregation in Distributed Model Training

Zixuan Chen, Xuandong Liu, Minglin Li et al.

Parameter Server (PS) and Ring-AllReduce (RAR) are two widely utilized synchronization architectures in multi-worker Deep Learning (DL), also referred to as Distributed Deep Learning (DDL). However, PS encounters challenges with the ``incast'' issue, while RAR struggles with problems caused by the long dependency chain. The emerging In-network Aggregation (INA) has been proposed to integrate with PS to mitigate its incast issue. However, such PS-based INA has poor incremental deployment abilities as it requires replacing all the switches to show significant performance improvement, which is not cost-effective. In this study, we present the incorporation of INA capabilities into RAR, called RAR with In-Network Aggregation (Rina), to tackle both the problems above. Rina features its agent-worker mechanism. When an INA-capable ToR switch is deployed, all workers in this rack run as one abstracted worker with the help of the agent, resulting in both excellent incremental deployment capabilities and better throughput. We conducted extensive testbed and simulation evaluations to substantiate the throughput advantages of Rina over existing DDL training synchronization structures. Compared with the state-of-the-art PS-based INA methods ATP, Rina can achieve more than 50\% throughput with the same hardware cost.

41.0NIMay 16
Escape from Callback Hell! A New Programming Paradigm for Network Simulation

Yuanyi Zhu, Zijian Li, Xin Ai et al.

Network simulation plays a crucial role in both networking research and industry. Existing commonly-used Discrete Event Simulations (DES) are based on callback mechanisms for discrete event (DE). However, due to the inability of callbacks to naturally simulate network events, programs in network simulation cannot be written in a sequential workflow. This leads to inherent complexity and poor maintainability, resulting in stack ripping and callback hell. These problems significantly increase simulation development workloads and introduce substantial cognitive loads associated with programming and debugging. To enable more efficient development of network simulation and facilitate the rapid evaluation and evolution of network functions, we propose a novel development paradigm for network simulation named ``CoDES" (\textbf{Co}routine-based \textbf{DES}). To the best of our knowledge, we are the first to focus on optimizing the network simulation development process rather than performance based on the coroutine mechanism. We implement a new network simulation framework based on CoDES that is capable of naturally simulating network events and effectively address key system challenges related to correctness, functionality, compatibility, and overhead. It enables developers to create sequential workflows for network programs and simplifies the code structure, thus reducing development workloads while enhancing code readability and maintainability. We apply this paradigm to a commonly used network simulator, NS-3 to implement Message Passing Interface (MPI), High Precision Congestion Control (HPCC), and Routing Information Protocol (RIP), achieving up to 62.3\% and 82.6\% reduction in code volume and structure complexity without sacrificing simulation accuracy, extending execution time or increasing runtime memory of simulation.

CVNov 23, 2019Code
Region Normalization for Image Inpainting

Tao Yu, Zongyu Guo, Xin Jin et al.

Feature Normalization (FN) is an important technique to help neural network training, which typically normalizes features across spatial dimensions. Most previous image inpainting methods apply FN in their networks without considering the impact of the corrupted regions of the input image on normalization, e.g. mean and variance shifts. In this work, we show that the mean and variance shifts caused by full-spatial FN limit the image inpainting network training and we propose a spatial region-wise normalization named Region Normalization (RN) to overcome the limitation. RN divides spatial pixels into different regions according to the input mask, and computes the mean and variance in each region for normalization. We develop two kinds of RN for our image inpainting network: (1) Basic RN (RN-B), which normalizes pixels from the corrupted and uncorrupted regions separately based on the original inpainting mask to solve the mean and variance shift problem; (2) Learnable RN (RN-L), which automatically detects potentially corrupted and uncorrupted regions for separate normalization, and performs global affine transformation to enhance their fusion. We apply RN-B in the early layers and RN-L in the latter layers of the network respectively. Experiments show that our method outperforms current state-of-the-art methods quantitatively and qualitatively. We further generalize RN to other inpainting networks and achieve consistent performance improvements. Our code is available at https://github.com/geekyutao/RN.

CLFeb 20, 2024
Are LLMs Rational Investors? A Study on Detecting and Reducing the Financial Bias in LLMs

Yuhang Zhou, Yuchen Ni, Yunhui Gan et al.

Large Language Models (LLMs) are increasingly adopted in financial analysis for interpreting complex market data and trends. However, their use is challenged by intrinsic biases (e.g., risk-preference bias) and a superficial understanding of market intricacies, necessitating a thorough assessment of their financial insight. To address these issues, we introduce Financial Bias Indicators (FBI), a framework with components like Bias Unveiler, Bias Detective, Bias Tracker, and Bias Antidote to identify, detect, analyze, and eliminate irrational biases in LLMs. By combining behavioral finance principles with bias examination, we evaluate 23 leading LLMs and propose a de-biasing method based on financial causal knowledge. Results show varying degrees of financial irrationality among models, influenced by their design and training. Models trained specifically on financial datasets may exhibit more irrationality, and even larger financial language models (FinLLMs) can show more bias than smaller, general models. We utilize four prompt-based methods incorporating causal debiasing, effectively reducing financial biases in these models. This work enhances the understanding of LLMs' bias in financial applications, laying the foundation for developing more reliable and rational financial analysis tools.

LGMay 20, 2024
FedCAda: Adaptive Client-Side Optimization for Accelerated and Stable Federated Learning

Liuzhi Zhou, Yu He, Kun Zhai et al.

Federated learning (FL) has emerged as a prominent approach for collaborative training of machine learning models across distributed clients while preserving data privacy. However, the quest to balance acceleration and stability becomes a significant challenge in FL, especially on the client-side. In this paper, we introduce FedCAda, an innovative federated client adaptive algorithm designed to tackle this challenge. FedCAda leverages the Adam algorithm to adjust the correction process of the first moment estimate $m$ and the second moment estimate $v$ on the client-side and aggregate adaptive algorithm parameters on the server-side, aiming to accelerate convergence speed and communication efficiency while ensuring stability and performance. Additionally, we investigate several algorithms incorporating different adjustment functions. This comparative analysis revealed that due to the limited information contained within client models from other clients during the initial stages of federated learning, more substantial constraints need to be imposed on the parameters of the adaptive algorithm. As federated learning progresses and clients gather more global information, FedCAda gradually diminishes the impact on adaptive parameters. These findings provide insights for enhancing the robustness and efficiency of algorithmic improvements. Through extensive experiments on computer vision (CV) and natural language processing (NLP) datasets, we demonstrate that FedCAda outperforms the state-of-the-art methods in terms of adaptability, convergence, stability, and overall performance. This work contributes to adaptive algorithms for federated learning, encouraging further exploration.

SDApr 23, 2024
StoryTTS: A Highly Expressive Text-to-Speech Dataset with Rich Textual Expressiveness Annotations

Sen Liu, Yiwei Guo, Xie Chen et al.

While acoustic expressiveness has long been studied in expressive text-to-speech (ETTS), the inherent expressiveness in text lacks sufficient attention, especially for ETTS of artistic works. In this paper, we introduce StoryTTS, a highly ETTS dataset that contains rich expressiveness both in acoustic and textual perspective, from the recording of a Mandarin storytelling show. A systematic and comprehensive labeling framework is proposed for textual expressiveness. We analyze and define speech-related textual expressiveness in StoryTTS to include five distinct dimensions through linguistics, rhetoric, etc. Then we employ large language models and prompt them with a few manual annotation examples for batch annotation. The resulting corpus contains 61 hours of consecutive and highly prosodic speech equipped with accurate text transcriptions and rich textual expressiveness annotations. Therefore, StoryTTS can aid future ETTS research to fully mine the abundant intrinsic textual and acoustic features. Experiments are conducted to validate that TTS models can generate speech with improved expressiveness when integrating with the annotated textual labels in StoryTTS.

CLApr 7, 2024
SilverSight: A Multi-Task Chinese Financial Large Language Model Based on Adaptive Semantic Space Learning

Yuhang Zhou, Zeping Li, Siyu Tian et al.

Large language models (LLMs) are increasingly being applied across various specialized fields, leveraging their extensive knowledge to empower a multitude of scenarios within these domains. However, each field encompasses a variety of specific tasks that require learning, and the diverse, heterogeneous data across these domains can lead to conflicts during model task transfer. In response to this challenge, our study introduces an Adaptive Semantic Space Learning (ASSL) framework, which utilizes the adaptive reorganization of data distributions within the semantic space to enhance the performance and selection efficacy of multi-expert models. Utilizing this framework, we trained a financial multi-task LLM named "SilverSight". Our research findings demonstrate that our framework can achieve results close to those obtained with full data training using only 10% of the data, while also exhibiting strong generalization capabilities.

LGSep 12, 2025
Limited Reference, Reliable Generation: A Two-Component Framework for Tabular Data Generation in Low-Data Regimes

Mingxuan Jiang, Yongxin Wang, Ziyue Dai et al.

Synthetic tabular data generation is increasingly essential in data management, supporting downstream applications when real-world and high-quality tabular data is insufficient. Existing tabular generation approaches, such as generative adversarial networks (GANs), diffusion models, and fine-tuned Large Language Models (LLMs), typically require sufficient reference data, limiting their effectiveness in domain-specific databases with scarce records. While prompt-based LLMs offer flexibility without parameter tuning, they often fail to capture dataset-specific feature-label dependencies and generate redundant data, leading to degradation in downstream task performance. To overcome these issues, we propose ReFine, a framework that (i) derives symbolic "if-then" rules from interpretable models and embeds them into prompts to explicitly guide generation toward domain-specific feature distribution, and (ii) applies a dual-granularity filtering strategy that suppresses over-sampling patterns and selectively refines rare but informative samples to reduce distributional imbalance. Extensive experiments on various regression and classification benchmarks demonstrate that ReFine consistently outperforms state-of-the-art methods, achieving up to 0.44 absolute improvement in R-squared for regression and 10.0 percent relative improvement in F1 score for classification tasks.

ROJan 29, 2025
Digital Twin Synchronization: Bridging the Sim-RL Agent to a Real-Time Robotic Additive Manufacturing Control

Matsive Ali, Sandesh Giri, Sen Liu et al.

With the rapid development of deep reinforcement learning technology, it gradually demonstrates excellent potential and is becoming the most promising solution in the robotics. However, in the smart manufacturing domain, there is still not too much research involved in dynamic adaptive control mechanisms optimizing complex processes. This research advances the integration of Soft Actor-Critic (SAC) with digital twins for industrial robotics applications, providing a framework for enhanced adaptive real-time control for smart additive manufacturing processing. The system architecture combines Unity's simulation environment with ROS2 for seamless digital twin synchronization, while leveraging transfer learning to efficiently adapt trained models across tasks. We demonstrate our methodology using a Viper X300s robot arm with the proposed hierarchical reward structure to address the common reinforcement learning challenges in two distinct control scenarios. The results show rapid policy convergence and robust task execution in both simulated and physical environments demonstrating the effectiveness of our approach.

DCMay 7, 2023
Boosting Distributed Machine Learning Training Through Loss-tolerant Transmission Protocol

Zixuan Chen, Lei Shi, Xuandong Liu et al.

Distributed Machine Learning (DML) systems are utilized to enhance the speed of model training in data centers (DCs) and edge nodes. The Parameter Server (PS) communication architecture is commonly employed, but it faces severe long-tail latency caused by many-to-one "incast" traffic patterns, negatively impacting training throughput. To address this challenge, we design the \textbf{L}oss-tolerant \textbf{T}ransmission \textbf{P}rotocol (LTP), which permits partial loss of gradients during synchronization to avoid unneeded retransmission and contributes to faster synchronization per iteration. LTP implements loss-tolerant transmission through \textit{out-of-order transmission} and \textit{out-of-order Acknowledges (ACKs)}. LTP employs \textit{Early Close} to adjust the loss-tolerant threshold based on network conditions and \textit{Bubble Filling} for data correction to maintain training accuracy. LTP is implemented by C++ and integrated into PyTorch. Evaluations on a testbed of 8 worker nodes and one PS node demonstrate that LTP can significantly improve DML training task throughput by up to 30x compared to traditional TCP congestion controls, with no sacrifice to final accuracy.

MTRL-SCIMar 22, 2021
Comprehensive process-molten pool relations modeling using CNN for wire-feed laser additive manufacturing

Noopur Jamnikar, Sen Liu, Craig Brice et al.

Wire-feed laser additive manufacturing (WLAM) is gaining wide interest due to its high level of automation, high deposition rates, and good quality of printed parts. In-process monitoring and feedback controls that would reduce the uncertainty in the quality of the material are in the early stages of development. Machine learning promises the ability to accelerate the adoption of new processes and property design in additive manufacturing by making process-structure-property connections between process setting inputs and material quality outcomes. The molten pool dimensional information and temperature are the indicators for achieving the high quality of the build, which can be directly controlled by processing parameters. For the purpose of in situ quality control, the process parameters should be controlled in real-time based on sensed information from the process, in particular the molten pool. Thus, the molten pool-process relations are of preliminary importance. This paper analyzes experimentally collected in situ sensing data from the molten pool under a set of controlled process parameters in a WLAM system. The variations in the steady-state and transient state of the molten pool are presented with respect to the change of independent process parameters. A multi-modality convolutional neural network (CNN) architecture is proposed for predicting the control parameter directly from the measurable molten pool sensor data for achieving desired geometric and microstructural properties. Dropout and regularization are applied to the CNN architecture to avoid the problem of overfitting. The results highlighted that the multi-modal CNN, which receives temperature profile as an external feature to the features extracted from the image data, has improved prediction performance compared to the image-based uni-modality CNN approach.

MTRL-SCIMar 21, 2021
Machine learning based in situ quality estimation by molten pool condition-quality relations modeling using experimental data

Noopur Jamnikar, Sen Liu, Craig Brice et al.

The advancement of machine learning promises the ability to accelerate the adoption of new processes and property designs for metal additive manufacturing. The molten pool geometry and molten pool temperature are the significant indicators for the final part's geometric shape and microstructural properties for the Wire-feed laser direct energy deposition process. Thus, the molten pool condition-property relations are of preliminary importance for in situ quality assurance. To enable in situ quality monitoring of bead geometry and characterization properties, we need to continuously monitor the sensor's data for molten pool dimensions and temperature for the Wire-feed laser additive manufacturing (WLAM) system. We first develop a machine learning convolutional neural network (CNN) model for establishing the correlations from the measurable molten pool image and temperature data directly to the geometric shape and microstructural properties. The multi-modality network receives both the camera image and temperature measurement as inputs, yielding the corresponding characterization properties of the final build part (e.g., fusion zone depth, alpha lath thickness). The performance of the CNN model is compared with the regression model as a baseline. The developed models enable molten pool condition-quality relations mapping for building quantitative and collaborative in situ quality estimation and assurance framework.

LGJan 13, 2021
A Physics-Informed Machine Learning Model for Porosity Analysis in Laser Powder Bed Fusion Additive Manufacturing

Rui Liu, Sen Liu, Xiaoli Zhang

To control part quality, it is critical to analyze pore generation mechanisms, laying theoretical foundation for future porosity control. Current porosity analysis models use machine setting parameters, such as laser angle and part pose. However, these setting-based models are machine dependent, hence they often do not transfer to analysis of porosity for a different machine. To address the first problem, a physics-informed, data-driven model (PIM), which instead of directly using machine setting parameters to predict porosity levels of printed parts, it first interprets machine settings into physical effects, such as laser energy density and laser radiation pressure. Then, these physical, machine independent effects are used to predict porosity levels according to pass, flag, fail categories instead of focusing on quantitative pore size prediction. With six learning methods evaluation, PIM proved to achieve good performances with prediction error of 10$\sim$26%. Finally, pore-encouraging influence and pore-suppressing influence were analyzed for quality analysis.

IVSep 30, 2020
FAN: Frequency Aggregation Network for Real Image Super-resolution

Yingxue Pang, Xin Li, Xin Jin et al.

Single image super-resolution (SISR) aims to recover the high-resolution (HR) image from its low-resolution (LR) input image. With the development of deep learning, SISR has achieved great progress. However, It is still a challenge to restore the real-world LR image with complicated authentic degradations. Therefore, we propose FAN, a frequency aggregation network, to address the real-world image super-resolu-tion problem. Specifically, we extract different frequencies of the LR image and pass them to a channel attention-grouped residual dense network (CA-GRDB) individually to output corresponding feature maps. And then aggregating these residual dense feature maps adaptively to recover the HR image with enhanced details and textures. We conduct extensive experiments quantitatively and qualitatively to verify that our FAN performs well on the real image super-resolution task of AIM 2020 challenge. According to the released final results, our team SR-IM achieves the fourth place on the X4 track with PSNR of 31.1735 and SSIM of 0.8728.

CVSep 25, 2020
AIM 2020 Challenge on Real Image Super-Resolution: Methods and Results

Pengxu Wei, Hannan Lu, Radu Timofte et al.

This paper introduces the real image Super-Resolution (SR) challenge that was part of the Advances in Image Manipulation (AIM) workshop, held in conjunction with ECCV 2020. This challenge involves three tracks to super-resolve an input image for $\times$2, $\times$3 and $\times$4 scaling factors, respectively. The goal is to attract more attention to realistic image degradation for the SR task, which is much more complicated and challenging, and contributes to real-world image super-resolution applications. 452 participants were registered for three tracks in total, and 24 teams submitted their results. They gauge the state-of-the-art approaches for real image SR in terms of PSNR and SSIM.

IVAug 19, 2020
LIRA: Lifelong Image Restoration from Unknown Blended Distortions

Jianzhao Liu, Jianxin Lin, Xin Li et al.

Most existing image restoration networks are designed in a disposable way and catastrophically forget previously learned distortions when trained on a new distortion removal task. To alleviate this problem, we raise the novel lifelong image restoration problem for blended distortions. We first design a base fork-join model in which multiple pre-trained expert models specializing in individual distortion removal task work cooperatively and adaptively to handle blended distortions. When the input is degraded by a new distortion, inspired by adult neurogenesis in human memory system, we develop a neural growing strategy where the previously trained model can incorporate a new expert branch and continually accumulate new knowledge without interfering with learned knowledge. Experimental results show that the proposed approach can not only achieve state-of-the-art performance on blended distortions removal tasks in both PSNR/SSIM metrics, but also maintain old expertise while learning new restoration tasks.

CVJul 22, 2020
Learning Disentangled Feature Representation for Hybrid-distorted Image Restoration

Xin Li, Xin Jin, Jianxin Lin et al.

Hybrid-distorted image restoration (HD-IR) is dedicated to restore real distorted image that is degraded by multiple distortions. Existing HD-IR approaches usually ignore the inherent interference among hybrid distortions which compromises the restoration performance. To decompose such interference, we introduce the concept of Disentangled Feature Learning to achieve the feature-level divide-and-conquer of hybrid distortions. Specifically, we propose the feature disentanglement module (FDM) to distribute feature representations of different distortions into different channels by revising gain-control-based normalization. We also propose a feature aggregation module (FAM) with channel-wise attention to adaptively filter out the distortion representations and aggregate useful content information from different channels for the construction of raw image. The effectiveness of the proposed scheme is verified by visualizing the correlation matrix of features and channel responses of different distortions. Extensive experimental results also prove superior performance of our approach compared with the latest HD-IR schemes.

CVMar 4, 2020
VESR-Net: The Winning Solution to Youku Video Enhancement and Super-Resolution Challenge

Jiale Chen, Xu Tan, Chaowei Shan et al.

This paper introduces VESR-Net, a method for video enhancement and super-resolution (VESR). We design a separate non-local module to explore the relations among video frames and fuse video frames efficiently, and a channel attention residual block to capture the relations among feature maps for video frame reconstruction in VESR-Net. We conduct experiments to analyze the effectiveness of these designs in VESR-Net, which demonstrates the advantages of VESR-Net over previous state-of-the-art VESR methods. It is worth to mention that among more than thousands of participants for Youku video enhancement and super-resolution (Youku-VESR) challenge, our proposed VESR-Net beat other competitive methods and ranked the first place.

MTRL-SCIMar 4, 2020
Physics-informed machine learning for composition-process-property alloy design: shape memory alloy demonstration

Sen Liu, Branden B. Kappes, Behnam Amin-ahmadi et al.

Machine learning (ML) is shown to predict new alloys and their performances in a high dimensional, multiple-target-property design space that considers chemistry, multi-step processing routes, and characterization methodology variations. A physics-informed featured engineering approach is shown to enable otherwise poorly performing ML models to perform well with the same data. Specifically, previously engineered elemental features based on alloy chemistries are combined with newly engineered heat treatment process features. The new features result from first transforming the heat treatment parameter data as it was previously recorded using nonlinear mathematical relationships known to describe the thermodynamics and kinetics of phase transformations in alloys. The ability of the ML model to be used for predictive design is validated using blind predictions. Composition - process - property relationships for thermal hysteresis of shape memory alloys (SMAs) with complex microstructures created via multiple melting-homogenization-solutionization-precipitation processing stage variations are captured, in addition to the mean transformation temperatures of the SMAs. The quantitative models of hysteresis exhibited by such highly processed alloys demonstrate the ability for ML models to design for physical complexities that have challenged physics-based modeling approaches for decades.

CVJun 1, 2019
ZstGAN: An Adversarial Approach for Unsupervised Zero-Shot Image-to-Image Translation

Jianxin Lin, Yingce Xia, Sen Liu et al.

Image-to-image translation models have shown remarkable ability on transferring images among different domains. Most of existing work follows the setting that the source domain and target domain keep the same at training and inference phases, which cannot be generalized to the scenarios for translating an image from an unseen domain to another unseen domain. In this work, we propose the Unsupervised Zero-Shot Image-to-image Translation (UZSIT) problem, which aims to learn a model that can translate samples from image domains that are not observed during training. Accordingly, we propose a framework called ZstGAN: By introducing an adversarial training scheme, ZstGAN learns to model each domain with domain-specific feature distribution that is semantically consistent on vision and attribute modalities. Then the domain-invariant features are disentangled with an shared encoder for image generation. We carry out extensive experiments on CUB and FLO datasets, and the results demonstrate the effectiveness of proposed method on UZSIT task. Moreover, ZstGAN shows significant accuracy improvements over state-of-the-art zero-shot learning methods on CUB and FLO.

CVFeb 11, 2019
Exploring Explicit Domain Supervision for Latent Space Disentanglement in Unpaired Image-to-Image Translation

Jianxin Lin, Zhibo Chen, Yingce Xia et al.

Image-to-image translation tasks have been widely investigated with Generative Adversarial Networks (GANs). However, existing approaches are mostly designed in an unsupervised manner while little attention has been paid to domain information within unpaired data. In this paper, we treat domain information as explicit supervision and design an unpaired image-to-image translation framework, Domain-supervised GAN (DosGAN), which takes the first step towards the exploration of explicit domain supervision. In contrast to representing domain characteristics using different generators or domain codes, we pre-train a classification network to explicitly classify the domain of an image. After pre-training, this network is used to extract the domain-specific features of each image. Such features, together with the domain-independent features extracted by another encoder (shared across different domains), are used to generate image in target domain. Extensive experiments on multiple facial attribute translation, multiple identity translation, multiple season translation and conditional edges-to-shoes/handbags demonstrate the effectiveness of our method. In addition, we can transfer the domain-specific feature extractor obtained on the Facescrub dataset with domain supervision information to unseen domains, such as faces in the CelebA dataset. We also succeed in achieving conditional translation with any two images in CelebA, while previous models like StarGAN cannot handle this task.

CRNov 18, 2018
Distribution Discrepancy Maximization for Image Privacy Preserving

Sen Liu, Jianxin Lin, Zhibo Chen

With the rapid increase in online photo sharing activities, image obfuscation algorithms become particularly important for protecting the sensitive information in the shared photos. However, existing image obfuscation methods based on hand-crafted principles are challenged by the dramatic development of deep learning techniques. To address this problem, we propose to maximize the distribution discrepancy between the original image domain and the encrypted image domain. Accordingly, we introduce a collaborative training scheme: a discriminator $D$ is trained to discriminate the reconstructed image from the encrypted image, and an encryption model $G_e$ is required to generate these two kinds of images to maximize the recognition rate of $D$, leading to the same training objective for both $D$ and $G_e$. We theoretically prove that such a training scheme maximizes two distributions' discrepancy. Compared with commonly-used image obfuscation methods, our model can produce satisfactory defense against the attack of deep recognition models indicated by significant accuracy decreases on FaceScrub, Casia-WebFace and LFW datasets.

CVJun 24, 2017
Large-Scale Mapping of Human Activity using Geo-Tagged Videos

Yi Zhu, Sen Liu, Shawn Newsam

This paper is the first work to perform spatio-temporal mapping of human activity using the visual content of geo-tagged videos. We utilize a recent deep-learning based video analysis framework, termed hidden two-stream networks, to recognize a range of activities in YouTube videos. This framework is efficient and can run in real time or faster which is important for recognizing events as they occur in streaming video or for reducing latency in analyzing already captured video. This is, in turn, important for using video in smart-city applications. We perform a series of experiments to show our approach is able to accurately map activities both spatially and temporally. We also demonstrate the advantages of using the visual content over the tags/titles.